vis4d.op.layer.ms_deform_attn
Multi-Scale Deformable Attention Module.
Modified from Deformable DETR (https://github.com/fundamentalvision/Deformable-DETR/blob/main/models/ops/modules/ms_deform_attn.py).
Functions
|
Check if a number is a power of 2. |
ms_deformable_attention_cpu | CPU version of multi-scale deformable attention. |
Classes
MSDeformAttention | Multi-Scale Deformable Attention Module. |
MSDeformAttentionFunction | Multi-Scale Deformable Attention Function module. |
MultiScaleDeformableAttention | A wrapper for MSDeformAttention. |
- class MSDeformAttentionFunction(*args, **kwargs)[source]
Multi-Scale Deformable Attention Function module.
- ms_deformable_attention_cpu(value, value_spatial_shapes, sampling_locations, attention_weights)[source]
CPU version of multi-scale deformable attention.
- Parameters:
value (Tensor) – Value tensor of shape (bs, num_keys, num_heads, embed_dims // num_heads).
value_spatial_shapes (Tensor) – Spatial shape of each feature map, with shape (num_levels, 2); the last dimension holds (h, w).
sampling_locations (Tensor) – Locations of the sampling points, with shape (bs, num_queries, num_heads, num_levels, num_points, 2); the last dimension holds (x, y).
attention_weights (Tensor) – Weights of the sampling points used when computing the attention, with shape (bs, num_queries, num_heads, num_levels, num_points).
- Returns:
Output tensor of shape (bs, num_queries, embed_dims).
- Return type:
Tensor
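The computation described above can be sketched in plain Python for a single query and a single head. This is an illustrative sketch, not the library's implementation: it assumes bilinear interpolation with zero padding and the pixel-centre convention of `torch.nn.functional.grid_sample(..., align_corners=False)`, as in the Deformable DETR reference code. The names `bilinear_sample` and `deform_attn_one_query` are hypothetical, and nested lists stand in for tensors.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample an H x W grid of channel vectors at normalized (x, y) in [0, 1].

    Assumption: zero padding outside the map, pixel centres at +0.5.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    # Map normalized coordinates to continuous pixel coordinates.
    px, py = x * width - 0.5, y * height - 0.5
    x0, y0 = math.floor(px), math.floor(py)
    fx, fy = px - x0, py - y0
    out = [0.0] * channels
    # Accumulate the four neighbouring pixels, skipping out-of-range ones.
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= yi < height and 0 <= xi < width:
                for c in range(channels):
                    out[c] += wy * wx * feature_map[yi][xi][c]
    return out


def deform_attn_one_query(feature_maps, sampling_locations, attention_weights):
    """Weighted sum of sampled values over all levels and points.

    feature_maps[l] is an H_l x W_l grid of channel vectors,
    sampling_locations[l][p] is a normalized (x, y) pair,
    attention_weights[l][p] is a scalar weight.
    """
    channels = len(feature_maps[0][0][0])
    out = [0.0] * channels
    for level, fmap in enumerate(feature_maps):
        pairs = zip(sampling_locations[level], attention_weights[level])
        for (x, y), weight in pairs:
            sampled = bilinear_sample(fmap, x, y)
            for c in range(channels):
                out[c] += weight * sampled[c]
    return out


# Two levels, one channel, one sampling point per level.
levels = [
    [[[0.0], [1.0]], [[2.0], [3.0]]],  # 2 x 2 feature map
    [[[4.0]]],                         # 1 x 1 feature map
]
locations = [[(0.5, 0.5)], [(0.5, 0.5)]]
weights = [[0.5], [0.5]]
result = deform_attn_one_query(levels, locations, weights)
```

The full function applies the same weighted sum for every batch element, query, and head, then concatenates the per-head outputs into embed_dims channels.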
- class MSDeformAttention(d_model=256, n_levels=4, n_heads=8, n_points=4, im2col_step=64)[source]
Multi-Scale Deformable Attention Module.
This is the original implementation from Deformable DETR.
- __init__(d_model=256, n_levels=4, n_heads=8, n_points=4, im2col_step=64)[source]
Creates an instance of the class.
- Parameters:
d_model (int) – Hidden dimensions.
n_levels (int) – Number of feature levels.
n_heads (int) – Number of attention heads.
n_points (int) – Number of sampling points per attention head per feature level.
im2col_step (int) – The step used in image_to_column. Default: 64.
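The relationship between these arguments can be illustrated with a little arithmetic. The sketch below assumes, as in the upstream Deformable DETR code, that d_model is split evenly across heads and that the power-of-two check listed under Functions is applied to the per-head dimension. The helper names `is_power_of_two` and `describe_config` are hypothetical and are not part of this module.

```python
def is_power_of_two(n):
    """Return True if n is a positive integer power of two (bit trick)."""
    return isinstance(n, int) and n > 0 and (n & (n - 1)) == 0


def describe_config(d_model=256, n_levels=4, n_heads=8, n_points=4):
    """Derive per-head sizes from the constructor arguments."""
    if d_model % n_heads != 0:
        raise ValueError(
            f"d_model ({d_model}) must be divisible by n_heads ({n_heads})"
        )
    head_dim = d_model // n_heads
    return {
        # Channels handled by each attention head.
        "head_dim": head_dim,
        # Assumed purpose of the check: flag head sizes that are not 2^k.
        "head_dim_is_power_of_two": is_power_of_two(head_dim),
        # Sampling points attended to by each query across heads and levels.
        "samples_per_query": n_heads * n_levels * n_points,
    }


config = describe_config()
```

With the default arguments each head handles 32 channels and each query attends to 128 sampling points in total.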
- forward(query, reference_points, input_flatten, input_spatial_shapes, input_level_start_index, input_padding_mask=None)[source]
Forward function.
- Parameters:
query (Tensor) – (n, length_{query}, C).
reference_points (Tensor) – Either (n, length_{query}, n_levels, 2), with values in [0, 1] where (0, 0) is the top-left and (1, 1) the bottom-right corner, including the padding area; or (n, length_{query}, n_levels, 4), where the additional (w, h) turns each point into a reference box.
input_flatten (Tensor) – (n, \sum_{l=0}^{L-1} H_l \cdot W_l, C).
input_spatial_shapes (Tensor) – (n_levels, 2), [(H_0, W_0), (H_1, W_1), …, (H_{L-1}, W_{L-1})].
input_level_start_index (Tensor) – (n_levels,), [0, H_0*W_0, H_0*W_0+H_1*W_1, …, H_0*W_0+H_1*W_1+…+H_{L-2}*W_{L-2}].
input_padding_mask (Tensor) – (n, \sum_{l=0}^{L-1} H_l \cdot W_l), True for padding elements, False for non-padding elements.
- Return type:
Tensor
- Returns:
output (Tensor): (n, length_{query}, C).
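The flattened input layout can be derived from the per-level feature map sizes. The following is a minimal sketch in plain Python; `flatten_layout` is a hypothetical helper and the four shapes are made-up example values.

```python
def flatten_layout(spatial_shapes):
    """Return (level_start_index, total_length) for per-level (H, W) shapes."""
    level_start_index = []
    offset = 0
    for height, width in spatial_shapes:
        # Each level starts where the previous levels' pixels end.
        level_start_index.append(offset)
        offset += height * width
    return level_start_index, offset


# Example pyramid with four levels (illustrative values only).
spatial_shapes = [(32, 32), (16, 16), (8, 8), (4, 4)]
level_start_index, total_length = flatten_layout(spatial_shapes)
# input_flatten would then have shape (n, total_length, C) and
# input_padding_mask, if given, shape (n, total_length).
```

Here level_start_index has one entry per level, and total_length is the size of the second dimension of input_flatten.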
- class MultiScaleDeformableAttention(embed_dims=256, num_heads=8, num_levels=4, num_points=4, im2col_step=64, dropout=0.0)[source]
A wrapper for MSDeformAttention. This module implements MSDeformAttention with an identity connection; the positional encoding is also passed as an input.
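The wrapping behaviour can be sketched as follows. This is a simplified illustration under stated assumptions: flat lists of floats stand in for tensors, `attention` is a placeholder callable for the wrapped attention module, and dropout is omitted (with the default dropout=0.0 it would be a no-op). The function name `wrapper_forward` is hypothetical.

```python
def wrapper_forward(query, attention, query_pos=None, identity=None):
    """Add positional encoding, apply attention, then add the identity."""
    if identity is None:
        # Fall back to the unmodified query for the identity connection.
        identity = query
    if query_pos is not None:
        # Positional encoding is added before attention is applied.
        query = [q + p for q, p in zip(query, query_pos)]
    attended = attention(query)
    return [a + i for a, i in zip(attended, identity)]


def double(values):
    """Stand-in for the attention module: simply doubles every element."""
    return [2.0 * v for v in values]


out = wrapper_forward([1.0, 2.0], double, query_pos=[0.5, 0.5])
```

Note that the identity connection uses the query without the positional encoding unless an explicit identity is supplied.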
- __init__(embed_dims=256, num_heads=8, num_levels=4, num_points=4, im2col_step=64, dropout=0.0)[source]
Creates an instance of the class.
- forward(query, reference_points, input_flatten, input_spatial_shapes, input_level_start_index, query_pos=None, identity=None, input_padding_mask=None)[source]
Forward function.
- Parameters:
query (Tensor) – The input query with shape (bs, num_queries, embed_dims).
reference_points (Tensor) – Either (bs, num_queries, num_levels, 2), with values in [0, 1] where (0, 0) is the top-left and (1, 1) the bottom-right corner, including the padding area; or (bs, num_queries, num_levels, 4), where the additional (w, h) turns each point into a reference box.
input_flatten (Tensor) – (bs, \sum_{l=0}^{L-1} H_l \cdot W_l, C).
input_spatial_shapes (Tensor) – (num_levels, 2), [(H_0, W_0), (H_1, W_1), …, (H_{L-1}, W_{L-1})].
input_level_start_index (Tensor) – (num_levels,), [0, H_0*W_0, H_0*W_0+H_1*W_1, …, H_0*W_0+H_1*W_1+…+H_{L-2}*W_{L-2}].
query_pos (Tensor | None) – The positional encoding for query, with the same shape as query. If not None, it is added to query before attention is applied. Defaults to None.
identity (Tensor | None) – Tensor with the same shape as query, used for the identity (residual) connection. If None, query is used. Defaults to None.
input_padding_mask (Tensor) – (bs, \sum_{l=0}^{L-1} H_l \cdot W_l), True for padding elements, False for non-padding elements.
- Return type:
Tensor
- Returns:
output (Tensor): (bs, num_queries, C).
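The two accepted reference_points layouts lead to different ways of placing sampling points. The sketch below follows the formulas used in the upstream Deformable DETR code and assumes this module behaves the same way; it handles one sampling point at a time, and the name `sampling_location` and the offset values are hypothetical. In the real module the offsets are predicted from the query.

```python
def sampling_location(reference_point, offset, level_shape, n_points):
    """Combine a normalized reference point or box with a predicted offset.

    reference_point: (x, y) or (x, y, w, h), all normalized to [0, 1].
    offset: (dx, dy) predicted for this sampling point.
    level_shape: (H, W) of the feature level being sampled.
    """
    dx, dy = offset
    if len(reference_point) == 2:
        x, y = reference_point
        height, width = level_shape
        # Point form: offsets are in pixels of this level, so normalize them.
        return (x + dx / width, y + dy / height)
    if len(reference_point) == 4:
        x, y, w, h = reference_point
        # Box form: offsets are scaled by half the box size.
        return (x + dx / n_points * w * 0.5, y + dy / n_points * h * 0.5)
    raise ValueError("reference_point must have 2 or 4 elements")


point_loc = sampling_location((0.5, 0.5), (2.0, -1.0), (8, 16), n_points=4)
box_loc = sampling_location((0.5, 0.5, 0.2, 0.4), (2.0, -1.0), (8, 16), n_points=4)
```

In the box form the level shape is not needed, because the offsets are expressed relative to the reference box rather than to the feature map resolution.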