vis4d.op.layer.positional_encoding
Positional encoding for transformer.
Modified from mmdetection (https://github.com/open-mmlab/mmdetection).
Classes
| LearnedPositionalEncoding | Position embedding with learnable embedding weights. |
| SinePositionalEncoding | Position encoding with sine and cosine functions. |
- class SinePositionalEncoding(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)
Position encoding with sine and cosine functions.
See the paper End-to-End Object Detection with Transformers (DETR) for details.
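Assuming vis4d keeps the mmdetection computation it was modified from, each spatial axis is encoded separately and the two halves are concatenated (y first, then x), giving 2·num_feats channels per position. With $\hat{y}$ the running count of valid (unmasked) pixels along the height axis (1 at the first valid row), $d$ = num_feats, $T$ = temperature and $s$ = scale, the y half reads as below; the x half is identical with $\hat{x}$ counted along the width axis.

```latex
\mathrm{PE}_y(2i) = \sin\!\left(\frac{\hat{y}}{T^{2i/d}}\right), \qquad
\mathrm{PE}_y(2i+1) = \cos\!\left(\frac{\hat{y}}{T^{2i/d}}\right), \qquad
i = 0, \dots, \tfrac{d}{2} - 1

% Only when normalize = True, before applying the sines and cosines:
\hat{y} \leftarrow \frac{\hat{y} + \text{offset}}{\hat{y}_{\text{last}} + \epsilon}\, s
```

Here $\hat{y}_{\text{last}}$ is the count at the last row, i.e. the number of valid pixels in that column, so with offset = 0 the normalized coordinates lie in $(0, s]$.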
- __init__(num_feats, temperature=10000, normalize=False, scale=6.283185307179586, eps=1e-06, offset=0.0)
Initialization for SinePositionalEncoding.
- Parameters:
num_feats (int) – The feature dimension for each position along the x-axis or y-axis. Note that the final returned dimension for each position is twice this value.
temperature (int, optional) – The temperature used for scaling the position embedding. Defaults to 10000.
normalize (bool, optional) – Whether to normalize the position embedding. Defaults to False.
scale (float, optional) – A scale factor that scales the position embedding. The scale will be used only when normalize is True. Defaults to 2*pi.
eps (float, optional) – A value added to the denominator for numerical stability. Defaults to 1e-6.
offset (float, optional) – Offset added to the embedding when normalizing. Defaults to 0.
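To show how the constructor parameters interact, here is a minimal NumPy sketch of the computation, assuming vis4d follows the mmdetection original. It is not the vis4d implementation (which works on torch Tensors); the function name `sine_positional_encoding` is hypothetical, and the sketch additionally assumes an even num_feats so that sine and cosine channels pair up.

```python
import numpy as np


def sine_positional_encoding(
    mask: np.ndarray,
    num_feats: int,
    temperature: float = 10000.0,
    normalize: bool = False,
    scale: float = 2 * np.pi,
    eps: float = 1e-6,
    offset: float = 0.0,
) -> np.ndarray:
    """Hypothetical NumPy sketch of a DETR-style sine encoding; mask is [bs, h, w]."""
    # Assumption of this sketch: num_feats is even.
    assert num_feats % 2 == 0, "num_feats must be even in this sketch"
    not_mask = 1.0 - (mask != 0).astype(np.float64)  # 1 = valid pixel
    # 1-based running count of valid pixels along each axis.
    y_embed = np.cumsum(not_mask, axis=1)
    x_embed = np.cumsum(not_mask, axis=2)
    if normalize:
        # Divide by the count in the last row/column, then multiply by scale.
        y_embed = (y_embed + offset) / (y_embed[:, -1:, :] + eps) * scale
        x_embed = (x_embed + offset) / (x_embed[:, :, -1:] + eps) * scale
    # Channels 2i and 2i+1 share the divisor temperature ** (2i / num_feats).
    dim_t = np.arange(num_feats)
    dim_t = temperature ** (2 * (dim_t // 2) / num_feats)
    pos_x = x_embed[..., None] / dim_t  # [bs, h, w, num_feats]
    pos_y = y_embed[..., None] / dim_t
    bs, h, w = mask.shape
    # Interleave: sine on even channels, cosine on odd channels.
    pos_x = np.stack((np.sin(pos_x[..., 0::2]), np.cos(pos_x[..., 1::2])), axis=4)
    pos_y = np.stack((np.sin(pos_y[..., 0::2]), np.cos(pos_y[..., 1::2])), axis=4)
    pos_x = pos_x.reshape(bs, h, w, num_feats)
    pos_y = pos_y.reshape(bs, h, w, num_feats)
    # y channels first, then x channels; move channels to axis 1.
    return np.concatenate((pos_y, pos_x), axis=3).transpose(0, 3, 1, 2)


# Usage: the second image has its last two columns padded (non-zero = ignored).
mask = np.zeros((2, 4, 5), dtype=np.uint8)
mask[1, :, 3:] = 1
pos = sine_positional_encoding(mask, num_feats=8, normalize=True)
print(pos.shape)  # (2, 16, 4, 5)
```

Because the counts are cumulative, padded columns keep the running count of the last valid column, so in this sketch their x channels repeat that column's encoding rather than being zeroed.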
- forward(mask, inputs=None)
Forward function for SinePositionalEncoding.
- Parameters:
mask (Tensor | None) – ByteTensor mask. Non-zero values represent ignored positions, while zero values mean valid positions for this image. Shape [bs, h, w]. If None, the input is treated as a single image or a batch of images without padding.
inputs (Tensor | None) – The input tensor. If mask is None, this tensor is required to get the shape of the input image.
- Returns:
pos – Position embedding with shape [bs, num_feats*2, h, w].
- Return type:
Tensor
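The mask convention and the mask=None fallback can be illustrated with two small helpers. Both are hypothetical NumPy sketches, not part of vis4d: `build_padding_mask` assumes each image sits in the top-left corner of a zero-padded batch, and `resolve_mask` assumes inputs has shape [bs, c, h, w] and mirrors the documented rule that an absent mask means "no padding".

```python
from typing import Optional, Sequence, Tuple

import numpy as np


def build_padding_mask(
    image_sizes: Sequence[Tuple[int, int]], pad_h: int, pad_w: int
) -> np.ndarray:
    """Hypothetical helper: 1 marks padded (ignored) pixels, 0 marks valid ones."""
    mask = np.ones((len(image_sizes), pad_h, pad_w), dtype=np.uint8)
    for i, (h, w) in enumerate(image_sizes):
        # Assumption: the valid region is the top-left h x w block.
        mask[i, :h, :w] = 0
    return mask


def resolve_mask(
    mask: Optional[np.ndarray], inputs: Optional[np.ndarray]
) -> np.ndarray:
    """Hypothetical mirror of the mask=None rule; inputs assumed [bs, c, h, w]."""
    if mask is not None:
        return mask
    if inputs is None:
        raise ValueError("inputs is required when mask is None")
    bs, h, w = inputs.shape[0], inputs.shape[-2], inputs.shape[-1]
    # All zeros: every position is valid.
    return np.zeros((bs, h, w), dtype=np.uint8)


# Usage: a 3x4 image and a 2x2 image batched into a 3x4 canvas.
batch_mask = build_padding_mask([(3, 4), (2, 2)], pad_h=3, pad_w=4)
no_pad_mask = resolve_mask(None, np.zeros((2, 8, 5, 6)))
print(batch_mask.shape, no_pad_mask.shape)  # (2, 3, 4) (2, 5, 6)
```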
- class LearnedPositionalEncoding(num_feats, row_num_embed=50, col_num_embed=50)
Position embedding with learnable embedding weights.
- __init__(num_feats, row_num_embed=50, col_num_embed=50)
Initialization for LearnedPositionalEncoding.
- Parameters:
num_feats (int) – The feature dimension for each position along the x-axis or y-axis. The final returned dimension for each position is twice this value.
row_num_embed (int, optional) – The dictionary size of row embeddings. Defaults to 50.
col_num_embed (int, optional) – The dictionary size of col embeddings. Defaults to 50.
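A minimal NumPy sketch of the learned variant, assuming vis4d keeps the mmdetection layout it was adapted from: one lookup table per axis, the first w column entries and first h row entries are broadcast over the grid, and column (x) features come before row (y) features. The class name, the `seed` argument and the uniform initialization are hypothetical stand-ins for the learnable weights; in this sketch the tables bound the feature-map size, so h must not exceed row_num_embed and w must not exceed col_num_embed.

```python
import numpy as np


class LearnedPositionalEncodingSketch:
    """Hypothetical sketch: per-row and per-column tables concatenated per position."""

    def __init__(
        self,
        num_feats: int,
        row_num_embed: int = 50,
        col_num_embed: int = 50,
        seed: int = 0,
    ) -> None:
        rng = np.random.default_rng(seed)
        # Stand-ins for learnable weights; uniform init is an assumption.
        self.row_embed = rng.uniform(size=(row_num_embed, num_feats))
        self.col_embed = rng.uniform(size=(col_num_embed, num_feats))

    def __call__(self, mask: np.ndarray) -> np.ndarray:
        # Only the mask's shape is used in this sketch.
        bs, h, w = mask.shape
        if h > self.row_embed.shape[0] or w > self.col_embed.shape[0]:
            raise ValueError("feature map larger than the embedding tables")
        x_embed = self.col_embed[:w]  # [w, num_feats], one entry per column
        y_embed = self.row_embed[:h]  # [h, num_feats], one entry per row
        num_feats = x_embed.shape[1]
        pos = np.concatenate(
            (
                np.broadcast_to(x_embed[None, :, :], (h, w, num_feats)),
                np.broadcast_to(y_embed[:, None, :], (h, w, num_feats)),
            ),
            axis=-1,
        )  # [h, w, 2 * num_feats]
        pos = pos.transpose(2, 0, 1)[None]  # [1, 2 * num_feats, h, w]
        return np.repeat(pos, bs, axis=0)  # same encoding for every image


# Usage: a batch of 3 feature maps of size 5x7.
enc = LearnedPositionalEncodingSketch(num_feats=4, row_num_embed=10, col_num_embed=12)
pos = enc(np.zeros((3, 5, 7), dtype=np.uint8))
print(pos.shape)  # (3, 8, 5, 7)
```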
- forward(mask)
Forward function for LearnedPositionalEncoding.
- Parameters:
mask (Tensor) – ByteTensor mask. Non-zero values represent ignored positions, while zero values mean valid positions for this image. Shape [bs, h, w].
- Returns:
pos – Position embedding with shape [bs, num_feats*2, h, w].
- Return type:
Tensor