vis4d.common.distributed
This module contains utilities for multiprocess parallelism.
Functions
- all_gather_object_cpu: Share arbitrary picklable data via file system caching.
- all_gather_object_gpu: Run pl_module.all_gather on arbitrary picklable data.
- all_reduce_dict: Apply an all-reduce operation to a Python dict.
- Broadcast an object from a source to all processes.
- create_tmpdir: Create and distribute a temporary directory across all processes.
- distributed_available: Check if torch.distributed is available.
- get_local_rank: Get the local rank of the current process in torch.distributed.
- get_rank: Get the global rank of the current process in torch.distributed.
- get_world_size: Get the world size (number of processes) of torch.distributed.
- Check recursively if a module is wrapped.
- obj2tensor: Serialize picklable python object to tensor.
- pad_to_largest_tensor: Pad a tensor to the largest size among the tensors in all processes.
- rank_zero_only: Allows the decorated function to be called only on global rank 0.
- Obtain the mean of tensor on different GPUs.
- serialize_to_tensor: Serialize arbitrary picklable data to a Tensor.
- synchronize: Sync (barrier) among all processes when using distributed training.
- tensor2obj: Deserialize tensor to picklable python object.
- get_world_size()
Get the world size (number of processes) of torch.distributed.
- Returns:
The world size.
- Return type:
int
- get_rank()
Get the global rank of the current process in torch.distributed.
- Returns:
The global rank.
- Return type:
int
- get_local_rank()
Get the local rank of the current process in torch.distributed.
- Returns:
The local rank.
- Return type:
int
- distributed_available()
Check if torch.distributed is available.
- Returns:
Whether torch.distributed is available.
- Return type:
bool
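When no process group has been initialized, rank helpers like the ones above typically fall back to a single-process setup. The stdlib-only sketch below illustrates one such fallback under a stated assumption: it reads the RANK, LOCAL_RANK and WORLD_SIZE environment variables that launchers such as torchrun export, defaulting to rank 0 in a world of size 1. It is not vis4d's implementation, which may query torch.distributed directly, and the env parameter is a hypothetical addition that makes the functions easy to test.

```python
import os
from typing import Mapping, Optional


def _read_int(name: str, default: int, env: Optional[Mapping[str, str]]) -> int:
    # Assumption: values come from launcher-provided environment variables.
    # Fall back to the real process environment when no mapping is given.
    source = os.environ if env is None else env
    return int(source.get(name, default))


def get_world_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Number of processes; 1 when not launched in distributed mode."""
    return _read_int("WORLD_SIZE", 1, env)


def get_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Global rank across all nodes; 0 in a single-process run."""
    return _read_int("RANK", 0, env)


def get_local_rank(env: Optional[Mapping[str, str]] = None) -> int:
    """Rank within the current node, often used to pick the GPU index."""
    return _read_int("LOCAL_RANK", 0, env)


launcher_env = {"WORLD_SIZE": "8", "RANK": "5", "LOCAL_RANK": "1"}
print(get_world_size(launcher_env), get_rank(launcher_env), get_local_rank(launcher_env))  # 8 5 1
```

The global rank identifies a process among all nodes, while the local rank only distinguishes processes on the same machine, which is why the two can differ (5 and 1 above).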
- synchronize()
Sync (barrier) among all processes when using distributed training.
- Return type:
None
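A barrier guarantees that no process continues past it until every process has reached it. The sketch below demonstrates that guarantee with threads and threading.Barrier standing in for processes and the torch.distributed barrier; run_workers and the shared dict are hypothetical illustrations, not part of vis4d.

```python
import threading


def run_workers(world_size: int) -> list[str]:
    """Rank 0 publishes a value; every rank reads it only after the barrier."""
    barrier = threading.Barrier(world_size)  # stand-in for synchronize()
    shared: dict[str, str] = {}
    seen = [""] * world_size

    def worker(rank: int) -> None:
        if rank == 0:
            shared["config"] = "written by rank 0"
        barrier.wait()  # nobody passes until all ranks have arrived
        seen[rank] = shared["config"]  # safe: rank 0 wrote before waiting

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


print(run_workers(4))
```

Without the barrier, a non-zero rank could try to read the value before rank 0 has written it.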
- serialize_to_tensor(data)
Serialize arbitrary picklable data to a Tensor.
- Parameters:
data (Any) – The data to serialize.
- Returns:
The serialized data as a Tensor.
- Return type:
Tensor
- Raises:
AssertionError – If the backend of torch.distributed is not gloo or nccl.
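The core of serializing picklable data to a tensor is pickling the object and reinterpreting the resulting bytes as a 1-D uint8 tensor. The sketch below keeps only the stdlib part, under these assumptions: a plain list of byte values stands in for the Tensor, the backend is passed in as a string instead of being queried from torch.distributed, and both function names are hypothetical.

```python
import pickle
from typing import Any


def serialize_to_bytes(data: Any, backend: str = "gloo") -> list[int]:
    """Pickle data into the byte values a uint8 Tensor would hold."""
    # Mirrors the documented precondition: only gloo and nccl are accepted.
    assert backend in ("gloo", "nccl"), f"unsupported backend: {backend}"
    return list(pickle.dumps(data))


def deserialize_from_bytes(values: list[int]) -> Any:
    """Inverse step: rebuild the raw bytes and unpickle them."""
    return pickle.loads(bytes(values))


sample = {"boxes": [1, 2, 3], "name": "car"}
assert deserialize_from_bytes(serialize_to_bytes(sample)) == sample
```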
- rank_zero_only(func)
Allows the decorated function to be called only on global rank 0.
- Parameters:
func (GenericFunc) – The function to decorate.
- Returns:
The decorated function.
- Return type:
GenericFunc
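A decorator like this is typically used for side effects that should happen once per job rather than once per process, such as logging or saving checkpoints. The sketch below is a minimal version under two assumptions that are not taken from vis4d: the global rank is read from the RANK environment variable, and non-zero ranks receive None. save_checkpoint_stub is a hypothetical example function.

```python
import functools
import os
from typing import Any, Callable, Optional


def rank_zero_only(func: Callable[..., Any]) -> Callable[..., Optional[Any]]:
    """Run func only on global rank 0; other ranks get None (an assumption)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapped(*args: Any, **kwargs: Any) -> Optional[Any]:
        # Assumption: the global rank is taken from the RANK environment variable.
        if int(os.environ.get("RANK", "0")) == 0:
            return func(*args, **kwargs)
        return None

    return wrapped


@rank_zero_only
def save_checkpoint_stub(step: int) -> str:
    # Hypothetical side effect that should happen once, not once per process.
    return f"saved step {step}"
```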
- pad_to_largest_tensor(tensor)
Pad a tensor to the largest size among the tensors in all processes.
- Parameters:
tensor (Tensor) – Tensor to be padded.
- Returns:
The size of the tensor on each rank, and the padded tensor, which has the maximum size across ranks.
- Return type:
tuple[list[int], Tensor]
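Collectives such as all_gather expect every rank to contribute a tensor of the same shape, which is why the per-rank sizes are kept next to the padded data: they allow the padding to be stripped after gathering. The sketch below simulates that bookkeeping for several ranks in a single process, with bytes objects standing in for 1-D uint8 tensors; pad_to_largest is a hypothetical name.

```python
def pad_to_largest(per_rank: list[bytes]) -> tuple[list[int], list[bytes]]:
    """Pad every rank's buffer to the longest one and record the original sizes."""
    # In a real run each rank only holds its own buffer and the sizes are
    # exchanged with an all-gather; here all ranks are simulated in one list.
    sizes = [len(buf) for buf in per_rank]
    max_size = max(sizes)
    padded = [buf + b"\x00" * (max_size - len(buf)) for buf in per_rank]
    return sizes, padded


sizes, padded = pad_to_largest([b"abc", b"d", b"efghi"])
# The recorded sizes let the receiver strip the padding again.
restored = [buf[:size] for buf, size in zip(padded, sizes)]
```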
- all_gather_object_gpu(data, rank_zero_return_only=True)
Run pl_module.all_gather on arbitrary picklable data.
- Parameters:
data (Any) – Any picklable object.
rank_zero_return_only (bool) – Whether results should only be returned on rank 0.
- Returns:
List of data gathered from each process.
- Return type:
list[Any]
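Gathering arbitrary objects usually chains the steps documented in this module: serialize each object, pad the buffers to a common length, gather them, then trim and unpickle. The single-process simulation below walks through that pipeline and the effect of rank_zero_return_only. It does not call pl_module.all_gather; returning None on non-zero ranks is an assumption, and the function name is hypothetical.

```python
import pickle
from typing import Any, Optional


def simulated_all_gather_object(
    per_rank_data: list[Any], rank: int, rank_zero_return_only: bool = True
) -> Optional[list[Any]]:
    """Simulate gathering one picklable object per rank, as seen from `rank`."""
    payloads = [pickle.dumps(obj) for obj in per_rank_data]  # serialize on each rank
    sizes = [len(p) for p in payloads]  # exchanged first so padding can be undone
    max_size = max(sizes)
    # Equal-length buffers; this list stands in for the collective gather.
    gathered = [p.ljust(max_size, b"\x00") for p in payloads]
    if rank_zero_return_only and rank != 0:
        return None  # assumption: non-zero ranks receive nothing
    return [pickle.loads(buf[:size]) for buf, size in zip(gathered, sizes)]


print(simulated_all_gather_object([{"a": 1}, [1, 2, 3], "x"], rank=0))
```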
- create_tmpdir(rank, tmpdir=None, use_system_tmp=True)
Create and distribute a temporary directory across all processes.
- Return type:
str
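The parameters of create_tmpdir are not documented here. A common pattern, assumed for this sketch rather than taken from vis4d, is that rank 0 creates the directory and shares its path so that every process writes to the same location. A dict stands in for the broadcast, use_system_tmp is omitted, and create_shared_tmpdir is a hypothetical name.

```python
import os
import tempfile
from typing import Optional


def create_shared_tmpdir(
    rank: int, broadcast_slot: dict[str, str], tmpdir: Optional[str] = None
) -> str:
    """Rank 0 creates the directory; every rank returns the same path."""
    if rank == 0:
        path = tmpdir if tmpdir is not None else tempfile.mkdtemp()
        os.makedirs(path, exist_ok=True)
        broadcast_slot["tmpdir"] = path  # stands in for broadcasting the path
    # In a real run, other ranks would block here until rank 0 has published it.
    return broadcast_slot["tmpdir"]
```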
- all_gather_object_cpu(data, tmpdir=None, rank_zero_return_only=True, use_system_tmp=False)
Share arbitrary picklable data via file system caching.
- Parameters:
data (Any) – Any picklable object.
tmpdir (Optional[str]) – Save path for temporary files. If None, a temporary directory is created safely.
rank_zero_return_only (bool) – Whether results should only be returned on rank 0.
use_system_tmp (bool) – Whether to use the system temporary directory.
- Returns:
List of data gathered from each process.
- Return type:
list[Any]
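File system caching avoids GPU collectives entirely: each rank pickles its data into a shared directory, all ranks synchronize, and rank 0 loads every part in rank order. The sketch below shows those two halves with the stdlib. The part_{rank}.pkl naming and both function names are assumptions, and the barrier between the two calls is only indicated in a comment.

```python
import os
import pickle
from typing import Any


def dump_part(data: Any, rank: int, tmpdir: str) -> str:
    """Write this rank's data to its own file in the shared directory."""
    path = os.path.join(tmpdir, f"part_{rank}.pkl")
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return path


def collect_parts(world_size: int, tmpdir: str) -> list[Any]:
    """Load every rank's part in rank order.

    Meant to run after a barrier (usually only on rank 0), so that all
    parts have been written before any of them is read.
    """
    results = []
    for rank in range(world_size):
        with open(os.path.join(tmpdir, f"part_{rank}.pkl"), "rb") as f:
            results.append(pickle.load(f))
    return results
```

Because this path only needs a directory that every process can see, it also works with CPU-only backends, at the cost of disk I/O.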
- obj2tensor(pyobj, device=device(type='cuda'))
Serialize picklable python object to tensor.
- Parameters:
pyobj (Any) – Any picklable python object.
device (torch.device) – Device to put on. Defaults to “cuda”.
- Return type:
Tensor
- tensor2obj(tensor)
Deserialize tensor to picklable python object.
- Parameters:
tensor (Tensor) – Tensor to be deserialized.
- Return type:
Any
- all_reduce_dict(py_dict, reduce_op='sum', to_float=True)
Apply an all-reduce operation to a Python dict.
The code is modified from https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/utils/allreduce_norm.py.
NOTE: Make sure that py_dict has the same keys on every rank and that corresponding values have the same shape. Currently, only the NCCL backend is supported.
- Parameters:
py_dict (DictStrAny) – Dict to which the all-reduce operation is applied.
reduce_op (str) – Reduce operator, either ‘sum’ or ‘mean’. Default: ‘sum’.
to_float (bool) – Whether to convert all values of the dict to float. Default: True.
- Returns:
The reduced Python dict.
- Return type:
DictStrAny
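For scalar values, the all-reduce amounts to a per-key sum across ranks, divided by the number of ranks when reduce_op is ‘mean’. The single-process simulation below makes the documented key precondition explicit and treats each list entry as one rank's dict. It handles plain numbers only, whereas the documented function operates on tensors over NCCL, and all_reduce_dict_sim is a hypothetical name.

```python
from typing import Union

Number = Union[int, float]


def all_reduce_dict_sim(
    per_rank: list[dict[str, Number]], reduce_op: str = "sum", to_float: bool = True
) -> dict[str, Number]:
    """Reduce same-keyed dicts from several simulated ranks into one dict."""
    assert reduce_op in ("sum", "mean"), f"unsupported reduce_op: {reduce_op}"
    keys = list(per_rank[0])
    # Documented precondition: every rank must provide the same keys.
    assert all(set(d) == set(keys) for d in per_rank), "ranks disagree on keys"
    reduced: dict[str, Number] = {}
    for key in keys:
        # Optionally cast to float before reducing, as to_float suggests.
        values = [float(d[key]) if to_float else d[key] for d in per_rank]
        total = sum(values)
        reduced[key] = total / len(per_rank) if reduce_op == "mean" else total
    return reduced


ranks = [{"loss": 1.0, "count": 2}, {"loss": 3.0, "count": 4}]
print(all_reduce_dict_sim(ranks, reduce_op="mean"))  # {'loss': 2.0, 'count': 3.0}
```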