Engines

CPUEngine
GPUEngine
DataParallelEngine
DistributedDataParallelEngine
DistributedXLAEngine

For an overview of the engines, see the examples/engines section.

CPUEngine

class catalyst.engines.torch.CPUEngine(*args, **kwargs)

Bases: catalyst.core.engine.Engine

CPU-based engine.

GPUEngine

class catalyst.engines.torch.GPUEngine(*args, **kwargs)

Bases: catalyst.core.engine.Engine

Single-GPU-based engine.

DataParallelEngine

class catalyst.engines.torch.DataParallelEngine(*args, **kwargs)

Bases: catalyst.core.engine.Engine

Multi-GPU-based engine.

DistributedDataParallelEngine

class catalyst.engines.torch.DistributedDataParallelEngine(*args, address: str = '127.0.0.1', port: Union[str, int] = 2112, world_size: Optional[int] = None, workers_dist_rank: int = 0, num_node_workers: Optional[int] = None, process_group_kwargs: Dict[str, Any] = None, **kwargs)

Bases: catalyst.core.engine.Engine

Distributed multi-GPU-based engine.

Parameters

*args – args for Accelerator.__init__
address – address of the master node (rank 0): either the IP address or the hostname of node 0. For single-node multi-process training, it can simply be 127.0.0.1
port – a free port on the master node (rank 0) to be used for communication during distributed training
world_size – the number of processes to use for distributed training. Should be less than or equal to the number of GPUs
workers_dist_rank – the rank of the first process to run on the node. It should be a number between the number of already initialized processes and world_size - 1; the other processes on the node will have ranks (# of initialized processes) + 1, (# of initialized processes) + 2, …, (# of initialized processes) + num_node_workers - 1
num_node_workers – the number of processes to launch on the node. For GPU training, this is recommended to be set to the number of GPUs on the current node so that each process can be bound to a single GPU
process_group_kwargs – parameters for torch.distributed.init_process_group. More info here: https://pytorch.org/docs/stable/distributed.html#torch.distributed.init_process_group
**kwargs – kwargs for Accelerator.__init__
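
The interplay of world_size, workers_dist_rank and num_node_workers is easiest to see on a small multi-node layout. The sketch below is only an illustration of the parameter descriptions above, not Catalyst code: the helpers node_ranks and plan_nodes are hypothetical, and it assumes one process per GPU with nodes started in order, so that the first rank on each node equals the number of processes assigned to earlier nodes.

```python
from typing import Dict, List


def node_ranks(workers_dist_rank: int, num_node_workers: int) -> List[int]:
    """Ranks of the processes on one node (hypothetical helper).

    The first process gets workers_dist_rank; the remaining ones follow
    consecutively, as described for workers_dist_rank above.
    """
    return [workers_dist_rank + i for i in range(num_node_workers)]


def plan_nodes(gpus_per_node: List[int]) -> List[Dict[str, int]]:
    """Per-node keyword arguments for a cluster (hypothetical helper).

    Assumes one process per GPU, so world_size is the total GPU count.
    """
    world_size = sum(gpus_per_node)
    plans = []
    first_rank = 0  # number of processes assigned to earlier nodes
    for num_gpus in gpus_per_node:
        plans.append(
            {
                "world_size": world_size,
                "workers_dist_rank": first_rank,
                "num_node_workers": num_gpus,
            }
        )
        first_rank += num_gpus
    return plans


# Two nodes with four GPUs each.
plans = plan_nodes([4, 4])
for node_index, plan in enumerate(plans):
    ranks = node_ranks(plan["workers_dist_rank"], plan["num_node_workers"])
    print(node_index, plan, ranks)
```

Each dictionary could then be passed on the corresponding node as keyword arguments to DistributedDataParallelEngine, together with the address and port of node 0, which must be the same on every node.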

DistributedXLAEngine

class catalyst.engines.torch.DistributedXLAEngine(*args, **kwargs)

Bases: catalyst.core.engine.Engine

Distributed XLA-based engine.
