Engines

AMP

AMPEngine

class catalyst.engines.amp.AMPEngine(device: str = 'cuda')[source]

Bases: catalyst.engines.torch.DeviceEngine

PyTorch AMP single-device training engine.

Parameters

device – device to use, default is “cuda”.

DataParallelAMPEngine

class catalyst.engines.amp.DataParallelAMPEngine[source]

Bases: catalyst.engines.amp.AMPEngine

AMP multi-GPU training engine.

DistributedDataParallelAMPEngine

class catalyst.engines.amp.DistributedDataParallelAMPEngine(address: str = 'localhost', port: str = '12345', backend: str = 'nccl', world_size: int = None)[source]

Bases: catalyst.engines.torch.DistributedDataParallelEngine

Distributed AMP multi-GPU training engine.

Parameters
  • address – process address to use (required for PyTorch backend), default is “localhost”.

  • port – process port to listen on (required for PyTorch backend), default is “12345”.

  • backend – multiprocessing backend to use, default is “nccl”.

  • world_size – number of processes.
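The address, port and world_size parameters describe a rendezvous endpoint for the worker processes. A minimal sketch of how such values are commonly handed to PyTorch's distributed package through the MASTER_ADDR, MASTER_PORT and WORLD_SIZE environment variables follows. It assumes the engine relies on the standard environment-variable initialization, which this reference does not confirm, and the helper name rendezvous_env is hypothetical.

```python
from typing import Dict, Optional


def rendezvous_env(
    address: str = "localhost",
    port: str = "12345",
    world_size: Optional[int] = None,
) -> Dict[str, str]:
    """Build the variables read by torch.distributed's env:// initialization.

    Hypothetical helper for illustration; it is not part of Catalyst.
    """
    env = {"MASTER_ADDR": str(address), "MASTER_PORT": str(port)}
    if world_size is not None:
        if world_size < 1:
            raise ValueError("world_size must be a positive integer")
        env["WORLD_SIZE"] = str(world_size)
    return env


# Two worker processes on the local machine, using the documented defaults.
env = rendezvous_env(world_size=2)
print(env)
```

Leaving world_size as None omits WORLD_SIZE from the result, so the caller decides how many processes to start.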

Apex

APEXEngine

class catalyst.engines.apex.APEXEngine(device: str = 'cuda', opt_level: str = 'O1', keep_batchnorm_fp32: bool = None, loss_scale: Union[float, str] = None)[source]

Bases: catalyst.engines.torch.DeviceEngine

Apex single-device training engine.

Parameters
  • device – device to use, default is “cuda”.

  • opt_level

    optimization level, should be one of “O0”, “O1”, “O2” or “O3”.

    • “O0” - FP32 training (Amp is a no-op)

    • “O1” - mixed precision training (default)

    • “O2” - “almost FP16” mixed precision training

    • “O3” - FP16 training

    Details about levels can be found here: https://nvidia.github.io/apex/amp.html#opt-levels

  • keep_batchnorm_fp32 – To enhance precision and enable cudnn batchnorm (which improves performance), it’s often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16.

  • loss_scale – If loss_scale is a float value, use this value as the static (fixed) loss scale. If loss_scale is the string “dynamic”, adaptively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
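The constraints on opt_level and loss_scale can be checked before an engine is constructed. The sketch below is one way to do that, assuming the four levels documented by Apex (“O0” to “O3”) and treating a non-positive static loss scale as an error, which is a sanity check added here and not something this reference states. The function name check_apex_args is hypothetical.

```python
from typing import Optional, Union

# Optimization levels documented by Apex.
VALID_OPT_LEVELS = ("O0", "O1", "O2", "O3")


def check_apex_args(
    opt_level: str = "O1",
    keep_batchnorm_fp32: Optional[bool] = None,
    loss_scale: Union[float, str, None] = None,
) -> dict:
    """Validate Apex-style arguments and return them as keyword arguments.

    Hypothetical helper for illustration; it is not part of Catalyst or Apex.
    """
    if opt_level not in VALID_OPT_LEVELS:
        raise ValueError(f"opt_level must be one of {VALID_OPT_LEVELS}, got {opt_level!r}")
    if isinstance(loss_scale, str):
        # The only documented string value is "dynamic".
        if loss_scale != "dynamic":
            raise ValueError('a string loss_scale must be "dynamic"')
    elif loss_scale is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(loss_scale, bool) or not isinstance(loss_scale, (int, float)):
            raise TypeError('loss_scale must be a number, "dynamic", or None')
        if loss_scale <= 0:
            raise ValueError("a static loss_scale must be positive")
    return {
        "opt_level": opt_level,
        "keep_batchnorm_fp32": keep_batchnorm_fp32,
        "loss_scale": loss_scale,
    }


kwargs = check_apex_args(opt_level="O2", keep_batchnorm_fp32=True, loss_scale="dynamic")
print(kwargs)
```

The returned dictionary uses the same names as the constructor signature above, so it could be unpacked into the engine call once the values pass.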

DataParallelApexEngine

class catalyst.engines.apex.DataParallelApexEngine(opt_level: str = 'O1')[source]

Bases: catalyst.engines.apex.APEXEngine

Apex multi-GPU training engine.

DistributedDataParallelApexEngine

class catalyst.engines.apex.DistributedDataParallelApexEngine(address: str = 'localhost', port: str = '12345', backend: str = 'nccl', world_size: int = None, opt_level: str = 'O1', keep_batchnorm_fp32: bool = None, loss_scale: Union[float, str] = None, delay_all_reduce: bool = True)[source]

Bases: catalyst.engines.torch.DistributedDataParallelEngine

Distributed Apex multi-GPU training engine.

Parameters
  • address – process address to use (required for PyTorch backend), default is “localhost”.

  • port – process port to listen on (required for PyTorch backend), default is “12345”.

  • backend – multiprocessing backend to use, default is “nccl”.

  • world_size – number of processes.

  • opt_level

    optimization level, should be one of “O0”, “O1”, “O2” or “O3”.

    • “O0” - FP32 training (Amp is a no-op)

    • “O1” - mixed precision training (default)

    • “O2” - “almost FP16” mixed precision training

    • “O3” - FP16 training

    Details about levels can be found here: https://nvidia.github.io/apex/amp.html#opt-levels

  • keep_batchnorm_fp32 – To enhance precision and enable cudnn batchnorm (which improves performance), it’s often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16.

  • loss_scale – If loss_scale is a float value, use this value as the static (fixed) loss scale. If loss_scale is the string “dynamic”, adaptively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.

  • delay_all_reduce – boolean flag that enables delayed all-reduce, default is True.
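The “dynamic” option for loss_scale refers to a loss scale that changes during training. The toy model below illustrates the general idea: shrink the scale when a step overflows, and grow it again after a run of clean steps. It is a sketch of the common technique, not Apex's actual implementation; the class name DynamicLossScaler and all constants are assumptions chosen for illustration.

```python
class DynamicLossScaler:
    """Toy model of dynamic loss scaling (illustrative, not Apex's code)."""

    def __init__(self, init_scale: float = 2.0 ** 16, factor: float = 2.0,
                 growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.factor = factor
        self.growth_interval = growth_interval
        self.clean_steps = 0  # consecutive steps without overflow

    def update(self, found_overflow: bool) -> float:
        """Adjust the scale after one training step and return the new value."""
        if found_overflow:
            # Gradients contained inf/NaN: back off and restart the count.
            self.scale /= self.factor
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps >= self.growth_interval:
                # Enough stable steps in a row: try a larger scale.
                self.scale *= self.factor
                self.clean_steps = 0
        return self.scale


scaler = DynamicLossScaler(init_scale=1024.0, growth_interval=2)
history = [scaler.update(overflow) for overflow in (False, False, True, False)]
print(history)  # [1024.0, 2048.0, 1024.0, 1024.0]
```

A static loss scale corresponds to never calling update, which is simpler but requires picking a value that neither overflows nor underflows for the whole run.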

Torch

DeviceEngine

class catalyst.engines.torch.DeviceEngine(device: str = None)[source]

Bases: catalyst.core.engine.IEngine

Single-device training engine.

Parameters

device (str, optional) – device to use, default is “cpu”.

DataParallelEngine

class catalyst.engines.torch.DataParallelEngine[source]

Bases: catalyst.engines.torch.DeviceEngine

Multi-GPU training engine.

DistributedDataParallelEngine

class catalyst.engines.torch.DistributedDataParallelEngine(address: str = 'localhost', port: str = '12345', backend: str = 'nccl', world_size: int = None)[source]

Bases: catalyst.engines.torch.DeviceEngine

Distributed multi-GPU training engine.

Parameters
  • address – process address to use (required for PyTorch backend), default is “localhost”.

  • port – process port to listen on (required for PyTorch backend), default is “12345”.

  • backend – multiprocessing backend to use, default is “nccl”.

  • world_size – number of processes.
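The nine engines listed on this page form a grid: three precision backends (Torch, AMP, Apex) by three parallelism strategies (single device, data parallel, distributed data parallel). The sketch below shows one possible way to select an engine from that grid by name. The class paths come from this page; the keys “single”, “dp” and “ddp” and the helpers engine_path and build_engine are hypothetical. Catalyst is imported only when build_engine is called, so the lookup itself works without it installed.

```python
import importlib

# (precision backend, parallelism strategy) -> dotted class path from this page.
ENGINES = {
    ("torch", "single"): "catalyst.engines.torch.DeviceEngine",
    ("torch", "dp"): "catalyst.engines.torch.DataParallelEngine",
    ("torch", "ddp"): "catalyst.engines.torch.DistributedDataParallelEngine",
    ("amp", "single"): "catalyst.engines.amp.AMPEngine",
    ("amp", "dp"): "catalyst.engines.amp.DataParallelAMPEngine",
    ("amp", "ddp"): "catalyst.engines.amp.DistributedDataParallelAMPEngine",
    ("apex", "single"): "catalyst.engines.apex.APEXEngine",
    ("apex", "dp"): "catalyst.engines.apex.DataParallelApexEngine",
    ("apex", "ddp"): "catalyst.engines.apex.DistributedDataParallelApexEngine",
}


def engine_path(precision: str = "torch", strategy: str = "single") -> str:
    """Return the dotted path of the engine class for the given combination."""
    try:
        return ENGINES[(precision, strategy)]
    except KeyError:
        raise ValueError(f"unknown engine combination: {(precision, strategy)}") from None


def build_engine(precision: str = "torch", strategy: str = "single", **kwargs):
    """Import the engine class lazily and instantiate it with kwargs."""
    module_name, class_name = engine_path(precision, strategy).rsplit(".", 1)
    module = importlib.import_module(module_name)  # requires Catalyst at call time
    return getattr(module, class_name)(**kwargs)


print(engine_path("amp", "ddp"))
```

With Catalyst installed, a call such as build_engine("apex", "single", opt_level="O2") would pass the keyword arguments through to the constructor documented above.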