
Core

Runner

class catalyst.core.runner.IRunner(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, engine: catalyst.core.engine.IEngine = None)

Bases: catalyst.core.callback.ICallback, catalyst.core.logger.ILogger, abc.ABC

An abstraction that contains all the logic of how to run the experiment, stages, epochs, loaders and batches.

Parameters
  • model – Torch model object

  • engine – IEngine instance

Note

To learn more about Catalyst Core concepts, please check out

get_callbacks(stage: str) → OrderedDict[str, ICallback]

Returns callbacks for a given stage.

Parameters

stage – stage name of interest like “pretrain” / “train” / “finetune” / etc

Returns

Ordered dictionary with callbacks for the current stage.

Return type

OrderedDict[str, Callback]

get_criterion(stage: str) → Optional[torch.nn.modules.module.Module]

Returns the criterion for a given stage and epoch.

Example:

# for typical classification task
>>> runner.get_criterion(stage="train")
nn.CrossEntropyLoss()

Parameters

stage – stage name of interest like “pretrain” / “train” / “finetune” / etc

Returns

Criterion: criterion for a given stage.

get_datasets(stage: str) → OrderedDict[str, Dataset]

Returns the datasets for a given stage and epoch.

Note

In typical deep learning cases, the dataset stays the same during the whole stage.

In reinforcement learning, it’s common to change the dataset (experiment) every training epoch.

Parameters

stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc

Returns

OrderedDict[str, Dataset]: ordered dictionary with datasets for the current stage and epoch.

Note

An ordered dictionary guarantees the correct dataflow and order of the training datasets, for example, that the train loader runs before the validation one.

Example:

>>> runner.get_datasets(stage="training")
OrderedDict({
    "train": CsvDataset(in_csv=in_csv_train, ...),
    "valid": CsvDataset(in_csv=in_csv_valid, ...),
})

abstract get_engine() → catalyst.core.engine.IEngine

Returns the engine for the run.

abstract get_loaders(stage: str) → OrderedDict[str, DataLoader]

Returns the loaders for a given stage.

Note

Wrapper for catalyst.core.experiment.IExperiment.get_datasets. For most experiments, you only need to override the get_datasets method.

Parameters

stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc

Returns

OrderedDict[str, DataLoader]: ordered dictionary with loaders for the current stage and epoch.
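
As a rough illustration of the note above, the sketch below shows a hypothetical runner in which only get_datasets is user-defined, while get_loaders wraps each dataset into batches and preserves the dataset order. Plain lists and the hypothetical batch_iter helper stand in for torch Dataset and DataLoader objects; SketchRunner is not part of Catalyst.

```python
from collections import OrderedDict
from typing import Iterator, List, Sequence


def batch_iter(dataset: Sequence, batch_size: int) -> Iterator[List]:
    """Hypothetical stand-in for a DataLoader: yields consecutive batches."""
    for start in range(0, len(dataset), batch_size):
        yield list(dataset[start:start + batch_size])


class SketchRunner:
    """Hypothetical runner: users only define get_datasets."""

    def get_datasets(self, stage: str) -> "OrderedDict[str, Sequence]":
        # "train" is inserted first, so it is also iterated first.
        return OrderedDict([("train", list(range(10))), ("valid", list(range(4)))])

    def get_loaders(self, stage: str, batch_size: int = 4) -> "OrderedDict[str, List[List]]":
        # Wrap every dataset into batches, keeping the dataset order.
        datasets = self.get_datasets(stage)
        return OrderedDict(
            (name, list(batch_iter(dataset, batch_size)))
            for name, dataset in datasets.items()
        )


loaders = SketchRunner().get_loaders(stage="train")
for name, batches in loaders.items():
    print(name, [len(batch) for batch in batches])
# train [4, 4, 2]
# valid [4]
```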

get_loggers() → Dict[str, catalyst.core.logger.ILogger]

Returns the loggers for the run.

abstract get_model(stage: str) → torch.nn.modules.module.Module

Returns the model for a given stage and epoch.

Example:

# suppose we have typical MNIST model, like
# nn.Sequential(nn.Linear(28*28, 128), nn.Linear(128, 10))
>>> runner.get_model(stage="train")
Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): Linear(in_features=128, out_features=10, bias=True)
)

Parameters

stage – stage name of interest like “pretrain” / “train” / “finetune” / etc

Returns

Model: model for a given stage.

get_optimizer(stage: str, model: torch.nn.modules.module.Module) → Optional[torch.optim.optimizer.Optimizer]

Returns the optimizer for a given stage and model.

Example:

>>> runner.get_optimizer(model=model, stage="train")
torch.optim.Adam(model.parameters())

Parameters
  • stage – stage name of interest like “pretrain” / “train” / “finetune” / etc

  • model – model to optimize with stage optimizer

Returns

Optimizer: optimizer for a given stage and model.

get_scheduler(stage: str, optimizer: torch.optim.optimizer.Optimizer) → Optional[torch.optim.lr_scheduler._LRScheduler]

Returns the scheduler for a given stage and optimizer.

Example:

>>> runner.get_scheduler(stage="training", optimizer=optimizer)
torch.optim.lr_scheduler.StepLR(optimizer)

Parameters
  • stage – stage name of interest like “pretrain” / “train” / “finetune” / etc

  • optimizer – optimizer to schedule with stage scheduler

Returns

Scheduler: scheduler for a given stage and optimizer.

get_stage_len(stage: str) → int

Returns the number of epochs for the selected stage.

Parameters

stage – current stage

Returns

number of epochs in stage

Example:

>>> runner.get_stage_len("pretraining")
3

get_trial() → Optional[catalyst.core.trial.ITrial]

Returns the trial for the run.

abstract handle_batch(batch: Mapping[str, Any]) → None

Inner method to handle the specified data batch. Used to perform the train/valid/infer logic for each batch during the experiment run.

Parameters

batch (Mapping[str, Any]) – dictionary with data batches from DataLoader.
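
A minimal sketch of what a handle_batch override often does in a supervised setup, assuming the runner keeps the current batch in a self.batch dictionary. The class, the model, and the "features"/"targets"/"logits" keys are hypothetical; a plain Python callable stands in for a torch model so that the sketch runs without dependencies.

```python
from typing import Any, Callable, Dict, Mapping


class SketchSupervisedRunner:
    """Hypothetical runner illustrating a handle_batch override."""

    def __init__(self, model: Callable[[Any], Any]):
        self.model = model
        self.batch: Dict[str, Any] = {}

    def handle_batch(self, batch: Mapping[str, Any]) -> None:
        # Run the model on the inputs and keep inputs and outputs together,
        # so that later steps (metrics, optimizer) can read them.
        logits = self.model(batch["features"])
        self.batch = {**batch, "logits": logits}


# A plain callable stands in for a torch model here.
runner = SketchSupervisedRunner(model=lambda xs: [2 * x for x in xs])
runner.handle_batch({"features": [1, 2, 3], "targets": [0, 1, 1]})
print(runner.batch["logits"])  # [2, 4, 6]
```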

property hparams

Returns hyper-parameters for current run.

Example:

>>> runner.hparams
OrderedDict([('optimizer', 'Adam'),
 ('lr', 0.02),
 ('betas', (0.9, 0.999)),
 ('eps', 1e-08),
 ('weight_decay', 0),
 ('amsgrad', False),
 ('train_batch_size', 32)])

Returns

dictionary with hyperparameters

log_hparams(*args, **kwargs) → None

Logs hyperparameters to available loggers.

log_image(*args, **kwargs) → None

Logs image to available loggers.

log_metrics(*args, **kwargs) → None

Logs batch, loader and epoch metrics to available loggers.

run() → catalyst.core.runner.IRunner

Runs the experiment.

Returns

self, IRunner instance after the experiment

property seed

Experiment’s seed for reproducibility.

abstract property stages

Run’s stage names.

Example:

>>> runner.stages
["pretraining", "finetuning"]

class catalyst.core.runner.RunnerException

Bases: Exception

Exception class for all runner errors.

Engine

class catalyst.core.engine.IEngine

Bases: abc.ABC

An abstraction that syncs the experiment run with different hardware-specific configurations:

  • cpu

  • single-gpu

  • multi-gpu

  • amp (nvidia, torch)

  • ddp (torch, etc)

autocast(*args, **kwargs)

AMP scaling context. The default autocast context does not scale anything.

Parameters
  • *args – some args

  • **kwargs – some kwargs

Returns

context
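
The default, non-scaling behavior described above can be sketched with contextlib.nullcontext. SketchEngine is hypothetical; an AMP-enabled engine would typically return a real autocast context instead, such as torch.cuda.amp.autocast(), which is an assumption about how such an engine might be written rather than Catalyst's exact code.

```python
import contextlib


class SketchEngine:
    """Hypothetical engine whose autocast does not scale anything."""

    def autocast(self, *args, **kwargs):
        # A no-op context: the wrapped code runs unchanged.
        return contextlib.nullcontext()


engine = SketchEngine()
with engine.autocast():
    total = sum([0.5, 0.25])
print(total)  # 0.75
```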

abstract backward_loss(loss, model, optimizer) → None

Abstraction over the loss.backward() step. Should be overridden when loss scaling is required, for example, with APEX or AMP.

Parameters
  • loss – tensor with loss value.

  • model – model module.

  • optimizer – model optimizer.

abstract deinit_components()

Deinitializes the run's components. In distributed mode, it should destroy the process group.

abstract init_components(model_fn=None, criterion_fn=None, optimizer_fn=None, scheduler_fn=None)

Initializes the run's components.

property is_ddp

Boolean flag for distributed run.

property is_master_process

Checks whether the current process is the master process. Should be implemented only for distributed training (DDP). For non-distributed training, it should always return True.

Returns

True if the current process is the master process, otherwise False.

property is_worker_process

Checks whether the current process is a worker process. Should be implemented only for distributed training (DDP). For non-distributed training, it should always return False.

Returns

True if the current process is a worker process, otherwise False.
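
The documented master/worker rules can be summarized in a small sketch. It assumes, hypothetically, that a negative rank marks a non-distributed run and that rank 0 is the master in DDP, which is the usual torch.distributed convention; SketchDistributedState is not a Catalyst class.

```python
class SketchDistributedState:
    """Hypothetical helper mirroring the documented master/worker rules."""

    def __init__(self, rank: int = -1, world_size: int = 1):
        # In this sketch, a negative rank marks a non-distributed run.
        self.rank = rank
        self.world_size = world_size

    @property
    def is_ddp(self) -> bool:
        return self.rank >= 0

    @property
    def is_master_process(self) -> bool:
        # Non-distributed runs are always master; in DDP only rank 0 is.
        return not self.is_ddp or self.rank == 0

    @property
    def is_worker_process(self) -> bool:
        # Non-distributed runs are never workers; in DDP every rank > 0 is.
        return self.is_ddp and self.rank > 0


single = SketchDistributedState()
print(single.is_master_process, single.is_worker_process)  # True False
ddp = [SketchDistributedState(rank=r, world_size=3) for r in range(3)]
print([s.is_master_process for s in ddp])  # [True, False, False]
print([s.is_worker_process for s in ddp])  # [False, True, True]
```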

abstract load_checkpoint(path: str) → Dict

Loads a checkpoint from path.

Parameters

path – checkpoint file to load

abstract optimizer_step(loss, model, optimizer) → None

Abstraction over the optimizer.step() step. Should be overridden when gradient scaling is required, for example, with AMP.

Parameters
  • loss – tensor with loss value.

  • model – model module.

  • optimizer – model optimizer.

abstract pack_checkpoint(model: torch.nn.modules.module.Module = None, criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, **kwargs) → Dict

Packs the model, criterion, optimizer, scheduler, and extra info (**kwargs) into a torch-based checkpoint.

Parameters
  • model – torch model

  • criterion – torch criterion

  • optimizer – torch optimizer

  • scheduler – torch scheduler

  • **kwargs – some extra info to pack

abstract property rank

Process rank for distributed training.

abstract save_checkpoint(checkpoint: Dict, path: str) → None

Saves checkpoint to a file.

Parameters
  • checkpoint – data to save.

  • path – filepath where checkpoint should be stored.

abstract sync_device(tensor_or_module: Any) → Any

Moves tensor_or_module to the Engine’s device.

Parameters

tensor_or_module – tensor or module to move

abstract sync_tensor(tensor: Any, mode: str) → Any

Syncs tensor over world_size in distributed mode.
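
To make "syncs tensor over world_size" concrete, the sketch below simulates the reduction for one scalar held by each process, without any real inter-process communication. The mode names "sum" and "mean" are assumptions for illustration. In real DDP code, an engine would typically call torch.distributed.all_reduce with ReduceOp.SUM and divide by world_size for a mean, so that every process receives the same result.

```python
from typing import List


def simulate_sync(values_per_process: List[float], mode: str) -> float:
    """Hypothetical simulation of syncing one scalar over all processes.

    values_per_process[i] is the value held by the process with rank i,
    so len(values_per_process) plays the role of world_size.
    """
    world_size = len(values_per_process)
    total = sum(values_per_process)
    if mode == "sum":
        return total
    if mode == "mean":
        return total / world_size
    raise ValueError(f"unsupported mode: {mode!r}")


print(simulate_sync([1.0, 2.0, 3.0, 6.0], mode="sum"))   # 12.0
print(simulate_sync([1.0, 2.0, 3.0, 6.0], mode="mean"))  # 3.0
```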

abstract unpack_checkpoint(checkpoint: Dict, model: torch.nn.modules.module.Module = None, criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, **kwargs) → None

Unpacks the checkpoint content into the model (if not None), criterion (if not None), optimizer (if not None), and scheduler (if not None).

Parameters
  • checkpoint – checkpoint to load

  • model – model whose state should be updated

  • criterion – criterion whose state should be updated

  • optimizer – optimizer whose state should be updated

  • scheduler – scheduler whose state should be updated

  • kwargs – extra arguments
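
A dependency-free sketch of the pack/unpack contract: only components that are passed (not None) are stored or restored, and extra **kwargs travel alongside them. The Stateful class stands in for torch objects that expose state_dict()/load_state_dict(), and the "<name>_state_dict" key layout is a hypothetical choice, not necessarily the format Catalyst writes.

```python
from typing import Any, Dict


class Stateful:
    """Hypothetical stand-in for a torch module, optimizer, or scheduler."""

    def __init__(self, state: Dict[str, Any]):
        self.state = dict(state)

    def state_dict(self) -> Dict[str, Any]:
        return dict(self.state)

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.state = dict(state)


def _components(model, criterion, optimizer, scheduler):
    return [("model", model), ("criterion", criterion),
            ("optimizer", optimizer), ("scheduler", scheduler)]


def pack_checkpoint(model=None, criterion=None, optimizer=None,
                    scheduler=None, **kwargs) -> Dict[str, Any]:
    # Start from the extra info, then add the state of every passed component.
    checkpoint: Dict[str, Any] = dict(kwargs)
    for name, component in _components(model, criterion, optimizer, scheduler):
        if component is not None:
            checkpoint[f"{name}_state_dict"] = component.state_dict()
    return checkpoint


def unpack_checkpoint(checkpoint: Dict[str, Any], model=None, criterion=None,
                      optimizer=None, scheduler=None, **kwargs) -> None:
    # Restore state only into components that are not None and were saved.
    for name, component in _components(model, criterion, optimizer, scheduler):
        key = f"{name}_state_dict"
        if component is not None and key in checkpoint:
            component.load_state_dict(checkpoint[key])


checkpoint = pack_checkpoint(model=Stateful({"w": 1.0}), epoch=3)
print(sorted(checkpoint))  # ['epoch', 'model_state_dict']
fresh_model = Stateful({"w": 0.0})
unpack_checkpoint(checkpoint, model=fresh_model)
print(fresh_model.state)  # {'w': 1.0}
```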

abstract property world_size

Process world size for distributed training.

abstract zero_grad(loss, model, optimizer) → None

Abstraction over the model.zero_grad() step. Should be overridden when you need to pass arguments to model.zero_grad(), like set_to_none=True, or to use a custom scheme that replaces or improves the .zero_grad() method.

Parameters
  • loss – tensor with loss value.

  • model – model module.

  • optimizer – model optimizer.
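
For an engine without any loss or gradient scaling, the three abstract steps above reduce to the plain PyTorch calls. The sketch below uses hypothetical duck-typed fakes in place of torch objects so that the call order can be checked without PyTorch. An AMP engine would override backward_loss and optimizer_step instead, for example by calling scaler.scale(loss).backward(), scaler.step(optimizer) and scaler.update() on a torch.cuda.amp.GradScaler; that is a sketch of the idea, not Catalyst's implementation.

```python
from typing import List

calls: List[str] = []


class SketchPlainEngine:
    """Hypothetical engine with no loss or gradient scaling."""

    def zero_grad(self, loss, model, optimizer) -> None:
        model.zero_grad()

    def backward_loss(self, loss, model, optimizer) -> None:
        loss.backward()

    def optimizer_step(self, loss, model, optimizer) -> None:
        optimizer.step()


# Duck-typed fakes that record calls, standing in for torch objects.
class FakeLoss:
    def backward(self):
        calls.append("backward")


class FakeModel:
    def zero_grad(self):
        calls.append("zero_grad")


class FakeOptimizer:
    def step(self):
        calls.append("step")


engine = SketchPlainEngine()
loss, model, optimizer = FakeLoss(), FakeModel(), FakeOptimizer()
# A typical per-batch training step: clear gradients, backprop, update.
engine.zero_grad(loss, model, optimizer)
engine.backward_loss(loss, model, optimizer)
engine.optimizer_step(loss, model, optimizer)
print(calls)  # ['zero_grad', 'backward', 'step']
```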

Callback

class catalyst.core.callback.CallbackNode

Bases: enum.IntFlag

Callback node usage flag during distributed training.

  • All (0) - use on all nodes, both master and worker.

  • Master (1) - use only on the master node.

  • Worker (2) - use only on worker nodes.

All = 0
Master = 1
Worker = 2
all = 0
master = 1
worker = 2
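
The node flags can be read as a simple filter over callbacks, given whether the current process is the master. NodeSketch is a local stand-in that copies the documented CallbackNode values, and the callback names are hypothetical; this illustrates the rule, not Catalyst's dispatch code.

```python
from enum import IntFlag


class NodeSketch(IntFlag):
    """Local stand-in copying the documented CallbackNode values."""
    All = 0
    Master = 1
    Worker = 2


def should_run(node: int, is_master: bool) -> bool:
    # All: every process; Master: master only; Worker: non-master processes only.
    if node == NodeSketch.All:
        return True
    if node == NodeSketch.Master:
        return is_master
    return not is_master


callbacks = {"checkpoint": NodeSketch.Master, "metrics": NodeSketch.All,
             "worker_stats": NodeSketch.Worker}
on_master = [n for n, node in callbacks.items() if should_run(node, is_master=True)]
on_worker = [n for n, node in callbacks.items() if should_run(node, is_master=False)]
print(on_master)  # ['checkpoint', 'metrics']
print(on_worker)  # ['metrics', 'worker_stats']
```
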

class catalyst.core.callback.CallbackOrder

Bases: enum.IntFlag

Callback usage order during training.

Catalyst executes Callbacks with low CallbackOrder before Callbacks with high CallbackOrder.

Predefined orders:

  • Internal (0) - some Catalyst Extras, like PhaseCallbacks (used in GANs).

  • Metric (20) - Callbacks with metrics and losses computation.

  • MetricAggregation (40) - metrics aggregation callbacks, like summing different losses into one.

  • Optimizer (60) - optimizer step, requires computed metrics for optimization.

  • Scheduler (80) - scheduler step; in the ReduceLROnPlateau case, it requires computed validation metrics for scheduling.

  • External (100) - additional callbacks with custom logic, like InferenceCallbacks.

Nevertheless, you can always create a custom callback with any order, for example:

>>> class MyCustomCallback(Callback):
...     def __init__(self):
...         super().__init__(order=33)
...
# MyCustomCallback will be executed after all `Metric`-Callbacks
# but before all `MetricAggregation`-Callbacks.

External = 100
ExternalExtra = 120
Internal = 0
Metric = 20
MetricAggregation = 40
Optimizer = 60
Scheduler = 80
external = 100
external_extra = 120
internal = 0
metric = 20
metric_aggregation = 40
optimizer = 60
scheduler = 80
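
Since callbacks with a lower order run first, the execution sequence can be sketched as a sort by order value. The ORDER mapping copies the documented member values, the callback names are hypothetical, and keeping equal-order callbacks in insertion order (via Python's stable sorted) is an assumption of this sketch.

```python
from typing import List, Tuple

# Values copied from the CallbackOrder members documented above.
ORDER = {"Internal": 0, "Metric": 20, "MetricAggregation": 40,
         "Optimizer": 60, "Scheduler": 80, "External": 100, "ExternalExtra": 120}


def execution_sequence(callbacks: List[Tuple[str, int]]) -> List[str]:
    # Lower order runs first; sorted() is stable, so equal orders keep
    # their original relative position in this sketch.
    return [name for name, _ in sorted(callbacks, key=lambda item: item[1])]


callbacks = [
    ("optimizer", ORDER["Optimizer"]),
    ("my_custom", 33),
    ("accuracy", ORDER["Metric"]),
    ("loss_sum", ORDER["MetricAggregation"]),
]
print(execution_sequence(callbacks))
# ['accuracy', 'my_custom', 'loss_sum', 'optimizer']
```
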

class catalyst.core.callback.CallbackScope

Bases: enum.IntFlag

Callback scope usage flag during training.

  • Stage (0) - use Callback only during one experiment stage.

  • Experiment (1) - use Callback during whole experiment run.

Experiment = 1
Stage = 0
experiment = 1
stage = 0

class catalyst.core.callback.Callback(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)

Bases: catalyst.core.callback.ICallback

An abstraction that lets you customize your experiment run logic.

Parameters
  • order – flag from CallbackOrder

  • node – flag from CallbackNode

  • scope – flag from CallbackScope

To give users maximum flexibility and extensibility, Catalyst supports callback execution anywhere in the training loop:

-- stage start
---- epoch start
------ loader start
-------- batch start
---------- batch handler (Runner logic)
-------- batch end
------ loader end
---- epoch end
-- stage end

exception – if an Exception was raised
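
The nesting above can be sketched as a few loops that fire hooks around the runner's batch logic. Hook names follow the Catalyst on_* callback convention (on_stage_start, on_batch_end, and so on), but run_stage is a hypothetical, simplified illustration: hooks here only append their names to a list, and re-raising after on_exception is a choice of this sketch.

```python
from collections import OrderedDict
from typing import Any, Callable, List


def run_stage(loaders: "OrderedDict[str, List[Any]]", num_epochs: int,
              handle_batch: Callable[[Any], None], events: List[str]) -> None:
    """Simplified, hypothetical stage loop; hooks only record their names."""
    fire = events.append  # stands in for "call this hook on every callback"

    fire("on_stage_start")
    try:
        for _ in range(num_epochs):
            fire("on_epoch_start")
            for loader in loaders.values():
                fire("on_loader_start")
                for batch in loader:
                    fire("on_batch_start")
                    handle_batch(batch)  # batch handler (Runner logic)
                    fire("on_batch_end")
                fire("on_loader_end")
            fire("on_epoch_end")
    except Exception:
        fire("on_exception")
        raise
    fire("on_stage_end")


events: List[str] = []
run_stage(OrderedDict(train=[1], valid=[2]), num_epochs=1,
          handle_batch=lambda batch: None, events=events)
print(events)
```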

Note

To learn more about Catalyst Core concepts, please check out

Abstraction, please check out the implementations:

class catalyst.core.callback.CallbackWrapper(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)

Bases: catalyst.core.callback.Callback

Enable/disable callback execution.

Parameters
  • base_callback – callback to wrap

  • enable_callback – indicator to enable/disable the callback; if True, the callback is enabled. Default: True

on_exception(runner: IRunner) → None

Runs base_callback (if possible).

Parameters

runner – current runner
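
The enable/disable behavior can be sketched as a wrapper that forwards hook calls to the base callback only while it is enabled. WrapperSketch and BatchCounter are hypothetical and show a single hook; a full wrapper would forward every hook in the same way.

```python
class BatchCounter:
    """Hypothetical callback that counts finished batches."""

    def __init__(self):
        self.count = 0

    def on_batch_end(self, runner) -> None:
        self.count += 1


class WrapperSketch:
    """Hypothetical wrapper: forwards hooks only while enabled."""

    def __init__(self, base_callback, enable_callback: bool = True):
        self.callback = base_callback
        self.enabled = enable_callback

    def on_batch_end(self, runner) -> None:
        if self.enabled:
            self.callback.on_batch_end(runner)


active = WrapperSketch(BatchCounter())
muted = WrapperSketch(BatchCounter(), enable_callback=False)
for _ in range(3):
    active.on_batch_end(runner=None)
    muted.on_batch_end(runner=None)
print(active.callback.count, muted.callback.count)  # 3 0
```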

Logger

class catalyst.core.logger.ILogger

Bases: object

An abstraction that syncs the experiment run with monitoring tools.

close_log() → None

Closes the logger.

flush_log() → None

Flushes the logger.

log_hparams(hparams: Dict, scope: str = None, run_key: str = None, stage_key: str = None) → None

Logs hyperparameters to the logger.

log_image(tag: str, image: numpy.ndarray, scope: str = None, run_key: str = None, global_epoch_step: int = 0, global_batch_step: int = 0, global_sample_step: int = 0, stage_key: str = None, stage_epoch_len: int = 0, stage_epoch_step: int = 0, stage_batch_step: int = 0, stage_sample_step: int = 0, loader_key: str = None, loader_batch_len: int = 0, loader_sample_len: int = 0, loader_batch_step: int = 0, loader_sample_step: int = 0) → None

Logs image to the logger.

log_metrics(metrics: Dict[str, float], scope: str = None, run_key: str = None, global_epoch_step: int = 0, global_batch_step: int = 0, global_sample_step: int = 0, stage_key: str = None, stage_epoch_len: int = 0, stage_epoch_step: int = 0, stage_batch_step: int = 0, stage_sample_step: int = 0, loader_key: str = None, loader_batch_len: int = 0, loader_sample_len: int = 0, loader_batch_step: int = 0, loader_sample_step: int = 0) → None

Logs metrics to the logger.
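
A minimal in-memory logger sketch, together with a helper that fans one call out to several loggers in the spirit of IRunner.log_metrics ("logs ... to available loggers"). ListLogger and log_to_all are hypothetical; the logger keeps only a few of the documented keyword arguments and accepts the rest through **kwargs. In real code you would subclass catalyst.core.logger.ILogger instead.

```python
from typing import Any, Dict, List, Optional


class ListLogger:
    """Hypothetical in-memory logger with a subset of the ILogger methods."""

    def __init__(self):
        self.records: List[Dict[str, Any]] = []

    def log_metrics(self, metrics: Dict[str, float], scope: Optional[str] = None,
                    loader_key: Optional[str] = None, global_epoch_step: int = 0,
                    **kwargs: Any) -> None:
        # Keep a few context fields next to the metric values; ignore the rest.
        self.records.append({"scope": scope, "loader": loader_key,
                             "epoch": global_epoch_step, **metrics})

    def flush_log(self) -> None:
        pass  # nothing is buffered in this sketch

    def close_log(self) -> None:
        pass


def log_to_all(loggers: Dict[str, ListLogger], *args: Any, **kwargs: Any) -> None:
    # Send the same call to every registered logger.
    for logger in loggers.values():
        logger.log_metrics(*args, **kwargs)


loggers = {"first": ListLogger(), "second": ListLogger()}
log_to_all(loggers, {"loss": 0.5}, scope="loader", loader_key="valid", global_epoch_step=1)
print(loggers["second"].records)
# [{'scope': 'loader', 'loader': 'valid', 'epoch': 1, 'loss': 0.5}]
```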

Trial

class catalyst.core.trial.ITrial

Bases: abc.ABC

An abstraction that syncs the experiment run with different hyperparameter search systems.

Scripts

You can use Catalyst scripts with catalyst-dl in your terminal. For example:

$ catalyst-dl run --help