Core¶
Experiment¶
- 
class catalyst.core.experiment.IExperiment[source]¶
- Bases: abc.ABC
- An abstraction that contains information about the experiment – a model, a criterion, an optimizer, a scheduler, and their hyperparameters. It also contains information about the data and the transformations used. In general, the Experiment knows what you would like to run.
- Note
- To learn more about Catalyst Core concepts, please check out the Experiment, Runner and Callback abstractions and their implementations.
- 
abstract property distributed_params¶
- Dictionary with the parameters for distributed and half-precision training.
- Used in catalyst.utils.distributed.process_components to set up Nvidia Apex or PyTorch distributed training.
- Example:
- >>> experiment.distributed_params
  {"opt_level": "O1", "syncbn": True}  # Apex variant
 - 
abstract get_callbacks(stage: str) → OrderedDict[str, Callback][source]¶
- Returns callbacks for a given stage.
- Note
- To learn more about the Catalyst Callback mechanism, please follow the catalyst.core.callback.Callback documentation.
- Note
- We need an ordered dictionary to guarantee the correct dataflow and order of metrics optimization – for example, to compute the loss before the optimizer step, or to compute all the metrics before logging :)
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- Ordered dictionary with callbacks for the current stage.
- Return type
- OrderedDict[str, Callback]
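For illustration, here is a minimal sketch of what a get_callbacks override could look like; PrintLossCallback is a made-up example, not part of Catalyst, and relies only on the documented runner.batch_metrics and runner.loader_name fields.

    from collections import OrderedDict

    from catalyst.core.callback import Callback, CallbackOrder


    class PrintLossCallback(Callback):
        # hypothetical callback: prints the loss after every batch
        def __init__(self):
            super().__init__(order=CallbackOrder.Logging)

        def on_batch_end(self, runner):
            print(runner.loader_name, runner.batch_metrics.get("loss"))


    def get_callbacks(self, stage: str) -> "OrderedDict[str, Callback]":
        # method of a hypothetical IExperiment subclass;
        # insertion order defines the execution order for this stage
        return OrderedDict({"print_loss": PrintLossCallback()})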
 
 - 
abstract get_criterion(stage: str) → torch.nn.modules.module.Module[source]¶
- Returns the criterion for a given stage.
- Example:
- # for a typical classification task
  >>> experiment.get_criterion(stage="training")
  nn.CrossEntropyLoss()
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- Criterion – criterion for a given stage.
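A rough sketch of a per-stage criterion choice (a method of a hypothetical IExperiment subclass; the stage names and losses are arbitrary):

    import torch.nn as nn


    def get_criterion(self, stage: str) -> nn.Module:
        # method of a hypothetical IExperiment subclass
        if stage == "pretrain":
            return nn.MSELoss()
        return nn.CrossEntropyLoss()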
 
 - 
get_datasets(stage: str, epoch: int = None, **kwargs) → OrderedDict[str, Dataset][source]¶
- Returns the datasets for a given stage and epoch.
- Note
- For Deep Learning use cases you have the same dataset during the whole stage.
- For Reinforcement Learning it is common to change the dataset (experiment) every training epoch.
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc 
- epoch – epoch index 
- **kwargs – additional parameters to use during dataset creation 
 
- Returns
- OrderedDict[str, Dataset] – ordered dictionary with datasets for the current stage and epoch.
 
- Note
- We need an ordered dictionary to guarantee the correct dataflow and order of our training datasets – for example, to run through the train data before the validation one :)
- Example:
- >>> experiment.get_datasets(
  >>>     stage="training",
  >>>     in_csv_train="path/to/train/csv",
  >>>     in_csv_valid="path/to/valid/csv",
  >>> )
  OrderedDict({
      "train": CsvDataset(in_csv=in_csv_train, ...),
      "valid": CsvDataset(in_csv=in_csv_valid, ...),
  })
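As a minimal sketch (a method of a hypothetical IExperiment subclass, using synthetic TensorDatasets purely for illustration):

    from collections import OrderedDict

    import torch
    from torch.utils.data import TensorDataset


    def get_datasets(self, stage: str, epoch: int = None, **kwargs) -> "OrderedDict":
        # method of a hypothetical IExperiment subclass; synthetic data for illustration only
        features = torch.randn(128, 10)
        targets = torch.randint(0, 2, (128,))
        return OrderedDict(
            train=TensorDataset(features, targets),
            valid=TensorDataset(features, targets),
        )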
 - 
abstract get_loaders(stage: str, epoch: int = None) → OrderedDict[str, DataLoader][source]¶
- Returns the loaders for a given stage.
- Note
- Wrapper for catalyst.core.experiment.IExperiment.get_datasets. For most experiments you only need to override the get_datasets method.
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc 
- epoch – epoch index 
 
- Returns
- OrderedDict[str, DataLoader] – ordered dictionary with loaders for the current stage and epoch.
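A sketch of how such a wrapper might look, reusing get_datasets from the same experiment; the batch size and shuffling policy here are arbitrary choices, not Catalyst defaults:

    from collections import OrderedDict

    from torch.utils.data import DataLoader


    def get_loaders(self, stage: str, epoch: int = None) -> "OrderedDict[str, DataLoader]":
        # method of a hypothetical IExperiment subclass; wraps the datasets into DataLoaders
        datasets = self.get_datasets(stage=stage, epoch=epoch)
        return OrderedDict(
            (name, DataLoader(dataset, batch_size=32, shuffle=(name == "train")))
            for name, dataset in datasets.items()
        )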
 
 
 - 
abstract get_model(stage: str) → torch.nn.modules.module.Module[source]¶
- Returns the model for a given stage.
- Example:
- # suppose we have a typical MNIST model, like
  # nn.Sequential(nn.Linear(28*28, 128), nn.Linear(128, 10))
  >>> experiment.get_model(stage="training")
  Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): Linear(in_features=128, out_features=10, bias=True)
  )
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- Model – model for a given stage.
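A minimal sketch of a get_model implementation (a method of a hypothetical IExperiment subclass; the MNIST-sized architecture is just an example):

    import torch.nn as nn


    def get_model(self, stage: str) -> nn.Module:
        # method of a hypothetical IExperiment subclass; MNIST-sized model for illustration
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )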
 
 - 
abstract get_optimizer(stage: str, model: torch.nn.modules.module.Module) → torch.optim.optimizer.Optimizer[source]¶
- Returns the optimizer for a given stage and model.
- Example:
- >>> experiment.get_optimizer(stage="training", model=model)
  torch.optim.Adam(model.parameters())
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- model – model to optimize with the stage optimizer
- Returns
- Optimizer – optimizer for a given stage and model.
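A sketch of a per-stage optimizer (a method of a hypothetical IExperiment subclass; the learning rates and stage names are assumptions for illustration):

    import torch
    import torch.nn as nn


    def get_optimizer(self, stage: str, model: nn.Module) -> torch.optim.Optimizer:
        # method of a hypothetical IExperiment subclass; hypothetical per-stage learning rates
        lr = 1e-3 if stage == "train" else 1e-4
        return torch.optim.Adam(model.parameters(), lr=lr)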
 
 - 
abstract get_scheduler(stage: str, optimizer: torch.optim.optimizer.Optimizer) → torch.optim.lr_scheduler._LRScheduler[source]¶
- Returns the scheduler for a given stage and optimizer.
- Example:
- >>> experiment.get_scheduler(stage="training", optimizer=optimizer)
  torch.optim.lr_scheduler.StepLR(optimizer)
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- optimizer – optimizer to schedule with the stage scheduler
- Returns
- Scheduler – scheduler for a given stage and optimizer.
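A sketch of a get_scheduler implementation (a method of a hypothetical IExperiment subclass; step_size and gamma are arbitrary here):

    import torch
    from torch.optim.lr_scheduler import StepLR


    def get_scheduler(self, stage: str, optimizer: torch.optim.Optimizer) -> StepLR:
        # method of a hypothetical IExperiment subclass
        return StepLR(optimizer, step_size=10, gamma=0.5)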
 
 - 
abstract get_stage_params(stage: str) → Mapping[str, Any][source]¶
- Returns extra stage parameters for a given stage.
- Example:
- >>> experiment.get_stage_params(stage="training")
  {
      "logdir": "./logs/training",
      "num_epochs": 42,
      "valid_loader": "valid",
      "main_metric": "loss",
      "minimize_metric": True,
      "checkpoint_data": {"comment": "break the cycle - use the Catalyst"},
  }
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- dict – parameters for a given stage.
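A sketch of a get_stage_params implementation (a method of a hypothetical IExperiment subclass; the values simply mirror the example above):

    from typing import Any, Mapping


    def get_stage_params(self, stage: str) -> Mapping[str, Any]:
        # method of a hypothetical IExperiment subclass; values mirror the example above
        return {
            "logdir": f"./logs/{stage}",
            "num_epochs": 42,
            "valid_loader": "valid",
            "main_metric": "loss",
            "minimize_metric": True,
        }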
 
 - 
get_transforms(stage: str = None, dataset: str = None)[source]¶
- Returns the data transforms for a given stage and dataset.
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc 
- dataset – dataset name of interest, like “train” / “valid” / “infer” 
 
- Note
- For dataset/loader naming, please follow the catalyst.core.runner documentation.
- Returns
- Data transformations to use for the specified dataset.
 
 - 
abstract property hparams¶
- Returns hyper-parameters for the current experiment.
- Example:
- >>> experiment.hparams
  OrderedDict([('optimizer', 'Adam'),
               ('lr', 0.02),
               ('betas', (0.9, 0.999)),
               ('eps', 1e-08),
               ('weight_decay', 0),
               ('amsgrad', False),
               ('train_batch_size', 32)])
 
 - 
abstract property initial_seed¶
- Experiment’s initial seed, used to set up the global seed at the beginning of each stage. Additionally, the Catalyst Runner sets experiment.initial_seed + runner.global_epoch + 1 as the global seed for each epoch. Used for experiment reproducibility.
- Example:
- >>> experiment.initial_seed
  42
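A small sketch of what that per-epoch seeding formula amounts to, using plain Python/NumPy/PyTorch seeding; this is not the Runner's internal implementation:

    import random

    import numpy as np
    import torch


    def seed_epoch(initial_seed: int, global_epoch: int) -> None:
        # mirrors the documented formula: initial_seed + global_epoch + 1
        seed = initial_seed + global_epoch + 1
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)


    seed_epoch(initial_seed=42, global_epoch=0)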
 - 
abstract property logdir¶
- Path to the directory where the experiment logs would be saved.
- Example:
- >>> experiment.logdir
  ./path/to/my/experiment/logs
 - 
abstract property stages¶
- Experiment’s stage names.
- Example:
- >>> experiment.stages
  ["pretraining", "training", "finetuning"]
- Note
- To understand the stages concept, please follow the Catalyst documentation, for example, catalyst.core.callback.Callback.
 - 
abstract property trial¶
- Returns the hyperparameter trial for the current experiment. Could be useful for Optuna / HyperOpt / Ray Tune hyperparameter optimizers.
- Example:
- >>> experiment.trial
  optuna.trial._trial.Trial  # Optuna variant
 
Runner¶
- 
class catalyst.core.runner.IRunner(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None)[source]
- Bases: abc.ABC, catalyst.core.callback.ICallback, catalyst.core.legacy.IRunnerLegacy
- An abstraction that knows how to run an experiment. It contains all the logic of how to run the experiment, stages, epochs and batches.
- Note
- To learn more about Catalyst Core concepts, please check out the Experiment, Runner and Callback abstractions and their implementations.
- Runner also contains full information about the experiment run.
- Runner section
- runner.model - an instance of torch.nn.Module (should implement a forward method); for example, runner.model = torch.nn.Linear(10, 10)
- runner.device - an instance of torch.device (CPU, GPU, TPU); for example, runner.device = torch.device("cpu")
- Experiment section
- runner.criterion - an instance of torch.nn.Module or torch.nn.modules.loss._Loss (should implement a forward method); for example, runner.criterion = torch.nn.CrossEntropyLoss()
- runner.optimizer - an instance of torch.optim.optimizer.Optimizer (should implement a step method); for example, runner.optimizer = torch.optim.Adam()
- runner.scheduler - an instance of torch.optim.lr_scheduler._LRScheduler (should implement a step method); for example, runner.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau()
- runner.callbacks - ordered dictionary with Catalyst.Callback instances; for example, runner.callbacks = {"accuracy": AccuracyCallback(), "criterion": CriterionCallback(), "optim": OptimizerCallback(), "saver": CheckpointCallback()}
- Dataflow section
- runner.loaders - ordered dictionary with torch.DataLoaders; for example, runner.loaders = {"train": MnistTrainLoader(), "valid": MnistValidLoader()}
- Note
- “train” prefix is used for training loaders - metrics computations, backward pass, optimization
- “valid” prefix is used for validation loaders - metrics computations only
- “infer” prefix is used for inference loaders - dataset prediction
- runner.input - dictionary containing the batch of data from the current DataLoader; for example, runner.input = {"images": np.ndarray(batch_size, c, h, w), "targets": np.ndarray(batch_size, 1)}
- runner.output - dictionary containing the model output for the current batch; for example, runner.output = {"logits": torch.Tensor(batch_size, num_classes)}
- Metrics section
- runner.batch_metrics - dictionary, flat storage for batch metrics; for example, runner.batch_metrics = {"loss": ..., "accuracy": ..., "iou": ...}
- runner.loader_metrics - dictionary with aggregated batch statistics for the loader (mean over all batches) and global loader metrics, like AUC; for example, runner.loader_metrics = {"loss": ..., "accuracy": ..., "auc": ...}
- runner.epoch_metrics - dictionary with summarized metrics for different loaders and global epoch metrics, like lr, momentum; for example, runner.epoch_metrics = {"train_loss": ..., "train_auc": ..., "valid_loss": ..., "lr": ..., "momentum": ...}
- Validation metrics section
- runner.main_metric - string containing the name of the metric of interest for optimization, validation and checkpointing during training
- runner.minimize_metric - bool, indicator flag
- True if we need to minimize the metric during training, like Cross Entropy loss
- False if we need to maximize the metric during training, like Accuracy or Intersection over Union
- Validation section
- runner.valid_loader - string, name of the validation loader for metric selection, validation and model checkpointing
- runner.valid_metrics - dictionary with validation metrics for the current epoch; for example, runner.valid_metrics = {"loss": ..., "accuracy": ..., "auc": ...}
- Note
- subdictionary of epoch_metrics
- runner.is_best_valid - bool, indicator flag
- True if this training epoch is the best over all epochs
- False if not
- runner.best_valid_metrics - dictionary with the best validation metrics over the whole training process
- Distributed section
- runner.distributed_rank - distributed rank of the current worker
- runner.is_distributed_master - bool, indicator flag
- True if it is the master node (runner.distributed_rank == 0)
- False if it is a worker node (runner.distributed_rank != 0)
- runner.is_distributed_worker - bool, indicator flag
- True if it is a worker node (runner.distributed_rank > 0)
- False if it is the master node (runner.distributed_rank <= 0)
- Experiment info section
- runner.global_sample_step - int, counter for all individual samples that pass through our model during the training, validation and inference stages
- runner.global_batch_step - int, counter for all batches that pass through our model during the training, validation and inference stages
- runner.global_epoch - int, counter for all epochs that have passed during the training, validation and inference stages
- runner.verbose - bool, indicator flag
- runner.is_check_run - bool, indicator flag
- True if you want to check your pipeline and run only 2 batches per loader and 2 epochs per stage
- False (default) if you want to just run the pipeline
- runner.need_early_stop - bool, indicator flag used for the EarlyStopping and CheckRun Callbacks
- True if we need to stop the training
- False (default) otherwise
- runner.need_exception_reraise - bool, indicator flag
- True (default) if you want to show the exception during the pipeline and stop the training process
- False otherwise
- Stage info section
- runner.stage - string, current stage name; for example, runner.stage = "pretraining" / "training" / "finetuning" / etc
- runner.num_epochs - int, maximum number of epochs required for this stage
- runner.is_infer_stage - bool, indicator flag
- True for inference stages
- False otherwise
- Epoch info section
- runner.epoch - int, numerical indicator for the current stage epoch
- Loader info section
- runner.loader_sample_step - int, number of samples passed through our model in the current loader
- runner.loader_batch_step - int, batch index in the current loader
- runner.loader_name - string, current loader name; for example, runner.loader_name = "train_dataset1" / "valid_data2" / "infer_golden"
- runner.loader_len - int, maximum number of batches in the current loader
- runner.loader_batch_size - int, batch size parameter of the current loader
- runner.is_train_loader - bool, indicator flag
- True for training loaders
- False otherwise
- runner.is_valid_loader - bool, indicator flag
- True for validation loaders
- False otherwise
- runner.is_infer_loader - bool, indicator flag
- True for inference loaders
- False otherwise
- Batch info section
- runner.batch_size - int, length of the current batch
- Logging section
- runner.logdir - string, path to the logging directory where all logs, metrics, checkpoints and artifacts are saved
- runner.checkpoint_data - dictionary with all extra data for experiment tracking
- Extra section
- runner.exception - python Exception instance to raise (or not ;) )
- 
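To illustrate how these fields are typically consumed, here is a hypothetical callback (not part of Catalyst) that reads only the documented runner attributes:

    from catalyst.core.callback import Callback, CallbackOrder


    class BestEpochPrinter(Callback):
        # hypothetical callback that reports when a new best epoch is reached
        def __init__(self):
            super().__init__(order=CallbackOrder.External)

        def on_epoch_end(self, runner):
            # is_best_valid, epoch, main_metric and valid_metrics are documented runner fields
            if runner.is_best_valid:
                print(
                    f"epoch {runner.epoch}: new best "
                    f"{runner.main_metric}={runner.valid_metrics[runner.main_metric]}"
                )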
__init__(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None)[source]
- Parameters
- model – Torch model object 
- device – Torch device 
 
 
 - 
property device
- Returns the runner’s device instance. 
 - 
property model
- Returns the runner’s model instance. 
 - 
on_batch_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for batch end. - Parameters
- runner – IRunner instance. 
 
 - 
on_batch_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for batch start. - Parameters
- runner – IRunner instance. 
 
 - 
on_epoch_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for epoch end. - Parameters
- runner – IRunner instance. 
 
 - 
on_epoch_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for epoch start. - Parameters
- runner – IRunner instance. 
- Raises
- RunnerException – if current DataLoader is empty. 
 
 - 
on_exception(runner: catalyst.core.runner.IRunner)[source]
- Event handler for exception case. - Parameters
- runner – IRunner instance. 
- Raises
- exception – re-raised if an exception occurred during the pipeline and no handler for it was found in the callbacks
 
 - 
on_experiment_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for experiment end. - Parameters
- runner – IRunner instance. 
- Note
- This event works only on IRunner.
 - 
on_experiment_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for experiment start. - Parameters
- runner – IRunner instance. 
- Note
- This event works only on IRunner.
 - 
on_loader_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for loader end. - Parameters
- runner – IRunner instance. 
 
 - 
on_loader_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for loader start. - Parameters
- runner – IRunner instance. 
- Raises
- RunnerException – if current DataLoader is empty. 
 
 - 
on_stage_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for stage end. - Parameters
- runner – IRunner instance. 
 
 - 
on_stage_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for stage start. - Parameters
- runner – IRunner instance. 
 
 - 
run_experiment(experiment: catalyst.core.experiment.IExperiment = None) → catalyst.core.runner.IRunner[source]
- Starts the experiment. - Parameters
- experiment – Experiment instance to use for Runner. 
- Returns
- self, IRunner instance after the experiment 
 
 
- 
class catalyst.core.runner.IStageBasedRunner(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None)[source]
- Bases: catalyst.core.runner.IRunner
- Runner abstraction that is supposed to have constant data sources per stage.
- 
on_stage_start(runner: catalyst.core.runner.IRunner) → None[source]
- Event handler for stage start. - For the IStageBasedRunner case: - prepares loaders - our datasources 
- prepares model components - model, criterion, optimizer, scheduler 
- prepares callbacks for the current stage 
 - Parameters
- runner – IRunner instance. 
 
 
- 
- 
exception catalyst.core.runner.RunnerException(message: str)[source]
- Bases: Exception
- Exception class for all runner errors.
- 
__init__(message: str)[source]
- Parameters
- message – exception message 
 
 
- 
RunnerLegacy¶
- 
class catalyst.core.legacy.IRunnerLegacy[source]¶
- Bases: object
- Special class to encapsulate all catalyst.core.runner.IRunner and catalyst.core.runner.State legacy into one place. Used to make catalyst.core.runner.IRunner cleaner and easier to understand.
- Saved for backward compatibility. Should be removed someday.
- 
property batch_in¶
- Alias for runner.input. - Warning - Deprecated, saved for backward compatibility. Please use runner.input instead. 
 - 
property batch_out¶
- Alias for runner.output. - Warning - Deprecated, saved for backward compatibility. Please use runner.output instead. 
 - 
property loader_name¶
- Alias for runner.loader_key. - Warning - Deprecated, saved for backward compatibility. Please use runner.loader_key instead. 
 - 
property loader_step¶
- Alias for runner.loader_batch_step. - Warning - Deprecated, saved for backward compatibility. Please use runner.loader_batch_step instead. 
 - 
property need_backward_pass¶
- Alias for runner.is_train_loader. - Warning - Deprecated, saved for backward compatibility. Please use runner.is_train_loader instead. 
 - 
property stage_name¶
- Alias for runner.stage. - Warning - Deprecated, saved for backward compatibility. Please use runner.stage instead. 
 - 
property state¶
- Alias for runner. - Warning - Deprecated, saved for backward compatibility. Please use runner instead. 
 
Callback¶
- 
class catalyst.core.callback.Callback(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]
- Bases: catalyst.core.callback.ICallback
- An abstraction that lets you customize your experiment run logic. To give users maximum flexibility and extensibility, Catalyst supports callback execution anywhere in the training loop:
- -- stage start
  ---- epoch start
  ------ loader start
  -------- batch start
  ---------- batch handler (Runner logic)
  -------- batch end
  ------ loader end
  ---- epoch end
  -- stage end
  exception – if an Exception was raised
- All callbacks have
- order from CallbackOrder
- node from CallbackNode
- scope from CallbackScope
 
- Note
- To learn more about Catalyst Core concepts, please check out the Experiment, Runner and Callback abstractions and their implementations.
- 
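As an illustration of these hook points, a hypothetical callback (not part of Catalyst) could time each loader by pairing on_loader_start and on_loader_end:

    import time

    from catalyst.core.callback import Callback, CallbackOrder


    class LoaderTimerCallback(Callback):
        # hypothetical callback that measures how long each loader takes
        def __init__(self):
            super().__init__(order=CallbackOrder.External)
            self._start = None

        def on_loader_start(self, runner):
            self._start = time.time()

        def on_loader_end(self, runner):
            elapsed = time.time() - self._start
            print(f"{runner.loader_name}: {elapsed:.1f}s")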
__init__(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]
- Callback initializer. - Parameters
- order – flag from - CallbackOrder
- node – flag from - CallbackNode
- scope – flag from - CallbackScope
 
 
 
- 
class catalyst.core.callback.CallbackNode[source]
- Bases: enum.IntFlag
- Callback node usage flag during distributed training.
- All (0) - use on all nodes, both master and worker.
- Master (1) - use only on master node. 
- Worker (2) - use only in worker nodes. 
 - 
All= 0
 - 
Master= 1
 - 
Worker= 2
 - 
all= 0
 - 
master= 1
 - 
worker= 2
 
- 
class catalyst.core.callback.CallbackOrder[source]
- Bases: enum.IntFlag
- Callback usage order during training.
- Catalyst executes Callbacks with low CallbackOrder before Callbacks with high CallbackOrder.
- Predefined orders:
- Internal (0) - some Catalyst Extras, like PhaseCallbacks (used in GANs).
- Metric (20) - Callbacks with metrics and losses computation. 
- MetricAggregation (40) - metrics aggregation callbacks, like sum different losses into one. 
- Optimizer (60) - optimizer step, requires computed metrics for optimization. 
- Validation (80) - validation step, computes validation metrics subset based on all metrics. 
- Scheduler (100) - scheduler step, in ReduceLROnPlateau case requires computed validation metrics for optimizer schedule. 
- Logging (120) - logging step, logs metrics to Console/Tensorboard/Alchemy, requires computed metrics. 
- External (200) - additional callbacks with custom logic, like InferenceCallbacks 
- Nevertheless, you can always create a CustomCallback with any order, for example:
- >>> class MyCustomCallback(Callback):
  >>>     def __init__(self):
  >>>         super().__init__(order=42)
  >>>     ...
  # MyCustomCallback (order=42) will be executed after all `MetricAggregation`-Callbacks (40)
  # but before all `Optimizer`-Callbacks (60).
- 
External= 200
 - 
Internal= 0
 - 
Logging= 120
 - 
Metric= 20
 - 
MetricAggregation= 40
 - 
Optimizer= 60
 - 
Scheduler= 100
 - 
Validation= 80
 - 
external= 200
 - 
internal= 0
 - 
logging= 120
 - 
metric= 20
 - 
metric_aggregation= 40
 - 
optimizer= 60
 - 
scheduler= 100
 - 
validation= 80
 
- 
class catalyst.core.callback.CallbackScope[source]
- Bases: enum.IntFlag
- Callback scope usage flag during training.
- Stage (0) - use Callback only during one experiment stage.
- Experiment (1) - use Callback during whole experiment run. 
 - 
Experiment= 1
 - 
Stage= 0
 - 
experiment= 1
 - 
stage= 0
 
- 
class catalyst.core.callback.CallbackWrapper(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)[source]
- Bases: catalyst.core.callback.Callback
- Enable/disable callback execution.
- 
__init__(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)[source]
- Parameters
- base_callback – callback to wrap 
- enable_callback – indicator to enable/disable the callback; if True, the callback will be enabled; default is True
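A short usage sketch; the NoOpCallback below is a made-up stand-in for any real callback:

    from catalyst.core.callback import Callback, CallbackOrder, CallbackWrapper


    class NoOpCallback(Callback):
        # hypothetical stand-in for any real callback
        def __init__(self):
            super().__init__(order=CallbackOrder.External)


    # disable the wrapped callback without removing it from the callbacks dictionary
    wrapped = CallbackWrapper(base_callback=NoOpCallback(), enable_callback=False)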
 
 
 - 
on_batch_end(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_batch_start(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_epoch_end(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_epoch_start(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_exception(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_loader_end(runner: IRunner) → None[source]
- Reset status of callback - Parameters
- runner – current runner 
 
 - 
on_loader_start(runner: IRunner) → None[source]
- Check if current epoch should be skipped. - Parameters
- runner – current runner 
 
 - 
on_stage_end(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_stage_start(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 