Core¶
Experiment¶
- 
class catalyst.core.experiment.IExperiment[source]¶
- Bases: abc.ABC
- An abstraction that contains information about the experiment – a model, a criterion, an optimizer, a scheduler, and their hyperparameters. It also contains information about the data and the transformations used. In general, the Experiment knows what you would like to run.
- Note
- To learn more about Catalyst Core concepts, please check out the Experiment, Runner and Callback abstractions and their implementations.
- 
abstract property distributed_params¶
- Dictionary with the parameters for distributed and half-precision training.
- Used in catalyst.utils.distributed.process_components to set up Nvidia Apex or PyTorch distributed training.
- Example:
- >>> experiment.distributed_params
  {"opt_level": "O1", "syncbn": True}  # Apex variant
 - 
abstract get_callbacks(stage: str) → OrderedDict[str, Callback][source]¶
- Returns callbacks for a given stage.
- Note
- To learn more about the Catalyst Callback mechanism, please follow the catalyst.core.callback.Callback documentation.
- Note
- We need an ordered dictionary to guarantee the correct dataflow and order of metrics optimization – for example, to compute the loss before the optimizer step, or to compute all the metrics before logging :)
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- Ordered dictionary with callbacks for the current stage.
- Return type
- OrderedDict[str, Callback]
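For illustration, here is a minimal sketch of what a get_callbacks override could look like; PrintLossCallback is a made-up example, not part of Catalyst, and relies only on the documented runner.batch_metrics and runner.loader_name fields.

    from collections import OrderedDict

    from catalyst.core.callback import Callback, CallbackOrder


    class PrintLossCallback(Callback):
        # hypothetical callback: prints the loss after every batch
        def __init__(self):
            super().__init__(order=CallbackOrder.Logging)

        def on_batch_end(self, runner):
            print(runner.loader_name, runner.batch_metrics.get("loss"))


    def get_callbacks(self, stage: str) -> "OrderedDict[str, Callback]":
        # method of a hypothetical IExperiment subclass;
        # insertion order defines the execution order for this stage
        return OrderedDict({"print_loss": PrintLossCallback()})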
 
 - 
abstract get_criterion(stage: str) → torch.nn.modules.module.Module[source]¶
- Returns the criterion for a given stage.
- Example:
- # for a typical classification task
  >>> experiment.get_criterion(stage="training")
  nn.CrossEntropyLoss()
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- Criterion – criterion for a given stage.
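A rough sketch of a per-stage criterion choice (a method of a hypothetical IExperiment subclass; the stage names and losses are arbitrary):

    import torch.nn as nn


    def get_criterion(self, stage: str) -> nn.Module:
        # method of a hypothetical IExperiment subclass
        if stage == "pretrain":
            return nn.MSELoss()
        return nn.CrossEntropyLoss()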
 
 - 
get_datasets(stage: str, epoch: int = None, **kwargs) → OrderedDict[str, Dataset][source]¶
- Returns the datasets for a given stage and epoch.
- Note
- For Deep Learning use cases you have the same dataset during the whole stage.
- For Reinforcement Learning it is common to change the dataset (experiment) every training epoch.
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc 
- epoch – epoch index 
- **kwargs – additional parameters to use during dataset creation 
 
- Returns
- OrderedDict[str, Dataset] – ordered dictionary with datasets for the current stage and epoch.
 
- Note
- We need an ordered dictionary to guarantee the correct dataflow and order of our training datasets – for example, to run through the train data before the validation one :)
- Example:
- >>> experiment.get_datasets(
  >>>     stage="training",
  >>>     in_csv_train="path/to/train/csv",
  >>>     in_csv_valid="path/to/valid/csv",
  >>> )
  OrderedDict({
      "train": CsvDataset(in_csv=in_csv_train, ...),
      "valid": CsvDataset(in_csv=in_csv_valid, ...),
  })
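As a minimal sketch (a method of a hypothetical IExperiment subclass, using synthetic TensorDatasets purely for illustration):

    from collections import OrderedDict

    import torch
    from torch.utils.data import TensorDataset


    def get_datasets(self, stage: str, epoch: int = None, **kwargs) -> "OrderedDict":
        # method of a hypothetical IExperiment subclass; synthetic data for illustration only
        features = torch.randn(128, 10)
        targets = torch.randint(0, 2, (128,))
        return OrderedDict(
            train=TensorDataset(features, targets),
            valid=TensorDataset(features, targets),
        )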
 - 
abstract get_loaders(stage: str, epoch: int = None) → OrderedDict[str, DataLoader][source]¶
- Returns the loaders for a given stage.
- Note
- Wrapper for catalyst.core.experiment.IExperiment.get_datasets. For most experiments you only need to override the get_datasets method.
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc 
- epoch – epoch index 
 
- Returns
- OrderedDict[str, DataLoader] – ordered dictionary with loaders for the current stage and epoch.
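A sketch of how such a wrapper might look, reusing get_datasets from the same experiment; the batch size and shuffling policy here are arbitrary choices, not Catalyst defaults:

    from collections import OrderedDict

    from torch.utils.data import DataLoader


    def get_loaders(self, stage: str, epoch: int = None) -> "OrderedDict[str, DataLoader]":
        # method of a hypothetical IExperiment subclass; wraps the datasets into DataLoaders
        datasets = self.get_datasets(stage=stage, epoch=epoch)
        return OrderedDict(
            (name, DataLoader(dataset, batch_size=32, shuffle=(name == "train")))
            for name, dataset in datasets.items()
        )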
 
 
 - 
abstract get_model(stage: str) → torch.nn.modules.module.Module[source]¶
- Returns the model for a given stage.
- Example:
- # suppose we have a typical MNIST model, like
  # nn.Sequential(nn.Linear(28*28, 128), nn.Linear(128, 10))
  >>> experiment.get_model(stage="training")
  Sequential(
    (0): Linear(in_features=784, out_features=128, bias=True)
    (1): Linear(in_features=128, out_features=10, bias=True)
  )
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- Model – model for a given stage.
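A minimal sketch of a get_model implementation (a method of a hypothetical IExperiment subclass; the MNIST-sized architecture is just an example):

    import torch.nn as nn


    def get_model(self, stage: str) -> nn.Module:
        # method of a hypothetical IExperiment subclass; MNIST-sized model for illustration
        return nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )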
 
 - 
abstract get_optimizer(stage: str, model: torch.nn.modules.module.Module) → torch.optim.optimizer.Optimizer[source]¶
- Returns the optimizer for a given stage and model.
- Example:
- >>> experiment.get_optimizer(stage="training", model=model)
  torch.optim.Adam(model.parameters())
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- model – model to optimize with the stage optimizer
- Returns
- Optimizer – optimizer for a given stage and model.
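A sketch of a per-stage optimizer (a method of a hypothetical IExperiment subclass; the learning rates and stage names are assumptions for illustration):

    import torch
    import torch.nn as nn


    def get_optimizer(self, stage: str, model: nn.Module) -> torch.optim.Optimizer:
        # method of a hypothetical IExperiment subclass; hypothetical per-stage learning rates
        lr = 1e-3 if stage == "train" else 1e-4
        return torch.optim.Adam(model.parameters(), lr=lr)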
 
 - 
abstract get_scheduler(stage: str, optimizer: torch.optim.optimizer.Optimizer) → torch.optim.lr_scheduler._LRScheduler[source]¶
- Returns the scheduler for a given stage and optimizer.
- Example:
- >>> experiment.get_scheduler(stage="training", optimizer=optimizer)
  torch.optim.lr_scheduler.StepLR(optimizer)
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- optimizer – optimizer to schedule with the stage scheduler
- Returns
- Scheduler – scheduler for a given stage and optimizer.
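A sketch of a get_scheduler implementation (a method of a hypothetical IExperiment subclass; step_size and gamma are arbitrary here):

    import torch
    from torch.optim.lr_scheduler import StepLR


    def get_scheduler(self, stage: str, optimizer: torch.optim.Optimizer) -> StepLR:
        # method of a hypothetical IExperiment subclass
        return StepLR(optimizer, step_size=10, gamma=0.5)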
 
 - 
abstract get_stage_params(stage: str) → Mapping[str, Any][source]¶
- Returns extra stage parameters for a given stage.
- Example:
- >>> experiment.get_stage_params(stage="training")
  {
      "logdir": "./logs/training",
      "num_epochs": 42,
      "valid_loader": "valid",
      "main_metric": "loss",
      "minimize_metric": True,
      "checkpoint_data": {"comment": "break the cycle - use the Catalyst"},
  }
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune”, etc.
- Returns
- dict – parameters for a given stage.
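A sketch of a get_stage_params implementation (a method of a hypothetical IExperiment subclass; the values simply mirror the example above):

    from typing import Any, Mapping


    def get_stage_params(self, stage: str) -> Mapping[str, Any]:
        # method of a hypothetical IExperiment subclass; values mirror the example above
        return {
            "logdir": f"./logs/{stage}",
            "num_epochs": 42,
            "valid_loader": "valid",
            "main_metric": "loss",
            "minimize_metric": True,
        }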
 
 - 
get_transforms(stage: str = None, dataset: str = None)[source]¶
- Returns the data transforms for a given stage and dataset.
- Parameters
- stage – stage name of interest, like “pretrain” / “train” / “finetune” / etc 
- dataset – dataset name of interest, like “train” / “valid” / “infer” 
 
- Note
- For dataset/loader naming, please follow the catalyst.core.runner documentation.
- Returns
- Data transformations to use for the specified dataset.
 
 - 
abstract property hparams¶
- Returns hyper-parameters for the current experiment.
- Example:
- >>> experiment.hparams
  OrderedDict([('optimizer', 'Adam'),
               ('lr', 0.02),
               ('betas', (0.9, 0.999)),
               ('eps', 1e-08),
               ('weight_decay', 0),
               ('amsgrad', False),
               ('train_batch_size', 32)])
 
 - 
abstract property initial_seed¶
- Experiment’s initial seed, used to set up the global seed at the beginning of each stage. Additionally, the Catalyst Runner sets experiment.initial_seed + runner.global_epoch + 1 as the global seed for each epoch. Used for experiment reproducibility.
- Example:
- >>> experiment.initial_seed
  42
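A small sketch of what that per-epoch seeding formula amounts to, using plain Python/NumPy/PyTorch seeding; this is not the Runner's internal implementation:

    import random

    import numpy as np
    import torch


    def seed_epoch(initial_seed: int, global_epoch: int) -> None:
        # mirrors the documented formula: initial_seed + global_epoch + 1
        seed = initial_seed + global_epoch + 1
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)


    seed_epoch(initial_seed=42, global_epoch=0)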
 - 
abstract property logdir¶
- Path to the directory where the experiment logs would be saved.
- Example:
- >>> experiment.logdir
  ./path/to/my/experiment/logs
 - 
abstract property stages¶
- Experiment’s stage names.
- Example:
- >>> experiment.stages
  ["pretraining", "training", "finetuning"]
- Note
- To understand the stages concept, please follow the Catalyst documentation, for example, catalyst.core.callback.Callback.
 - 
abstract property trial¶
- Returns the hyperparameter trial for the current experiment. Could be useful for Optuna / HyperOpt / Ray Tune hyperparameter optimizers.
- Example:
- >>> experiment.trial
  optuna.trial._trial.Trial  # Optuna variant
 
Runner¶
- 
class catalyst.core.runner.IRunner(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None)[source]
- Bases: abc.ABC, catalyst.core.callback.ICallback, catalyst.core.legacy.IRunnerLegacy
- An abstraction that knows how to run an experiment. It contains all the logic of how to run the experiment, stages, epochs and batches.
- Note
- To learn more about Catalyst Core concepts, please check out the Experiment, Runner and Callback abstractions and their implementations.
- Runner also contains full information about the experiment run.
- Runner section
- runner.model - an instance of torch.nn.Module (should implement a forward method); for example, runner.model = torch.nn.Linear(10, 10)
- runner.device - an instance of torch.device (CPU, GPU, TPU); for example, runner.device = torch.device("cpu")
- Experiment section
- runner.criterion - an instance of torch.nn.Module or torch.nn.modules.loss._Loss (should implement a forward method); for example, runner.criterion = torch.nn.CrossEntropyLoss()
- runner.optimizer - an instance of torch.optim.optimizer.Optimizer (should implement a step method); for example, runner.optimizer = torch.optim.Adam()
- runner.scheduler - an instance of torch.optim.lr_scheduler._LRScheduler (should implement a step method); for example, runner.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau()
- runner.callbacks - ordered dictionary with Catalyst.Callback instances; for example, runner.callbacks = {"accuracy": AccuracyCallback(), "criterion": CriterionCallback(), "optim": OptimizerCallback(), "saver": CheckpointCallback()}
- Dataflow section
- runner.loaders - ordered dictionary with torch.DataLoaders; for example, runner.loaders = {"train": MnistTrainLoader(), "valid": MnistValidLoader()}
- Note
- “train” prefix is used for training loaders - metrics computations, backward pass, optimization
- “valid” prefix is used for validation loaders - metrics computations only
- “infer” prefix is used for inference loaders - dataset prediction
- runner.input - dictionary containing the batch of data from the current DataLoader; for example, runner.input = {"images": np.ndarray(batch_size, c, h, w), "targets": np.ndarray(batch_size, 1)}
- runner.output - dictionary containing the model output for the current batch; for example, runner.output = {"logits": torch.Tensor(batch_size, num_classes)}
- Metrics section
- runner.batch_metrics - dictionary, flat storage for batch metrics; for example, runner.batch_metrics = {"loss": ..., "accuracy": ..., "iou": ...}
- runner.loader_metrics - dictionary with aggregated batch statistics for the loader (mean over all batches) and global loader metrics, like AUC; for example, runner.loader_metrics = {"loss": ..., "accuracy": ..., "auc": ...}
- runner.epoch_metrics - dictionary with summarized metrics for different loaders and global epoch metrics, like lr, momentum; for example, runner.epoch_metrics = {"train_loss": ..., "train_auc": ..., "valid_loss": ..., "lr": ..., "momentum": ...}
- Validation metrics section
- runner.main_metric - string containing the name of the metric of interest for optimization, validation and checkpointing during training
- runner.minimize_metric - bool, indicator flag
- True if we need to minimize the metric during training, like Cross Entropy loss
- False if we need to maximize the metric during training, like Accuracy or Intersection over Union
- Validation section
- runner.valid_loader - string, name of the validation loader for metric selection, validation and model checkpointing
- runner.valid_metrics - dictionary with validation metrics for the current epoch; for example, runner.valid_metrics = {"loss": ..., "accuracy": ..., "auc": ...}
- Note
- subdictionary of epoch_metrics
- runner.is_best_valid - bool, indicator flag
- True if this training epoch is the best over all epochs
- False if not
- runner.best_valid_metrics - dictionary with the best validation metrics over the whole training process
- Distributed section
- runner.distributed_rank - distributed rank of the current worker
- runner.is_distributed_master - bool, indicator flag
- True if it is the master node (runner.distributed_rank == 0)
- False if it is a worker node (runner.distributed_rank != 0)
- runner.is_distributed_worker - bool, indicator flag
- True if it is a worker node (runner.distributed_rank > 0)
- False if it is the master node (runner.distributed_rank <= 0)
- Experiment info section
- runner.global_sample_step - int, counter for all individual samples that pass through our model during the training, validation and inference stages
- runner.global_batch_step - int, counter for all batches that pass through our model during the training, validation and inference stages
- runner.global_epoch - int, counter for all epochs that have passed during the training, validation and inference stages
- runner.verbose - bool, indicator flag
- runner.is_check_run - bool, indicator flag
- True if you want to check your pipeline and run only 2 batches per loader and 2 epochs per stage
- False (default) if you want to just run the pipeline
- runner.need_early_stop - bool, indicator flag used for the EarlyStopping and CheckRun Callbacks
- True if we need to stop the training
- False (default) otherwise
- runner.need_exception_reraise - bool, indicator flag
- True (default) if you want to show the exception during the pipeline and stop the training process
- False otherwise
- Stage info section
- runner.stage - string, current stage name; for example, runner.stage = "pretraining" / "training" / "finetuning" / etc
- runner.num_epochs - int, maximum number of epochs required for this stage
- runner.is_infer_stage - bool, indicator flag
- True for inference stages
- False otherwise
- Epoch info section
- runner.epoch - int, numerical indicator for the current stage epoch
- Loader info section
- runner.loader_sample_step - int, number of samples passed through our model in the current loader
- runner.loader_batch_step - int, batch index in the current loader
- runner.loader_name - string, current loader name; for example, runner.loader_name = "train_dataset1" / "valid_data2" / "infer_golden"
- runner.loader_len - int, maximum number of batches in the current loader
- runner.loader_batch_size - int, batch size parameter of the current loader
- runner.is_train_loader - bool, indicator flag
- True for training loaders
- False otherwise
- runner.is_valid_loader - bool, indicator flag
- True for validation loaders
- False otherwise
- runner.is_infer_loader - bool, indicator flag
- True for inference loaders
- False otherwise
- Batch info section
- runner.batch_size - int, length of the current batch
- Logging section
- runner.logdir - string, path to the logging directory where all logs, metrics, checkpoints and artifacts are saved
- runner.checkpoint_data - dictionary with all extra data for experiment tracking
- Extra section
- runner.exception - python Exception instance to raise (or not ;) )
- 
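To illustrate how these fields are typically consumed, here is a hypothetical callback (not part of Catalyst) that reads only the documented runner attributes:

    from catalyst.core.callback import Callback, CallbackOrder


    class BestEpochPrinter(Callback):
        # hypothetical callback that reports when a new best epoch is reached
        def __init__(self):
            super().__init__(order=CallbackOrder.External)

        def on_epoch_end(self, runner):
            # is_best_valid, epoch, main_metric and valid_metrics are documented runner fields
            if runner.is_best_valid:
                print(
                    f"epoch {runner.epoch}: new best "
                    f"{runner.main_metric}={runner.valid_metrics[runner.main_metric]}"
                )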
__init__(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None)[source]
- Parameters
- model – Torch model object 
- device – Torch device 
 
 
 - 
property device
- Returns the runner’s device instance. 
 - 
property model
- Returns the runner’s model instance. 
 - 
on_batch_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for batch end. - Parameters
- runner – IRunner instance. 
 
 - 
on_batch_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for batch start. - Parameters
- runner – IRunner instance. 
 
 - 
on_epoch_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for epoch end. - Parameters
- runner – IRunner instance. 
 
 - 
on_epoch_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for epoch start. - Parameters
- runner – IRunner instance. 
- Raises
- RunnerException – if current DataLoader is empty. 
 
 - 
on_exception(runner: catalyst.core.runner.IRunner)[source]
- Event handler for exception case. - Parameters
- runner – IRunner instance. 
- Raises
- exception – re-raised if an exception occurred during the pipeline and no handler for it was found in the callbacks
 
 - 
on_experiment_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for experiment end. - Parameters
- runner – IRunner instance. 
- Note
- This event works only on IRunner.
 - 
on_experiment_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for experiment start. - Parameters
- runner – IRunner instance. 
- Note
- This event works only on IRunner.
 - 
on_loader_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for loader end. - Parameters
- runner – IRunner instance. 
 
 - 
on_loader_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for loader start. - Parameters
- runner – IRunner instance. 
- Raises
- RunnerException – if current DataLoader is empty. 
 
 - 
on_stage_end(runner: catalyst.core.runner.IRunner)[source]
- Event handler for stage end. - Parameters
- runner – IRunner instance. 
 
 - 
on_stage_start(runner: catalyst.core.runner.IRunner)[source]
- Event handler for stage start. - Parameters
- runner – IRunner instance. 
 
 - 
run_experiment(experiment: catalyst.core.experiment.IExperiment = None) → catalyst.core.runner.IRunner[source]
- Starts the experiment. - Parameters
- experiment – Experiment instance to use for Runner. 
- Returns
- self, IRunner instance after the experiment 
 
 
- 
class catalyst.core.runner.IStageBasedRunner(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None)[source]
- Bases: catalyst.core.runner.IRunner
- Runner abstraction that is supposed to have constant data sources per stage.
- 
on_stage_start(runner: catalyst.core.runner.IRunner) → None[source]
- Event handler for stage start. - For the IStageBasedRunner case: - prepares loaders - our datasources 
- prepares model components - model, criterion, optimizer, scheduler 
- prepares callbacks for the current stage 
 - Parameters
- runner – IRunner instance. 
 
 
- 
- 
exception catalyst.core.runner.RunnerException(message: str)[source]
- Bases: Exception
- Exception class for all runner errors.
- 
__init__(message: str)[source]
- Parameters
- message – exception message 
 
 
- 
RunnerLegacy¶
- 
class catalyst.core.legacy.IRunnerLegacy[source]¶
- Bases: object
- Special class to encapsulate all catalyst.core.runner.IRunner and catalyst.core.runner.State legacy into one place. Used to make catalyst.core.runner.IRunner cleaner and easier to understand.
- Saved for backward compatibility. Should be removed someday.
- 
property batch_in¶
- Alias for runner.input. - Warning - Deprecated, saved for backward compatibility. Please use runner.input instead. 
 - 
property batch_out¶
- Alias for runner.output. - Warning - Deprecated, saved for backward compatibility. Please use runner.output instead. 
 - 
property loader_name¶
- Alias for runner.loader_key. - Warning - Deprecated, saved for backward compatibility. Please use runner.loader_key instead. 
 - 
property loader_step¶
- Alias for runner.loader_batch_step. - Warning - Deprecated, saved for backward compatibility. Please use runner.loader_batch_step instead. 
 - 
property need_backward_pass¶
- Alias for runner.is_train_loader. - Warning - Deprecated, saved for backward compatibility. Please use runner.is_train_loader instead. 
 - 
property stage_name¶
- Alias for runner.stage. - Warning - Deprecated, saved for backward compatibility. Please use runner.stage instead. 
 - 
property state¶
- Alias for runner. - Warning - Deprecated, saved for backward compatibility. Please use runner instead. 
 
Callback¶
- 
class catalyst.core.callback.Callback(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]
- Bases: catalyst.core.callback.ICallback
- An abstraction that lets you customize your experiment run logic. To give users maximum flexibility and extensibility, Catalyst supports callback execution anywhere in the training loop:
- -- stage start
  ---- epoch start
  ------ loader start
  -------- batch start
  ---------- batch handler (Runner logic)
  -------- batch end
  ------ loader end
  ---- epoch end
  -- stage end
  exception – if an Exception was raised
- All callbacks have
- order from CallbackOrder
- node from CallbackNode
- scope from CallbackScope
 
- Note
- To learn more about Catalyst Core concepts, please check out the Experiment, Runner and Callback abstractions and their implementations.
- 
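As an illustration of these hook points, a hypothetical callback (not part of Catalyst) could time each loader by pairing on_loader_start and on_loader_end:

    import time

    from catalyst.core.callback import Callback, CallbackOrder


    class LoaderTimerCallback(Callback):
        # hypothetical callback that measures how long each loader takes
        def __init__(self):
            super().__init__(order=CallbackOrder.External)
            self._start = None

        def on_loader_start(self, runner):
            self._start = time.time()

        def on_loader_end(self, runner):
            elapsed = time.time() - self._start
            print(f"{runner.loader_name}: {elapsed:.1f}s")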
__init__(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]
- Callback initializer. - Parameters
- order – flag from - CallbackOrder
- node – flag from - CallbackNode
- scope – flag from - CallbackScope
 
 
 
- 
class catalyst.core.callback.CallbackNode[source]
- Bases: enum.IntFlag
- Callback node usage flag during distributed training.
- All (0) - use on all nodes, both master and worker.
- Master (1) - use only on master node. 
- Worker (2) - use only in worker nodes. 
 - 
All= 0
 - 
Master= 1
 - 
Worker= 2
 - 
all= 0
 - 
master= 1
 - 
worker= 2
 
- 
class catalyst.core.callback.CallbackOrder[source]
- Bases: enum.IntFlag
- Callback usage order during training.
- Catalyst executes Callbacks with low CallbackOrder before Callbacks with high CallbackOrder.
- Predefined orders:
- Internal (0) - some Catalyst Extras, like PhaseCallbacks (used in GANs).
- Metric (20) - Callbacks with metrics and losses computation. 
- MetricAggregation (40) - metrics aggregation callbacks, like sum different losses into one. 
- Optimizer (60) - optimizer step, requires computed metrics for optimization. 
- Validation (80) - validation step, computes validation metrics subset based on all metrics. 
- Scheduler (100) - scheduler step, in ReduceLROnPlateau case requires computed validation metrics for optimizer schedule. 
- Logging (120) - logging step, logs metrics to Console/Tensorboard/Alchemy, requires computed metrics. 
- External (200) - additional callbacks with custom logic, like InferenceCallbacks 
- Nevertheless, you can always create a CustomCallback with any order, for example:
- >>> class MyCustomCallback(Callback):
  >>>     def __init__(self):
  >>>         super().__init__(order=42)
  >>>     ...
  # MyCustomCallback (order=42) will be executed after all `MetricAggregation`-Callbacks (40)
  # but before all `Optimizer`-Callbacks (60).
- 
External= 200
 - 
Internal= 0
 - 
Logging= 120
 - 
Metric= 20
 - 
MetricAggregation= 40
 - 
Optimizer= 60
 - 
Scheduler= 100
 - 
Validation= 80
 - 
external= 200
 - 
internal= 0
 - 
logging= 120
 - 
metric= 20
 - 
metric_aggregation= 40
 - 
optimizer= 60
 - 
scheduler= 100
 - 
validation= 80
 
- 
class catalyst.core.callback.CallbackScope[source]
- Bases: enum.IntFlag
- Callback scope usage flag during training.
- Stage (0) - use Callback only during one experiment stage.
- Experiment (1) - use Callback during whole experiment run. 
 - 
Experiment= 1
 - 
Stage= 0
 - 
experiment= 1
 - 
stage= 0
 
- 
class catalyst.core.callback.CallbackWrapper(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)[source]
- Bases: catalyst.core.callback.Callback
- Enable/disable callback execution.
- 
__init__(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)[source]
- Parameters
- base_callback – callback to wrap 
- enable_callback – indicator to enable/disable the callback; if True, the callback will be enabled; default is True
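A short usage sketch; the NoOpCallback below is a made-up stand-in for any real callback:

    from catalyst.core.callback import Callback, CallbackOrder, CallbackWrapper


    class NoOpCallback(Callback):
        # hypothetical stand-in for any real callback
        def __init__(self):
            super().__init__(order=CallbackOrder.External)


    # disable the wrapped callback without removing it from the callbacks dictionary
    wrapped = CallbackWrapper(base_callback=NoOpCallback(), enable_callback=False)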
 
 
 - 
on_batch_end(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_batch_start(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_epoch_end(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_epoch_start(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_exception(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_loader_end(runner: IRunner) → None[source]
- Reset status of callback - Parameters
- runner – current runner 
 
 - 
on_loader_start(runner: IRunner) → None[source]
- Check if current epoch should be skipped. - Parameters
- runner – current runner 
 
 - 
on_stage_end(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 - 
on_stage_start(runner: IRunner) → None[source]
- Run base_callback (if possible) - Parameters
- runner – current runner 
 
 