Core¶
Experiment¶
-
class
catalyst.core.experiment.
IExperiment
[source]¶ Bases:
abc.ABC
An abstraction that contains information about the experiment – a model, a criterion, an optimizer, a scheduler, and their hyperparameters. It also contains information about the data and transformations used. In general, the Experiment knows what you would like to run.
Note
To learn more about Catalyst Core concepts, please check out catalyst.core.experiment.IExperiment, catalyst.core.runner.IRunner and catalyst.core.callback.Callback.
For an implementation of this abstraction, please check out:
catalyst.dl.experiment.base.BaseExperiment
-
abstract property
distributed_params
¶ Dictionary with the parameters for distributed and half-precision training.
Used in
catalyst.utils.distributed.process_components
to setup Nvidia Apex or PyTorch distributed.Example:
>>> experiment.distributed_params
{"opt_level": "O1", "syncbn": True}  # Apex variant
-
abstract
get_callbacks
(stage: str) → OrderedDict[str, Callback][source]¶ Returns callbacks for a given stage.
Note
To learn more about Catalyst Callbacks mechanism, please follow
catalyst.core.callback.Callback
documentation.Note
We need an ordered dictionary to guarantee the correct dataflow and order of metrics optimization: for example, to compute the loss before the optimization step, or to compute all the metrics before logging :)
- Parameters
stage (str) – stage name of interest like “pretrain” / “train” / “finetune” / etc
- Returns
Ordered dictionary with callbacks for the current stage.
- Return type
OrderedDict[str, Callback]
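Example (a sketch of the expected result; the callback names and instances are illustrative, not part of the method contract):
>>> experiment.get_callbacks(stage="training")
OrderedDict({
    "criterion": CriterionCallback(),
    "optimizer": OptimizerCallback(),
    "checkpoint": CheckpointCallback(),
})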
-
abstract
get_criterion
(stage: str) → torch.nn.modules.module.Module[source]¶ Returns the criterion for a given stage.
Example:
# for a typical classification task
>>> experiment.get_criterion(stage="training")
nn.CrossEntropyLoss()
- Parameters
stage (str) – stage name of interest like “pretrain” / “train” / “finetune” / etc
- Returns
Criterion: criterion for a given stage.
-
get_datasets
(stage: str, epoch: int = None, **kwargs) → OrderedDict[str, Dataset][source]¶ Returns the datasets for a given stage and epoch.
Note
For Deep Learning cases you have the same dataset during the whole stage.
For Reinforcement Learning it is common to change the dataset (experiment) every training epoch.
- Parameters
stage (str) – stage name of interest, like “pretrain” / “train” / “finetune” / etc
epoch (int) – epoch index
**kwargs (dict) – additional parameters to use during dataset creation
- Returns
OrderedDict[str, Dataset]: Ordered dictionary with datasets for the current stage and epoch.
Note
We need an ordered dictionary to guarantee the correct dataflow and order of our training datasets: for example, to run through the train data before the validation one :)
Example:
>>> experiment.get_datasets(
>>>     stage="training",
>>>     in_csv_train="path/to/train/csv",
>>>     in_csv_valid="path/to/valid/csv",
>>> )
OrderedDict({
    "train": CsvDataset(in_csv=in_csv_train, ...),
    "valid": CsvDataset(in_csv=in_csv_valid, ...),
})
-
get_experiment_components
(stage: str, model: torch.nn.modules.module.Module = None) → Tuple[torch.nn.modules.module.Module, torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler][source]¶ Returns the tuple containing the model, criterion, optimizer and scheduler for a given stage and model.
Aggregation method, based on get_model, get_criterion, get_optimizer and get_scheduler.
- Parameters
stage (str) – stage name of interest, like “pretrain” / “train” / “finetune” / etc
model (Model) – model to optimize with stage optimizer
- Returns
- model, criterion, optimizer, scheduler
for a given stage and model
- Return type
tuple
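Example (a sketch of the expected call pattern; the variable names are illustrative):
>>> model, criterion, optimizer, scheduler = experiment.get_experiment_components(
...     stage="training", model=model
... )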
-
abstract
get_loaders
(stage: str, epoch: int = None) → OrderedDict[str, DataLoader][source]¶ Returns the loaders for a given stage.
Note
Wrapper for
catalyst.core.experiment.IExperiment.get_datasets
. For most of your experiments you need to rewrite the get_datasets method only.
- Parameters
stage (str) – stage name of interest, like “pretrain” / “train” / “finetune” / etc
epoch (int) – epoch index
- Returns
OrderedDict[str, DataLoader]: Ordered dictionary with loaders for the current stage and epoch.
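Example (a sketch of the expected result, mirroring the get_datasets example above; the loader instances are illustrative):
>>> experiment.get_loaders(stage="training")
OrderedDict({
    "train": DataLoader(train_dataset, batch_size=32, shuffle=True),
    "valid": DataLoader(valid_dataset, batch_size=32),
})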
-
abstract
get_model
(stage: str) → torch.nn.modules.module.Module[source]¶ Returns the model for a given stage.
Example:
# suppose we have a typical MNIST model, like
# nn.Sequential(nn.Linear(28*28, 128), nn.Linear(128, 10))
>>> experiment.get_model(stage="training")
Sequential(
  (0): Linear(in_features=784, out_features=128, bias=True)
  (1): Linear(in_features=128, out_features=10, bias=True)
)
- Parameters
stage (str) – stage name of interest like “pretrain” / “train” / “finetune” / etc
- Returns
Model: model for a given stage.
-
abstract
get_optimizer
(stage: str, model: torch.nn.modules.module.Module) → torch.optim.optimizer.Optimizer[source]¶ Returns the optimizer for a given stage and model.
Example:
>>> experiment.get_optimizer(stage="training", model=model)
torch.optim.Adam(model.parameters())
- Parameters
stage (str) – stage name of interest like “pretrain” / “train” / “finetune” / etc
model (Model) – model to optimize with stage optimizer
- Returns
Optimizer: optimizer for a given stage and model.
-
abstract
get_scheduler
(stage: str, optimizer: torch.optim.optimizer.Optimizer) → torch.optim.lr_scheduler._LRScheduler[source]¶ Returns the scheduler for a given stage and optimizer.
Example:
>>> experiment.get_scheduler(stage="training", optimizer=optimizer)
torch.optim.lr_scheduler.StepLR(optimizer)
- Parameters
stage (str) – stage name of interest like “pretrain” / “train” / “finetune” / etc
optimizer (Optimizer) – optimizer to schedule with stage scheduler
- Returns
Scheduler: scheduler for a given stage and optimizer.
-
abstract
get_stage_params
(stage: str) → Mapping[str, Any][source]¶ Returns extra stage parameters for a given stage.
Example:
>>> experiment.get_stage_params(stage="training")
{
    "logdir": "./logs/training",
    "num_epochs": 42,
    "valid_loader": "valid",
    "main_metric": "loss",
    "minimize_metric": True,
    "checkpoint_data": {"comment": "break the cycle - use the Catalyst"}
}
- Parameters
stage (str) – stage name of interest like “pretrain” / “train” / “finetune” / etc
- Returns
dict: parameters for a given stage.
-
get_transforms
(stage: str = None, dataset: str = None)[source]¶ Returns the data transforms for a given stage and dataset.
- Parameters
stage (str) – stage name of interest, like “pretrain” / “train” / “finetune” / etc
dataset (str) – dataset name of interest, like “train” / “valid” / “infer”
Note
For dataset/loader naming, please follow
catalyst.core.runner
documentation.
- Returns
Data transformations to use for the specified dataset.
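Example (a sketch of the expected usage; the returned transform pipeline is illustrative):
>>> experiment.get_transforms(stage="training", dataset="train")
Compose([ToTensor(), Normalize(...)])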
-
abstract property
hparams
¶ Returns hyper-parameters
Example:
>>> experiment.hparams
OrderedDict([('optimizer', 'Adam'),
             ('lr', 0.02),
             ('betas', (0.9, 0.999)),
             ('eps', 1e-08),
             ('weight_decay', 0),
             ('amsgrad', False),
             ('train_batch_size', 32)])
-
abstract property
initial_seed
¶ Experiment’s initial seed, used to set up the global seed at the beginning of each stage. Additionally, the Catalyst Runner sets up experiment.initial_seed + runner.global_epoch + 1 as the global seed each epoch. Used for experiment reproducibility.
Example:
>>> experiment.initial_seed
42
-
abstract property
logdir
¶ Path to the directory where the experiment logs would be saved.
Example:
>>> experiment.logdir
./path/to/my/experiment/logs
-
abstract property
stages
¶ Experiment’s stage names.
Example:
>>> experiment.stages
["pretraining", "training", "finetuning"]
Note
To understand the stages concept, please follow the Catalyst documentation, for example,
catalyst.core.callback.Callback
Runner¶
-
class
catalyst.core.runner.
IRunner
(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None, **kwargs)[source]¶ Bases:
abc.ABC
,catalyst.core.legacy.IRunnerLegacy
,catalyst.tools.frozen_class.FrozenClass
An abstraction that knows how to run an experiment. It contains all the logic of how to run the experiment, its stages, epochs and batches.
Note
To learn more about Catalyst Core concepts, please check out catalyst.core.experiment.IExperiment, catalyst.core.runner.IRunner and catalyst.core.callback.Callback.
Runner also contains full information about the experiment run.
Runner section
runner.model - an instance of the torch.nn.Module class (should implement the forward method); for example,
runner.model = torch.nn.Linear(10, 10)
runner.device - an instance of torch.device (CPU, GPU, TPU); for example,
runner.device = torch.device("cpu")
Experiment section
runner.criterion - an instance of the torch.nn.Module class or torch.nn.modules.loss._Loss (should implement the forward method); for example,
runner.criterion = torch.nn.CrossEntropyLoss()
runner.optimizer - an instance of torch.optim.optimizer.Optimizer (should implement the step method); for example,
runner.optimizer = torch.optim.Adam()
runner.scheduler - an instance of torch.optim.lr_scheduler._LRScheduler (should implement the step method); for example,
runner.scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau()
runner.callbacks - ordered dictionary with Catalyst.Callback instances; for example,
runner.callbacks = {
    "accuracy": AccuracyCallback(),
    "criterion": CriterionCallback(),
    "optim": OptimizerCallback(),
    "saver": CheckpointCallback(),
}
Dataflow section
runner.loaders - ordered dictionary with torch.DataLoaders; for example,
runner.loaders = {
    "train": MnistTrainLoader(),
    "valid": MnistValidLoader(),
}
Note
“train” prefix is used for training loaders - metrics computations, backward pass, optimization
“valid” prefix is used for validation loaders - metrics computations only
“infer” prefix is used for inference loaders - dataset prediction
runner.input - dictionary containing the batch of data from the current DataLoader; for example,
runner.input = {
    "images": np.ndarray(batch_size, c, h, w),
    "targets": np.ndarray(batch_size, 1),
}
runner.output - dictionary containing the model output for the current batch; for example,
runner.output = {"logits": torch.Tensor(batch_size, num_classes)}
Metrics section
runner.batch_metrics - dictionary, flat storage for batch metrics; for example,
runner.batch_metrics = {"loss": ..., "accuracy": ..., "iou": ...}
runner.loader_metrics - dictionary with aggregated batch statistics for loader (mean over all batches) and global loader metrics, like AUC; for example,
runner.loader_metrics = {"loss": ..., "accuracy": ..., "auc": ...}
runner.epoch_metrics - dictionary with summarized metrics for different loaders and global epoch metrics, like lr, momentum; for example,
runner.epoch_metrics = {
    "train_loss": ..., "train_auc": ...,
    "valid_loss": ...,
    "lr": ..., "momentum": ...,
}
Validation metrics section
runner.main_metric - string, containing name of metric of interest for optimization, validation and checkpointing during training
runner.minimize_metric - bool, indicator flag
True if we need to minimize the metric during training, like Cross Entropy loss
False if we need to maximize the metric during training, like Accuracy or Intersection over Union
Validation section
runner.valid_loader - string, name of the validation loader for metric selection, validation and model checkpointing
runner.valid_metrics - dictionary with validation metrics for the current epoch; for example,
runner.valid_metrics = {"loss": ..., "accuracy": ..., "auc": ...}
Note
subdictionary of epoch_metrics
runner.is_best_valid - bool, indicator flag
True if this training epoch is the best over all epochs
False if not
runner.best_valid_metrics - dictionary with best validation metrics during whole training process
Distributed section
runner.distributed_rank - distributed rank of current worker
runner.is_distributed_master - bool, indicator flag
True if it is the master node (runner.distributed_rank == 0)
False if it is a worker node (runner.distributed_rank != 0)
runner.is_distributed_worker - bool, indicator flag
True if it is a worker node (runner.distributed_rank > 0)
False if it is the master node (runner.distributed_rank <= 0)
Experiment info section
runner.global_sample_step - int, numerical indicator, counter for all individual samples that have passed through our model during the training, validation and inference stages
runner.global_batch_step - int, numerical indicator, counter for all batches that have passed through our model during the training, validation and inference stages
runner.global_epoch - int, numerical indicator, counter for all epochs that have passed during the model training, validation and inference stages
runner.verbose - bool, indicator flag
runner.is_check_run - bool, indicator flag
True if you want to check your pipeline and run only 2 batches per loader and 2 epochs per stage
False (default) if you want to run the full pipeline
runner.need_early_stop - bool, indicator flag used for EarlyStopping and CheckRun Callbacks
True if we need to stop the training
False (default) otherwise
runner.need_exception_reraise - bool, indicator flag
True (default) if you want the exception to be re-raised during the pipeline and stop the training process
False otherwise
Stage info section
runner.stage_name - string, current stage name, for example,
runner.stage_name = "pretraining" / "training" / "finetuning" / etc
runner.num_epochs - int, maximum number of epochs, required for this stage
runner.is_infer_stage - bool, indicator flag
True for inference stages
False otherwise
Epoch info section
runner.epoch - int, numerical indicator for current stage epoch
Loader info section
runner.loader_sample_step - int, numerical indicator for number of samples passed through our model in current loader
runner.loader_batch_step - int, numerical indicator for batch index in current loader
runner.loader_name - string, current loader name for example,
runner.loader_name = "train_dataset1" / "valid_data2" / "infer_golden"
runner.loader_len - int, maximum number of batches in current loader
runner.loader_batch_size - int, batch size parameter in current loader
runner.is_train_loader - bool, indicator flag
True for training loaders
False otherwise
runner.is_valid_loader - bool, indicator flag
True for validation loaders
False otherwise
runner.is_infer_loader - bool, indicator flag
True for inference loaders
False otherwise
Batch info section
runner.batch_size - int, length of the current batch
Logging section
runner.logdir - string, path to logging directory to save all logs, metrics, checkpoints and artifacts
runner.checkpoint_data - dictionary with all extra data for experiment tracking
Extra section
runner.exception - python Exception instance to raise (or not ;) )
-
__init__
(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None, **kwargs)[source]¶ - Parameters
model (RunnerModel) – Torch model object
device (Device) – Torch device
-
property
device
¶ Returns the runner’s device instance.
-
get_attr
(key: str, inner_key: str = None) → Any[source]¶ Alias for python getattr method. Useful for Callbacks preparation and cases with multi-criterion, multi-optimizer setup. For example, when you would like to train multi-task classification.
Used to get a named attribute from an IRunner by the key keyword; for example
# example 1
runner.get_attr("criterion")
# is equivalent to
runner.criterion

# example 2
runner.get_attr("optimizer")
# is equivalent to
runner.optimizer

# example 3
runner.get_attr("scheduler")
# is equivalent to
runner.scheduler
With inner_key usage, it is supposed to find a dictionary under key and get inner_key from this dict; for example,
# example 1
runner.get_attr("criterion", "bce")
# is equivalent to
runner.criterion["bce"]

# example 2
runner.get_attr("optimizer", "adam")
# is equivalent to
runner.optimizer["adam"]

# example 3
runner.get_attr("scheduler", "adam")
# is equivalent to
runner.scheduler["adam"]
- Parameters
key (str) – name for attribute of interest, like criterion, optimizer, scheduler
inner_key (str) – name of inner dictionary key
- Returns
inner attribute
-
property
model
¶ Returns the runner’s model instance.
-
run_experiment
(experiment: catalyst.core.experiment.IExperiment = None) → catalyst.core.runner.IRunner[source]¶ Starts the experiment.
- Parameters
experiment (IExperiment) – Experiment instance to use for Runner.
- Returns
self, IRunner instance after the experiment
- Raises
Exception – if an exception is raised during the pipeline and no handler is found in the callbacks
KeyboardInterrupt – if a KeyboardInterrupt is raised during the pipeline and no handler is found in the callbacks
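Example (a sketch of the typical call; assumes a concrete Runner subclass and an already created Experiment instance):
>>> runner.run_experiment(experiment)
# the same runner instance is returned after the experiment has finished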
-
class
catalyst.core.runner.
IStageBasedRunner
(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] = None, device: Union[str, torch.device] = None, **kwargs)[source]¶ Bases:
catalyst.core.runner.IRunner
Runner abstraction that is supposed to have constant data sources per stage.
Callback¶
-
class
catalyst.core.callback.
Callback
(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]¶ Bases:
object
An abstraction that lets you customize your experiment run logic. To give users maximum flexibility and extensibility Catalyst supports callback execution anywhere in the training loop:
-- stage start
---- epoch start
------ loader start
-------- batch start
---------- batch handler (Runner logic)
-------- batch end
------ loader end
---- epoch end
-- stage end

exception – if an Exception was raised
- All callbacks have
order from CallbackOrder
node from CallbackNode
scope from CallbackScope
Note
To learn more about Catalyst Core concepts, please check out catalyst.core.experiment.IExperiment, catalyst.core.runner.IRunner and catalyst.core.callback.Callback.
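A minimal sketch of a custom callback (the metric name and the printing logic are illustrative, not part of the Callback contract):
class BatchLossLogger(Callback):
    def __init__(self):
        super().__init__(order=CallbackOrder.Logging)

    def on_batch_end(self, runner):
        # runner.batch_metrics is the flat per-batch metrics storage
        print(runner.loader_name, runner.batch_metrics.get("loss"))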
-
__init__
(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]¶ Callback initializer.
- Parameters
order – flag from
CallbackOrder
node – flag from
CallbackNode
scope – flag from
CallbackScope
-
on_batch_end
(runner: IRunner)[source]¶ Event handler for batch end.
- Parameters
runner ("IRunner") – IRunner instance.
-
on_batch_start
(runner: IRunner)[source]¶ Event handler for batch start.
- Parameters
runner ("IRunner") – IRunner instance.
-
on_epoch_end
(runner: IRunner)[source]¶ Event handler for epoch end.
- Parameters
runner ("IRunner") – IRunner instance.
-
on_epoch_start
(runner: IRunner)[source]¶ Event handler for epoch start.
- Parameters
runner ("IRunner") – IRunner instance.
-
on_exception
(runner: IRunner)[source]¶ Event handler for exception case.
- Parameters
runner ("IRunner") – IRunner instance.
-
on_loader_end
(runner: IRunner)[source]¶ Event handler for loader end.
- Parameters
runner ("IRunner") – IRunner instance.
-
on_loader_start
(runner: IRunner)[source]¶ Event handler for loader start.
- Parameters
runner ("IRunner") – IRunner instance.
-
class
catalyst.core.callback.
CallbackNode
[source]¶ Bases:
enum.IntFlag
Callback node usage flag during distributed training.
All (0) - use on all nodes, both master and worker.
Master (1) - use only on the master node.
Worker (2) - use only on worker nodes.
-
All
= 0¶
-
Master
= 1¶
-
Worker
= 2¶
-
all
= 0¶
-
master
= 1¶
-
worker
= 2¶
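A sketch of restricting a callback to the master node during distributed runs (the callback name is illustrative):
class MasterOnlyLogger(Callback):
    def __init__(self):
        super().__init__(order=CallbackOrder.Logging, node=CallbackNode.Master)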
-
class
catalyst.core.callback.
CallbackOrder
[source]¶ Bases:
enum.IntFlag
Callback usage order during training.
Catalyst executes Callbacks with low CallbackOrder before Callbacks with high CallbackOrder.
Predefined orders:
Internal (0) - some Catalyst Extras, like PhaseCallbacks (used in GANs).
Metric (20) - Callbacks with metrics and losses computation.
MetricAggregation (40) - metric aggregation callbacks, like summing different losses into one.
Optimizer (60) - optimizer step, requires computed metrics for optimization.
Validation (80) - validation step, computes validation metrics subset based on all metrics.
Scheduler (100) - scheduler step, in ReduceLROnPlateau case requires computed validation metrics for optimizer schedule.
Logging (120) - logging step, logs metrics to Console/Tensorboard/Alchemy, requires computed metrics.
External (200) - additional callbacks with custom logic, like InferenceCallbacks
Nevertheless, you always can create CustomCallback with any order, for example:
>>> class MyCustomCallback(Callback):
>>>     def __init__(self):
>>>         super().__init__(order=42)
>>>     ...
# MyCustomCallback will be executed after all `Metric`-Callbacks
# but before all `MetricAggregation`-Callbacks.
-
External
= 200¶
-
Internal
= 0¶
-
Logging
= 120¶
-
Metric
= 20¶
-
MetricAggregation
= 40¶
-
Optimizer
= 60¶
-
Scheduler
= 100¶
-
Validation
= 80¶
-
external
= 200¶
-
internal
= 0¶
-
logging
= 120¶
-
metric
= 20¶
-
metric_aggregation
= 40¶
-
optimizer
= 60¶
-
scheduler
= 100¶
-
validation
= 80¶
-
class
catalyst.core.callback.
CallbackScope
[source]¶ Bases:
enum.IntFlag
Callback scope usage flag during training.
Stage (0) - use Callback only during one experiment stage.
Experiment (1) - use Callback during whole experiment run.
-
Experiment
= 1¶
-
Stage
= 0¶
-
experiment
= 1¶
-
stage
= 0¶
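A sketch of a callback that lives for the whole experiment run rather than a single stage (the callback name is illustrative):
class ExperimentWideTimer(Callback):
    def __init__(self):
        super().__init__(order=CallbackOrder.External, scope=CallbackScope.Experiment)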
-
class
catalyst.core.callback.
WrapperCallback
(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)[source]¶ Bases:
catalyst.core.callback.Callback
Enable/disable callback execution.
-
__init__
(base_callback: catalyst.core.callback.Callback, enable_callback: bool = True)[source]¶ - Parameters
base_callback (Callback) – callback to wrap
enable_callback (boolean) – indicator to enable/disable callback, if
True
then callback will be enabled, defaultTrue
-
on_batch_end
(runner: IRunner) → None[source]¶ Run base_callback (if possible)
- Parameters
runner (IRunner) – current runner
-
on_batch_start
(runner: IRunner) → None[source]¶ Run base_callback (if possible)
- Parameters
runner (IRunner) – current runner
-
on_epoch_end
(runner: IRunner) → None[source]¶ Run base_callback (if possible)
- Parameters
runner (IRunner) – current runner
-
on_epoch_start
(runner: IRunner) → None[source]¶ Run base_callback (if possible)
- Parameters
runner (IRunner) – current runner
-
on_exception
(runner: IRunner) → None[source]¶ Run base_callback (if possible)
- Parameters
runner (IRunner) – current runner
-
on_loader_end
(runner: IRunner) → None[source]¶ Reset status of callback
- Parameters
runner (IRunner) – current runner
-
on_loader_start
(runner: IRunner) → None[source]¶ Check if current epoch should be skipped.
- Parameters
runner (IRunner) – current runner
-
Callbacks¶
BatchOverfitCallback¶
-
class
catalyst.core.callbacks.batch_overfit.
BatchOverfitCallback
(**kwargs)[source]¶ Bases:
catalyst.core.callback.Callback
Callback for overfitting loaders with a specified number of batches. By default we use 1 batch per loader.
For example, if you have train, train_additional, valid and valid_additional loaders and want to overfit train on the first 1 batch, train_additional on the first 2 batches, valid on the first 20% of batches and valid_additional on 50% of batches:
from catalyst.dl import (
    SupervisedRunner,
    BatchOverfitCallback,
)

runner = SupervisedRunner()
runner.train(
    ...
    loaders={
        "train": ...,
        "train_additional": ...,
        "valid": ...,
        "valid_additional": ...,
    },
    ...
    callbacks=[
        ...
        BatchOverfitCallback(
            train_additional=2,
            valid=0.2,
            valid_additional=0.5,
        ),
        ...
    ]
    ...
)
Minimal working example
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# data
num_samples, num_features = int(1e4), int(1e1)
X, y = torch.rand(num_samples, num_features), torch.rand(num_samples)
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

# model training
runner = dl.SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=8,
    verbose=True,
    callbacks=[dl.BatchOverfitCallback(train=10, valid=0.5)]
)
-
__init__
(**kwargs)[source]¶ - Parameters
kwargs – loader names and their number of batches to overfit.
-
Checkpoint¶
-
class
catalyst.core.callbacks.checkpoint.
CheckpointCallback
(save_n_best: int = 1, resume: str = None, resume_dir: str = None, metrics_filename: str = '_metrics.json', load_on_stage_start: Union[str, Dict[str, str]] = None, load_on_stage_end: Union[str, Dict[str, str]] = None)[source]¶ Bases:
catalyst.core.callbacks.checkpoint.BaseCheckpointCallback
Checkpoint callback to save/restore your model/criterion/optimizer/scheduler.
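A sketch of typical usage with the Notebook API (other runner.train arguments are omitted):
runner.train(
    ...,
    callbacks=[CheckpointCallback(save_n_best=3, load_on_stage_end="best")],
)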
-
__init__
(save_n_best: int = 1, resume: str = None, resume_dir: str = None, metrics_filename: str = '_metrics.json', load_on_stage_start: Union[str, Dict[str, str]] = None, load_on_stage_end: Union[str, Dict[str, str]] = None)[source]¶ - Parameters
save_n_best (int) – number of best checkpoint to keep, if
0
then store only last state of model andload_on_stage_end
should be one oflast
orlast_full
.resume (str) – path to checkpoint to load and initialize runner state
resume_dir (str) – directory with checkpoints, if specified in combination with
resume
than resume checkpoint will be loaded fromresume_dir
metrics_filename (str) – filename to save metrics in checkpoint folder. Must ends on
.json
or.yml
load_on_stage_start (str or Dict[str, str]) –
load specified state/model at stage start.
If passed string then will be performed initialization from specified state (
best
/best_full
/last
/last_full
) or checkpoint file.If passed dict then will be performed initialization only for specified parts - model, criterion, optimizer, scheduler.
Example
>>> # possible checkpoints to use:
>>> # "best"/"best_full"/"last"/"last_full"
>>> # or path to specific checkpoint
>>> to_load = {
>>>     "model": "path/to/checkpoint.pth",
>>>     "criterion": "best",
>>>     "optimizer": "last_full",
>>>     "scheduler": "best_full",
>>> }
>>> CheckpointCallback(load_on_stage_start=to_load)
All other keys instead of
"model"
,"criterion"
,"optimizer"
and"scheduler"
will be ignored.If
None
or an empty dict (or a dict without the keys mentioned above), then no action is required at stage start and:
Config API - the best state of the model will be used
Notebook API - no action will be performed (will be used the last state)
NOTE: Loading will be performed on all stages except first.
NOTE: Criterion, optimizer and scheduler are optional keys and should be loaded from full checkpoint.
Model state can be loaded from any checkpoint.
When dict contains keys for model and some other part (for example
{"model": "last", "optimizer": "last"}
) and they match in prefix ("best"
and"best_full"
) then will be loaded full checkpoint because it contains required states.load_on_stage_end (str or Dict[str, str]) –
load specified state/model at stage end.
If passed string then will be performed initialization from specified state (
best
/best_full
/last
/last_full
) or checkpoint file.If passed dict then will be performed initialization only for specified parts - model, criterion, optimizer, scheduler. Logic for dict is the same as for
load_on_stage_start
.If
None
then no action is required at stage end and the last runner state will be used.
NOTE: Loading will always be performed at stage end.
-
on_epoch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Collect and save checkpoint after epoch.
- Parameters
runner (IRunner) – current runner
-
on_stage_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Show information about best checkpoints during the stage and load model specified in
load_on_stage_end
.- Parameters
runner (IRunner) – current runner
-
on_stage_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Setup model for stage.
Note
If CheckpointCallback initialized with
resume
(as path to checkpoint file) orresume
(as filename) andresume_dir
(as directory with file) then will be performed loading checkpoint.- Parameters
runner (IRunner) – current runner
-
process_checkpoint
(logdir: Union[str, pathlib.Path], checkpoint: Dict, is_best: bool, main_metric: str = 'loss', minimize_metric: bool = True) → None[source]¶ Save checkpoint and metrics.
- Parameters
logdir (str or Path object) – directory for storing checkpoints
checkpoint (dict) – dict with checkpoint data
is_best (bool) – indicator to save best checkpoint, if true then will be saved two additional checkpoints -
best
andbest_full
.main_metric (str) – metric to use for selecting the best model
minimize_metric (bool) – indicator for selecting best metric, if true then best metric will be the metric with the lowest value, otherwise with the greatest value.
-
process_metrics
(last_valid_metrics: Dict[str, float]) → Dict[source]¶ Add last validation metrics to list of previous validation metrics and keep
save_n_best
metrics.- Parameters
last_valid_metrics (dict) – dict with metrics from last validation step.
- Returns
processed metrics
- Return type
OrderedDict
-
-
class
catalyst.core.callbacks.checkpoint.
IterationCheckpointCallback
(save_n_last: int = 1, period: int = 100, stage_restart: bool = True, metrics_filename: str = '_metrics_iter.json', load_on_stage_end: str = 'best_full')[source]¶ Bases:
catalyst.core.callbacks.checkpoint.BaseCheckpointCallback
Iteration checkpoint callback to save your model/criterion/optimizer.
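A sketch of typical usage (saving an extra checkpoint every 1000 batches; the numbers are illustrative, other runner.train arguments are omitted):
runner.train(
    ...,
    callbacks=[IterationCheckpointCallback(save_n_last=3, period=1000)],
)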
-
__init__
(save_n_last: int = 1, period: int = 100, stage_restart: bool = True, metrics_filename: str = '_metrics_iter.json', load_on_stage_end: str = 'best_full')[source]¶ - Parameters
save_n_last (int) – number of last checkpoints to keep
period (int) – save the checkpoint every period
stage_restart (bool) – restart counter every stage or not
metrics_filename (str) – filename to save metrics in the checkpoint folder. Must end with
.json
or.yml
load_on_stage_end (str) – name of the model to load at the end of the stage. You can use
best
,best_full
(default) to load the best model according to validation metrics, orlast
last_full
to use just the last one.
-
on_batch_end
(runner: catalyst.core.runner.IRunner)[source]¶ Save checkpoint based on batches count.
- Parameters
runner (IRunner) – current runner
-
on_stage_end
(runner: catalyst.core.runner.IRunner)[source]¶ Load model specified in
load_on_stage_end
.- Parameters
runner (IRunner) – current runner
-
on_stage_start
(runner: catalyst.core.runner.IRunner)[source]¶ Reset iterations counter.
- Parameters
runner (IRunner) – current runner
-
process_checkpoint
(logdir: Union[str, pathlib.Path], checkpoint: Dict, batch_metrics: Dict[str, float])[source]¶ Save checkpoint and metrics.
- Parameters
logdir (str or Path object) – directory for storing checkpoints
checkpoint (dict) – dict with checkpoint data
batch_metrics (dict) – dict with metrics based on a few batches
-
-
class
catalyst.core.callbacks.checkpoint.
ICheckpointCallback
(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]¶ Bases:
catalyst.core.callback.Callback
Checkpoint callback interface, abstraction over model checkpointing step.
-
class
catalyst.core.callbacks.checkpoint.
BaseCheckpointCallback
(metrics_filename: str = '_metrics.json')[source]¶ Bases:
catalyst.core.callbacks.checkpoint.ICheckpointCallback
Base class for all checkpoint callbacks.
Control Flow¶
-
class
catalyst.core.callbacks.control_flow.
ControlFlowCallback
(base_callback: catalyst.core.callback.Callback, epochs: Union[int, Sequence[int]] = None, ignore_epochs: Union[int, Sequence[int]] = None, loaders: Union[str, Sequence[str], Mapping[str, Union[int, Sequence[int]]]] = None, ignore_loaders: Union[str, Sequence[str], Mapping[str, Union[int, Sequence[int]]]] = None, filter_fn: Union[str, Callable[[str, int, str], bool]] = None, use_global_epochs: bool = False)[source]¶ Bases:
catalyst.core.callback.WrapperCallback
Enable/disable callback execution on different stages, loaders and epochs.
Note
Please run experiment with
check option
to check if everything works as expected with this callback.
For example, if you don’t want to compute the loss on validation you can ignore
CriterionCallback
, for the Notebook API you need to wrap the callback:
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import (
    SupervisedRunner,
    AccuracyCallback,
    CriterionCallback,
    ControlFlowCallback,
)

num_samples, num_features = 10_000, 10
n_classes = 10
X = torch.rand(num_samples, num_features)
y = torch.randint(0, n_classes, [num_samples])
loader = DataLoader(TensorDataset(X, y), batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

model = torch.nn.Linear(num_features, n_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=5,
    verbose=False,
    main_metric="accuracy03",
    minimize_metric=False,
    callbacks=[
        AccuracyCallback(
            accuracy_args=[1, 3, 5]
        ),
        ControlFlowCallback(
            base_callback=CriterionCallback(),
            ignore_loaders="valid"  # or loaders="train"
        )
    ]
)
In the Config API you need to use the _wrapper argument:
callbacks_params:
  ...
  loss:
    _wrapper:
      callback: ControlFlowCallback
      ignore_loaders: valid
    callback: CriterionCallback
  ...
-
__init__
(base_callback: catalyst.core.callback.Callback, epochs: Union[int, Sequence[int]] = None, ignore_epochs: Union[int, Sequence[int]] = None, loaders: Union[str, Sequence[str], Mapping[str, Union[int, Sequence[int]]]] = None, ignore_loaders: Union[str, Sequence[str], Mapping[str, Union[int, Sequence[int]]]] = None, filter_fn: Union[str, Callable[[str, int, str], bool]] = None, use_global_epochs: bool = False)[source]¶ - Parameters
base_callback (Callback) – callback to wrap
epochs (int/Sequence[int]) –
epochs where need to enable callback, on other epochs callback will be disabled.
If passed int/float then callback will be enabled with period specified as epochs value (epochs expression
epoch_number % epochs == 0
) and disabled on other epochs.If passed list of epochs then will be executed callback on specified epochs.
Default value is
None
.ignore_epochs –
(int/Sequence[int]): epochs where need to disable callback, on other epochs callback will be enabled.
If passed int/float then callback will be disabled with period specified as epochs value (epochs expression
epoch_number % epochs != 0
) and enabled on other epochs.If passed list of epochs then will be disabled callback on specified epochs.
Default value is
None
.loaders (str/Sequence[str]/Mapping[str, int/Sequence[str]]) –
loaders where should be enabled callback, on other loaders callback will be disabled.
If passed a string, the callback will be enabled for the loader with the specified name.
If passed a list/tuple of strings, the callback will be enabled for the loaders with the specified names.
If passed a dictionary where the key is a string and the values are an int or a list of integers, the callback will be enabled on the given epochs (dictionary value) for the specified loader (dictionary key).
Default value is
None
.ignore_loaders (str/Sequence[str]/Mapping[str, int/Sequence[str]]) –
loader names where should be disabled callback, on other loaders callback will be enabled.
If passed string object then will be disabled callback for loader with specified name.
If passed list/tuple of strings then will be disabled callback for loaders with specified names.
If passed dictionary where key is a string and values int or list of integers then callback will be disabled on epochs (dictionary value) for specified loader (dictionary key).
Default value is
None
.filter_fn (str or Callable[[str, int, str], bool]) –
function to use instead of
loaders
orepochs
arguments.If the object passed to a
filter_fn
is a string then it will be interpreted as python code. Expected lambda function with three arguments stage name (str), epoch number (int), loader name (str) and this function should returnTrue
if callback should be enabled on some condition.If passed callable object then it should accept three arguments - stage name (str), epoch number (int), loader name (str) and should return
True
if callback should be enabled on some condition othervise should returnFalse
.Default value is
None
.Examples:
# enable callback on all loaders # exept "train" loader every 2 epochs ControlFlowCallback( ... filter_fn=lambda s, e, l: l != "train" and e % 2 == 0 ... ) # or with string equivalent ControlFlowCallback( ... filter_fn="lambda s, e, l: l != 'train' and e % 2 == 0" ... )
use_global_epochs (bool) – if
True
then will be used global epochs instead of epochs in a stage, the default value isFalse
-
Criterion¶
-
class
catalyst.core.callbacks.criterion.
CriterionCallback
(input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', prefix: str = 'loss', criterion_key: str = None, multiplier: float = 1.0, **metric_kwargs)[source]¶ Bases:
catalyst.core.callbacks.metrics.IBatchMetricCallback
Callback that measures the loss with the specified criterion.
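A sketch of a multi-criterion setup where criterion_key selects one loss from a dictionary (the key names are illustrative):
criterion = {"ce": torch.nn.CrossEntropyLoss(), "mse": torch.nn.MSELoss()}
callbacks = [
    CriterionCallback(criterion_key="ce", prefix="loss_ce"),
    CriterionCallback(criterion_key="mse", prefix="loss_mse"),
]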
-
__init__
(input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', prefix: str = 'loss', criterion_key: str = None, multiplier: float = 1.0, **metric_kwargs)[source]¶ - Parameters
input_key (Union[str, List[str], Dict[str, str]]) – key/list/dict of keys that takes values from the input dictionary If ‘__all__’, the whole input will be passed to the criterion If None, empty dict will be passed to the criterion.
output_key (Union[str, List[str], Dict[str, str]]) – key/list/dict of keys that takes values from the input dictionary If ‘__all__’, the whole output will be passed to the criterion If None, empty dict will be passed to the criterion.
prefix (str) – prefix for metrics and output key for loss in
runner.batch_metrics
dictionarycriterion_key (str) – A key to take a criterion in case there are several of them and they are in a dictionary format.
multiplier (float) – scale factor for the output loss.
-
property
metric_fn
¶ Criterion function.
-
Early Stop¶
-
class
catalyst.core.callbacks.early_stop.
CheckRunCallback
(num_batch_steps: int = 3, num_epoch_steps: int = 2)[source]¶ Bases:
catalyst.core.callback.Callback
Executes only a part of the pipeline from the Experiment.
-
__init__
(num_batch_steps: int = 3, num_epoch_steps: int = 2)[source]¶ - Parameters
num_batch_steps (int) – number of batches to iterate per epoch
num_epoch_steps (int) – number of epochs to perform in a stage
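A sketch of a quick sanity check of the pipeline (only a few batches and epochs are run; other runner.train arguments are omitted):
runner.train(
    ...,
    callbacks=[CheckRunCallback(num_batch_steps=3, num_epoch_steps=2)],
)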
-
-
class
catalyst.core.callbacks.early_stop.
EarlyStoppingCallback
(patience: int, metric: str = 'loss', minimize: bool = True, min_delta: float = 1e-06)[source]¶ Bases:
catalyst.core.callback.Callback
Early exit based on metric.
Example of usage in notebook API:
runner = SupervisedRunner()
runner.train(
    ...
    callbacks=[
        ...
        EarlyStoppingCallback(
            patience=5,
            metric="my_metric",
            minimize=True,
        )
        ...
    ]
)
...
Example of usage in config API:
stages:
  ...
  stage_N:
    ...
    callbacks_params:
      ...
      early_stopping:
        callback: EarlyStoppingCallback
        # arguments for EarlyStoppingCallback
        patience: 5
        metric: my_metric
        minimize: true
      ...
-
__init__
(patience: int, metric: str = 'loss', minimize: bool = True, min_delta: float = 1e-06)[source]¶ - Parameters
patience (int) – number of epochs with no improvement after which training will be stopped.
metric (str) – metric name to use for early stopping, default is
"loss"
.minimize (bool) – if
True
then expected that metric should decrease and early stopping will be performed only when metric stops decreasing. IfFalse
then expected that metric should increase. Default valueTrue
.min_delta (float) – minimum change in the monitored metric to qualify as an improvement, i.e. an absolute change of less than min_delta, will count as no improvement, default value is
1e-6
.
-
Logging¶
-
class
catalyst.core.callbacks.logging.
ILoggerCallback
(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]¶ Bases:
catalyst.core.callback.Callback
Logger callback interface, an abstraction over the logging step.
-
class
catalyst.core.callbacks.logging.
ConsoleLogger
[source]¶ Bases:
catalyst.core.callbacks.logging.ILoggerCallback
Logger callback, translates
runner.*_metrics
to console and text file.
-
class
catalyst.core.callbacks.logging.
TensorboardLogger
(metric_names: List[str] = None, log_on_batch_end: bool = True, log_on_epoch_end: bool = True)[source]¶ Bases:
catalyst.core.callbacks.logging.ILoggerCallback
Logger callback, translates
runner.metric_manager
to tensorboard.-
__init__
(metric_names: List[str] = None, log_on_batch_end: bool = True, log_on_epoch_end: bool = True)[source]¶ - Parameters
metric_names (List[str]) – list of metric names to log; if None, logs everything
log_on_batch_end (bool) – logs per-batch metrics if set True
log_on_epoch_end (bool) – logs per-epoch metrics if set True
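A hedged sketch of configuring the logger manually to keep only a few metrics; the metric names here are illustrative:

from catalyst.core.callbacks.logging import TensorboardLogger

logger = TensorboardLogger(
    # log only these metrics (names are illustrative)
    metric_names=["loss", "accuracy01"],
    # write epoch-level summaries only
    log_on_batch_end=False,
    log_on_epoch_end=True,
)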
-
-
class
catalyst.core.callbacks.logging.
VerboseLogger
(always_show: List[str] = None, never_show: List[str] = None)[source]¶ Bases:
catalyst.core.callbacks.logging.ILoggerCallback
Logs the params to the console.
-
__init__
(always_show: List[str] = None, never_show: List[str] = None)[source]¶ - Parameters
always_show (List[str]) – list of metrics to always show; if None, the default is ["_timer/_fps"]. To remove the always-shown metrics, set it to an empty list ([]).
never_show (List[str]) – list of metrics which will not be shown
-
Metrics¶
-
class
catalyst.core.callbacks.metrics.
IMetricCallback
(prefix: str, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metrics_kwargs)[source]¶ Bases:
abc.ABC
,catalyst.core.callback.Callback
@TODO: Docs. Contribution is welcome.
-
__init__
(prefix: str, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metrics_kwargs)[source]¶ @TODO: Docs. Contribution is welcome.
-
abstract property
metric_fn
¶ Docs. Contribution is welcome.
- Type
@TODO
-
-
class
catalyst.core.callbacks.metrics.
IBatchMetricCallback
(prefix: str, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metrics_kwargs)[source]¶ Bases:
catalyst.core.callbacks.metrics.IMetricCallback
@TODO: Docs. Contribution is welcome.
-
class
catalyst.core.callbacks.metrics.
ILoaderMetricCallback
(**kwargs)[source]¶ Bases:
catalyst.core.callbacks.metrics.IMetricCallback
@TODO: Docs. Contribution is welcome.
-
class
catalyst.core.callbacks.metrics.
BatchMetricCallback
(prefix: str, metric_fn: Callable, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metric_kwargs)[source]¶ Bases:
catalyst.core.callbacks.metrics.IBatchMetricCallback
A callback that returns a single metric on runner.on_batch_end.
-
__init__
(prefix: str, metric_fn: Callable, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metric_kwargs)[source]¶ @TODO: Docs. Contribution is welcome.
-
property
metric_fn
¶ Docs. Contribution is welcome.
- Type
@TODO
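A sketch of wiring a custom per-batch metric through this callback; it assumes the metric function receives the values taken by output_key first and by input_key second, and the prefix "custom_accuracy" is illustrative:

import torch
from catalyst.core.callbacks.metrics import BatchMetricCallback

def batch_accuracy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # fraction of correctly classified samples in the batch
    return (logits.argmax(dim=1) == targets).float().mean()

callback = BatchMetricCallback(
    prefix="custom_accuracy",  # key in runner.batch_metrics
    metric_fn=batch_accuracy,
    input_key="targets",
    output_key="logits",
)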
-
-
class
catalyst.core.callbacks.metrics.
LoaderMetricCallback
(prefix: str, metric_fn: Callable, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metric_kwargs)[source]¶ Bases:
catalyst.core.callbacks.metrics.ILoaderMetricCallback
A callback that returns a single metric on runner.on_loader_end.
-
__init__
(prefix: str, metric_fn: Callable, input_key: Union[str, List[str], Dict[str, str]] = 'targets', output_key: Union[str, List[str], Dict[str, str]] = 'logits', multiplier: float = 1.0, **metric_kwargs)[source]¶ @TODO: Docs. Contribution is welcome.
-
property
metric_fn
¶ Docs. Contribution is welcome.
- Type
@TODO
-
-
catalyst.core.callbacks.metrics.
MetricCallback
¶ alias of
catalyst.core.callbacks.metrics.BatchMetricCallback
-
class
catalyst.core.callbacks.metrics.
MetricAggregationCallback
(prefix: str, metrics: Union[str, List[str], Dict[str, float]] = None, mode: str = 'mean', scope: str = 'batch', multiplier: float = 1.0)[source]¶ Bases:
catalyst.core.callback.Callback
A callback to aggregate several metrics into one value.
-
__init__
(prefix: str, metrics: Union[str, List[str], Dict[str, float]] = None, mode: str = 'mean', scope: str = 'batch', multiplier: float = 1.0) → None[source]¶ - Parameters
prefix (str) – new key for aggregated metric.
metrics (Union[str, List[str], Dict[str, float]]) – if not None, aggregates only the metric values under these keys; for weighted_sum aggregation it must be a Dict[str, float].
mode (str) – function for aggregation; must be either sum, mean or weighted_sum.
multiplier (float) – scale factor for the aggregated metric.
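A sketch of combining two already-computed batch metrics into a single weighted loss; the keys loss_class and loss_reg are illustrative and would come from other callbacks (e.g. two CriterionCallback instances with different prefixes):

from catalyst.core.callbacks.metrics import MetricAggregationCallback

aggregator = MetricAggregationCallback(
    prefix="loss",  # new key for the aggregated value
    metrics={"loss_class": 1.0, "loss_reg": 0.1},  # weights per metric
    mode="weighted_sum",
    scope="batch",
)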
-
-
class
catalyst.core.callbacks.metrics.
MetricManagerCallback
[source]¶ Bases:
catalyst.core.callback.Callback
Prepares metrics for logging, transferring values from PyTorch to numpy.
-
on_batch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Batch end hook.
- Parameters
runner (IRunner) – current runner
-
on_batch_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Batch start hook.
- Parameters
runner (IRunner) – current runner
-
on_epoch_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Epoch start hook.
- Parameters
runner (IRunner) – current runner
-
on_loader_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Loader end hook.
- Parameters
runner (IRunner) – current runner
-
Optimizer¶
-
class
catalyst.core.callbacks.optimizer.
IOptimizerCallback
(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]¶ Bases:
catalyst.core.callback.Callback
Optimizer callback interface, abstraction over optimizer step.
-
class
catalyst.core.callbacks.optimizer.
AMPOptimizerCallback
(metric_key: str = None, optimizer_key: str = None, accumulation_steps: int = 1, grad_clip_params: Dict = None, loss_key: str = None)[source]¶ Bases:
catalyst.core.callbacks.optimizer.IOptimizerCallback
Optimizer callback with native torch amp support.
-
__init__
(metric_key: str = None, optimizer_key: str = None, accumulation_steps: int = 1, grad_clip_params: Dict = None, loss_key: str = None)[source]¶ - Parameters
loss_key (str) – key to get loss from
runner.batch_metrics
optimizer_key (str) – a key to select the optimizer in case there are several of them stored in a dictionary.
accumulation_steps (int) – number of steps before
model.zero_grad()
grad_clip_params (dict) – params for gradient clipping
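A hedged sketch of enabling native torch amp mixed precision with gradient accumulation; the rest of the runner.train arguments are assumed to be set up as usual:

from catalyst.core.callbacks.optimizer import AMPOptimizerCallback

callbacks = [
    # native torch amp handling of the loss/optimizer step,
    # with optimizer.step() called only every 2 batches
    AMPOptimizerCallback(accumulation_steps=2),
]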
-
grad_step
(*, optimizer: torch.optim.optimizer.Optimizer, grad_clip_fn: Callable = None) → None[source]¶ Makes a gradient step for a given optimizer.
- Parameters
optimizer (Optimizer) – the optimizer
grad_clip_fn (Callable) – function for gradient clipping
-
on_batch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ On batch end event
- Parameters
runner (IRunner) – current runner
-
on_batch_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ On batch start event
- Parameters
runner (IRunner) – current runner
-
on_epoch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ On epoch end event.
- Parameters
runner (IRunner) – current runner
-
-
class
catalyst.core.callbacks.optimizer.
OptimizerCallback
(metric_key: str = None, optimizer_key: str = None, accumulation_steps: int = 1, grad_clip_params: Dict = None, decouple_weight_decay: bool = True, loss_key: str = None, use_fast_zero_grad: bool = False, xla_barrier: bool = True)[source]¶ Bases:
catalyst.core.callbacks.optimizer.IOptimizerCallback
Optimizer callback, abstraction over optimizer step.
-
__init__
(metric_key: str = None, optimizer_key: str = None, accumulation_steps: int = 1, grad_clip_params: Dict = None, decouple_weight_decay: bool = True, loss_key: str = None, use_fast_zero_grad: bool = False, xla_barrier: bool = True)[source]¶ - Parameters
loss_key (str) – key to get loss from
runner.batch_metrics
optimizer_key (str) – a key to select the optimizer in case there are several of them stored in a dictionary.
accumulation_steps (int) – number of steps before
model.zero_grad()
grad_clip_params (dict) – params for gradient clipping
decouple_weight_decay (bool) – if True, decouple weight decay regularization.
use_fast_zero_grad (bool) – boost optimizer.zero_grad(); default is False.
xla_barrier (bool) – barrier option for xla. Here you can find more about the usage of the barrier flag and examples. Default is True.
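A sketch of emulating a larger effective batch size via gradient accumulation; the model, optimizer and loaders are assumed to be defined elsewhere:

from catalyst.dl import SupervisedRunner
from catalyst.core.callbacks.optimizer import OptimizerCallback

runner = SupervisedRunner()
runner.train(
    model=model,          # assumed to be defined
    optimizer=optimizer,  # assumed to be defined
    loaders=loaders,      # assumed to be defined
    callbacks=[
        # optimizer.step() is called once every 4 batches,
        # emulating a 4x larger batch size
        OptimizerCallback(accumulation_steps=4),
    ],
)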
-
grad_step
(*, optimizer: torch.optim.optimizer.Optimizer, optimizer_wds: List[float] = 0, grad_clip_fn: Callable = None) → None[source]¶ Makes a gradient step for a given optimizer.
- Parameters
optimizer (Optimizer) – the optimizer
optimizer_wds (List[float]) – list of weight decay parameters for each param group
grad_clip_fn (Callable) – function for gradient clipping
-
on_batch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ On batch end event
- Parameters
runner (IRunner) – current runner
-
on_epoch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ On epoch end event.
- Parameters
runner (IRunner) – current runner
-
PeriodicLoaderCallback¶
-
class
catalyst.core.callbacks.periodic_loader.
PeriodicLoaderCallback
(**kwargs)[source]¶ Bases:
catalyst.core.callback.Callback
Callback for running loaders with a specified period. To disable a loader, use 0 as its period (if 0 is specified for the validation loader, an error will be raised).
For example, if you have train, train_additional, valid and valid_additional loaders and want to use train_additional every 2 epochs, valid every 3 epochs and valid_additional every 5 epochs:

from catalyst.dl import (
    SupervisedRunner,
    PeriodicLoaderRunnerCallback,
)
runner = SupervisedRunner()
runner.train(
    ...
    loaders={
        "train": ...,
        "train_additional": ...,
        "valid": ...,
        "valid_additional": ...,
    }
    ...
    callbacks=[
        ...
        PeriodicLoaderRunnerCallback(
            train_additional=2,
            valid=3,
            valid_additional=5,
        ),
        ...
    ]
    ...
)
-
on_epoch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Check if validation metric should be dropped for current epoch.
- Parameters
runner (IRunner) – current runner
-
on_epoch_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Sets the loaders for the current epoch. If validation is not required, the first loader used in the current epoch will serve as the validation loader. Metrics from the latest epoch with a true validation loader will be used in the epochs where this loader is missing.
- Parameters
runner (IRunner) – current runner
- Raises
ValueError – if there are no loaders in epoch
-
Scheduler¶
-
class
catalyst.core.callbacks.scheduler.
ISchedulerCallback
(order: int, node: int = <CallbackNode.All: 0>, scope: int = <CallbackScope.Stage: 0>)[source]¶ Bases:
catalyst.core.callback.Callback
Scheduler callback interface, abstraction over scheduler step.
-
class
catalyst.core.callbacks.scheduler.
SchedulerCallback
(scheduler_key: str = None, mode: str = None, reduced_metric: str = None)[source]¶ Bases:
catalyst.core.callbacks.scheduler.ISchedulerCallback
Callback for wrapping schedulers.
Notebook API example:
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst.dl import (
    SupervisedRunner,
    AccuracyCallback,
    CriterionCallback,
    SchedulerCallback,
)

num_samples, num_features = 10_000, 10
n_classes = 10
X = torch.rand(num_samples, num_features)
y = torch.randint(0, n_classes, [num_samples])
loader = DataLoader(TensorDataset(X, y), batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

model = torch.nn.Linear(num_features, n_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [3, 6])

runner = SupervisedRunner()
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=5,
    verbose=False,
    main_metric="accuracy03",
    minimize_metric=False,
    callbacks=[
        AccuracyCallback(accuracy_args=[1, 3, 5]),
        SchedulerCallback(reduced_metric="loss"),
    ],
)
Config API usage example:
stages:
  ...
  scheduler_params:
    scheduler: MultiStepLR
    milestones: [1]
    gamma: 0.3
  ...
  stage_N:
    ...
    callbacks_params:
      ...
      scheduler:
        callback: SchedulerCallback
        # arguments for SchedulerCallback
        reduced_metric: loss
      ...
-
__init__
(scheduler_key: str = None, mode: str = None, reduced_metric: str = None)[source]¶ - Parameters
scheduler_key (str) – scheduler name; default is None.
mode (str) – scheduler mode, should be one of "epoch" or "batch"; default is None. If None and the object is an instance of BatchScheduler or OneCycleLRWithWarmup, "batch" will be used, otherwise "epoch".
reduced_metric (str) – metric name to forward to the scheduler object; if None, the main metric specified in the experiment will be used.
-
on_batch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Batch end hook.
- Parameters
runner (IRunner) – current runner
-
on_epoch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Epoch end hook.
- Parameters
runner (IRunner) – current runner
-
on_loader_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Loader start hook.
- Parameters
runner (IRunner) – current runner
-
on_stage_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Stage start hook.
- Parameters
runner (IRunner) – current runner
-
-
class
catalyst.core.callbacks.scheduler.
LRUpdater
(optimizer_key: str = None)[source]¶ Bases:
abc.ABC
,catalyst.core.callback.Callback
Base class that all LR updaters inherit from.
-
__init__
(optimizer_key: str = None)[source]¶ - Parameters
optimizer_key (str) – which optimizer key to use for learning rate scheduling
-
on_batch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Batch end hook.
- Parameters
runner (IRunner) – current runner
-
on_loader_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Loader start hook.
- Parameters
runner (IRunner) – current runner
-
Timer¶
-
class
catalyst.core.callbacks.timer.
TimerCallback
[source]¶ Bases:
catalyst.core.callback.Callback
Logs pipeline execution time.
-
on_batch_end
(runner: catalyst.core.runner.IRunner) → None[source]¶ Batch end hook.
- Parameters
runner (IRunner) – current runner
-
on_batch_start
(runner: catalyst.core.runner.IRunner) → None[source]¶ Batch start hook.
- Parameters
runner (IRunner) – current runner
-
Validation¶
-
class
catalyst.core.callbacks.validation.
ValidationManagerCallback
[source]¶ Bases:
catalyst.core.callback.Callback
A callback to aggregate runner.valid_metrics from runner.epoch_metrics.
Utils¶
-
catalyst.core.utils.callbacks.
sort_callbacks_by_order
(callbacks: Union[List, Dict, collections.OrderedDict]) → collections.OrderedDict[source]¶ Creates a sequence of callbacks and sorts them.
- Parameters
callbacks – either list of callbacks or ordered dict
- Returns
sequence of callbacks sorted by
callback order
- Raises
TypeError – if callbacks is not one of None, dict, OrderedDict, or list
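A small sketch of the ordering behavior for two common callbacks; the resulting order follows each callback's order attribute, so the criterion callback comes before the optimizer one and the loss exists before the optimization step:

from catalyst.core.utils.callbacks import sort_callbacks_by_order
from catalyst.core.callbacks.criterion import CriterionCallback
from catalyst.core.callbacks.optimizer import OptimizerCallback

callbacks = sort_callbacks_by_order([OptimizerCallback(), CriterionCallback()])
# the resulting OrderedDict iterates over CriterionCallback first,
# then OptimizerCallback, regardless of the input order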
-
catalyst.core.utils.callbacks.
filter_callbacks_by_node
(callbacks: Union[Dict, collections.OrderedDict]) → Union[Dict, collections.OrderedDict][source]¶ Filters callbacks based on running node. Deletes worker-only callbacks from
CallbackNode.Master
and master-only callbacks fromCallbackNode.Worker
.- Parameters
callbacks (Union[Dict, OrderedDict]) – callbacks
- Returns
filtered callbacks dictionary.
- Return type
Union[Dict, OrderedDict]
Legacy¶
Runner¶
-
class
catalyst.core.legacy.
IRunnerLegacy
[source]¶ Bases:
object
Special class that encapsulates all the catalyst.core.runner.IRunner and catalyst.core.runner.State legacy in one place. Used to keep catalyst.core.runner.IRunner cleaner and easier to understand.
Saved for backward compatibility. Should be removed someday.
-
property
batch_in
¶ Alias for runner.input.
Warning
Deprecated, saved for backward compatibility. Please use runner.input instead.
-
property
batch_out
¶ Alias for runner.output.
Warning
Deprecated, saved for backward compatibility. Please use runner.output instead.
-
property
loader_step
¶ Alias for runner.loader_batch_step.
Warning
Deprecated, saved for backward compatibility. Please use runner.loader_batch_step instead.
-
property
need_backward_pass
¶ Alias for runner.is_train_loader.
Warning
Deprecated, saved for backward compatibility. Please use runner.is_train_loader instead.
-
property
state
¶ Alias for runner.
Warning
Deprecated, saved for backward compatibility. Please use runner instead.
-
property