Runners¶
Runner Extensions¶
ISupervisedRunner¶
- class catalyst.runners.supervised.ISupervisedRunner(input_key: Any = 'features', output_key: Any = 'logits', target_key: str = 'targets', loss_key: str = 'loss')[source]¶
Bases:
catalyst.core.runner.IRunner
IRunner for experiments with a supervised model.
- Parameters
input_key – key in runner.batch dict mapping for model input
output_key – key for runner.batch to store model output
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
Abstraction, please check out implementations for more details:
catalyst.runners.runner.SupervisedRunner
catalyst.runners.config.SupervisedConfigRunner
catalyst.runners.hydra.SupervisedHydraRunner
Note
ISupervisedRunner contains only the batch-handling logic.
ISupervisedRunner logic pseudocode:
batch = {"input_key": tensor, "target_key": tensor} output = model(batch["input_key"]) batch["output_key"] = output loss = criterion(batch["output_key"], batch["target_key"]) batch_metrics["loss_key"] = loss
Note
Please follow the minimal examples sections for use cases.
Examples:
import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32,
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32,
    ),
}

runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)

# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3)),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=10
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
    load_best_on_end=True,
)

# model inference
for prediction in runner.predict_loader(loader=loaders["valid"]):
    assert prediction["logits"].detach().cpu().numpy().shape[-1] == 10
- __init__(input_key: Any = 'features', output_key: Any = 'logits', target_key: str = 'targets', loss_key: str = 'loss')[source]¶
Init.
- forward(batch: Mapping[str, Any], **kwargs) Mapping[str, Any] [source]¶
Forward method for your Runner. Should not be called directly outside of the runner. If your model has a specific interface, override this method to use it.
- Parameters
batch (Mapping[str, Any]) – dictionary with data batches from DataLoaders.
**kwargs – additional parameters to pass to the model
- Returns
dict with model output batch
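If your model has a custom call signature, overriding forward is the intended extension point. The sketch below is illustrative only: the two-input model interface and the "lengths" key are hypothetical, not part of the Catalyst API, and the returned dict uses the default output_key ("logits"):

from typing import Any, Mapping
from catalyst import dl

class TwoInputSupervisedRunner(dl.SupervisedRunner):
    def forward(self, batch: Mapping[str, Any], **kwargs) -> Mapping[str, Any]:
        # call the model with its own interface instead of model(batch[input_key])
        logits = self.model(batch["features"], batch["lengths"], **kwargs)
        # return a dict so the runner can store it in runner.batch under output_key
        return {"logits": logits}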
ISelfSupervisedRunner¶
- class catalyst.runners.self_supervised.ISelfSupervisedRunner(input_key: str = 'features', target_key: str = 'target', loss_key: str = 'loss', augemention_prefix: str = 'augment', projection_prefix: str = 'projection', embedding_prefix: str = 'embedding')[source]¶
Bases:
catalyst.core.runner.IRunner
IRunner for experiments with a contrastive model.
- Parameters
input_key – key in runner.batch dict mapping for model input
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
augemention_prefix – key for runner.batch to sample augmentations
projection_prefix – key for runner.batch to store model projection
embedding_prefix – key for runner.batch to store model embeddings
Abstraction, please check out implementations for more details:
catalyst.runners.contrastive.ContrastiveRunner
Note
ISelfSupervisedRunner contains only the batch-handling logic.
ISelfSupervisedRunner logic pseudocode:
batch = {"aug1": tensor, "aug2": tensor, ...} _, proj1 = model(batch["aug1"]) _, proj2 = model(batch["aug2"]) loss = criterion(proj1, proj2) batch_metrics["loss_key"] = loss
Examples:
# 1. loader and transforms
transforms = Compose(
    [
        ToTensor(),
        Normalize((0.1307,), (0.3081,)),
        torchvision.transforms.RandomCrop((28, 28)),
        torchvision.transforms.RandomVerticalFlip(),
        torchvision.transforms.RandomHorizontalFlip(),
    ]
)

mnist = MNIST("./logdir", train=True, download=True, transform=None)
contrastive_mnist = ContrastiveDataset(mnist, transforms=transforms)
train_loader = torch.utils.data.DataLoader(contrastive_mnist, batch_size=BATCH_SIZE)

# 2. model and optimizer
encoder = MnistSimpleNet(out_features=16)
projection_head = nn.Sequential(
    nn.Linear(16, 16, bias=False),
    nn.ReLU(inplace=True),
    nn.Linear(16, 16, bias=True),
)

class ContrastiveModel(torch.nn.Module):
    def __init__(self, model, encoder):
        super(ContrastiveModel, self).__init__()
        self.model = model
        self.encoder = encoder

    def forward(self, x):
        emb = self.encoder(x)
        projection = self.model(emb)
        return emb, projection

model = ContrastiveModel(model=projection_head, encoder=encoder)
optimizer = Adam(model.parameters(), lr=LR)

# 3. criterion with triplets sampling
criterion = NTXentLoss(tau=0.1)

callbacks = [
    dl.ControlFlowCallback(
        dl.CriterionCallback(
            input_key="projection_left", target_key="projection_right", metric_key="loss"
        ),
        loaders="train",
    ),
    dl.SklearnModelCallback(
        feature_key="embedding_left",
        target_key="target",
        train_loader="train",
        valid_loaders="valid",
        model_fn=RandomForestClassifier,
        predict_method="predict_proba",
        predict_key="sklearn_predict",
        random_state=RANDOM_STATE,
        n_estimators=10,
    ),
    dl.ControlFlowCallback(
        dl.AccuracyCallback(
            target_key="target", input_key="sklearn_predict", topk_args=(1, 3)
        ),
        loaders="valid",
    ),
]

runner = dl.ContrastiveRunner()
logdir = "./logdir"

runner.train(
    model=model,
    engine=engine or dl.DeviceEngine(device),
    criterion=criterion,
    optimizer=optimizer,
    callbacks=callbacks,
    loaders={"train": train_loader, "valid": train_loader},
    verbose=True,
    logdir=logdir,
    valid_loader="train",
    valid_metric="loss",
    minimize_valid_metric=True,
    num_epochs=10,
)
Note
Please follow the minimal examples sections for use cases.
- __init__(input_key: str = 'features', target_key: str = 'target', loss_key: str = 'loss', augemention_prefix: str = 'augment', projection_prefix: str = 'projection', embedding_prefix: str = 'embedding')[source]¶
Init.
- forward(batch: Mapping[str, Any], **kwargs) Mapping[str, Any] [source]¶
Forward method for your Runner. Should not be called directly outside of the runner. If your model has a specific interface, override this method to use it.
- Parameters
batch (Mapping[str, Any]) – dictionary with data batches from DataLoaders.
**kwargs – additional parameters to pass to the model
- Returns
dict with model output batch
Python API¶
Runner¶
- class catalyst.runners.runner.Runner(*args, **kwargs)[source]¶
Bases:
catalyst.core.runner.IRunner
Single-stage deep learning Runner with user-friendly API.
Runner supports the logic for deep learning pipeline configuration with pure python code. Please check the examples for intuition.
- Parameters
*args – IRunner args (model, engine)
**kwargs – IRunner kwargs (model, engine)
Note
IRunner supports only base user-friendly callbacks, like TqdmCallback, TimerCallback, CheckRunCallback, BatchOverfitCallback, and CheckpointCallback.
It does not automatically add Criterion, Optimizer or Scheduler callbacks.
That means that you have to do the optimization step yourself inside the handle_batch method, or specify the required callbacks in the .train or get_callbacks methods (see the callback-driven sketch after the example below).
For a more easy-to-go supervised use case, please follow catalyst.runners.runner.SupervisedRunner.
Note
Please follow the minimal examples sections for use cases.
Examples:
import os
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32,
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32,
    ),
}

class CustomRunner(dl.Runner):
    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device))

    def on_loader_start(self, runner):
        super().on_loader_start(runner)
        self.meters = {
            key: metrics.AdditiveMetric(compute_on_call=False)
            for key in ["loss", "accuracy01", "accuracy03"]
        }

    def handle_batch(self, batch):
        # model train/valid step
        # unpack the batch
        x, y = batch
        # run model forward pass
        logits = self.model(x)
        # compute the loss
        loss = F.cross_entropy(logits, y)
        # compute other metrics of interest
        accuracy01, accuracy03 = metrics.accuracy(logits, y, topk=(1, 3))
        # log metrics
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.meters[key].update(self.batch_metrics[key].item(), self.batch_size)
        # run model backward pass
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

    def on_loader_end(self, runner):
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.loader_metrics[key] = self.meters[key].compute()[0]
        super().on_loader_end(runner)

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
)
# model inference
for logits in runner.predict_loader(loader=loaders["valid"]):
    assert logits.detach().cpu().numpy().shape[-1] == 10
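As mentioned in the note above, plain Runner does not add Criterion, Optimizer or Scheduler callbacks automatically; the example handles the optimization step manually inside handle_batch. The sketch below shows the callback-driven alternative. It is illustrative only: it reuses the model, optimizer and loaders defined above and assumes the default key names ("logits", "targets", "loss") together with CriterionCallback and OptimizerCallback, both of which appear elsewhere on this page:

from torch import nn

class CallbackDrivenRunner(dl.Runner):
    def handle_batch(self, batch):
        # only prepare runner.batch; loss and optimization are handled by callbacks
        x, y = batch
        self.batch = {"features": x, "logits": self.model(x), "targets": y}

runner = CallbackDrivenRunner()
runner.train(
    model=model,
    criterion=nn.CrossEntropyLoss(),
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.CriterionCallback(input_key="logits", target_key="targets", metric_key="loss"),
        dl.OptimizerCallback(metric_key="loss"),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
)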
- evaluate_loader(loader: torch.utils.data.dataloader.DataLoader, callbacks: Union[List[Callback], OrderedDict[str, Callback]] = None, model: Optional[torch.nn.modules.module.Module] = None, seed: int = 42, verbose: bool = False) Dict[str, Any] [source]¶
Evaluates data from the loader with the given model and returns the obtained metrics.
- Parameters
loader – loader to predict
callbacks – list or dictionary with catalyst callbacks
model – model compatible with the current runner. If None, the runner's current model is used.
seed – random seed to use before prediction
verbose – if True, it displays the status of the evaluation to the console.
- Returns
Dict with metrics counted on the loader.
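A minimal usage sketch, assuming a dl.SupervisedRunner trained as in the SupervisedRunner example on this page; the metric names in the returned dict depend on the callbacks you pass (the accuracy keys here are illustrative):

eval_metrics = runner.evaluate_loader(
    loader=loaders["valid"],
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3)),
    ],
    verbose=True,
)
# e.g. eval_metrics["accuracy01"], eval_metrics["accuracy03"]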
- get_callbacks(stage: str) OrderedDict[str, Callback] [source]¶
Returns the callbacks for a given stage.
- get_criterion(stage: str) torch.nn.modules.module.Module [source]¶
Returns the criterion for a given stage.
- get_engine() catalyst.core.engine.IEngine [source]¶
Returns the engine for a run.
- get_loaders(stage: str) OrderedDict[str, DataLoader] [source]¶
Returns the loaders for a given stage.
- get_loggers() Dict[str, catalyst.core.logger.ILogger] [source]¶
Returns the loggers for a run.
- get_optimizer(stage: str, model: torch.nn.modules.module.Module) torch.optim.optimizer.Optimizer [source]¶
Returns the optimizer for a given stage.
- get_scheduler(stage: str, optimizer: torch.optim.optimizer.Optimizer) torch.optim.lr_scheduler._LRScheduler [source]¶
Returns the scheduler for a given stage.
- get_trial() catalyst.core.trial.ITrial [source]¶
Returns the trial for a run.
- property hparams: Dict¶
Returns hyperparameters.
- property name: str¶
Returns run name.
- predict_batch(batch: Mapping[str, Any], **kwargs) Mapping[str, Any] [source]¶
Run model inference on specified data batch.
- Parameters
batch – dictionary with data batches from DataLoader.
**kwargs – additional kwargs to pass to the model
- Returns
model output dictionary
- Return type
Mapping
- Raises
NotImplementedError – if not implemented yet
- predict_loader(*, loader: torch.utils.data.dataloader.DataLoader, model: torch.nn.modules.module.Module = None, engine: Union[IEngine, str] = None, seed: int = 42, fp16: bool = False, amp: bool = False, apex: bool = False, ddp: bool = False) Generator [source]¶
Runs model inference on PyTorch DataLoader and returns python generator with model predictions from runner.predict_batch.
- Parameters
loader – loader to predict
model – model to use for prediction
engine – engine to use for prediction
seed – random seed to use before prediction
fp16 – boolean flag to use half-precision training (AMP > APEX)
amp – boolean flag to use amp half-precision
apex – boolean flag to use apex half-precision
ddp – if True will start training in distributed mode. Note: Works only with python scripts. No jupyter support.
- Yields
batches with model predictions
Note
Please follow the minimal examples sections for use cases.
Examples:
import os
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32,
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32,
    ),
}

class CustomRunner(dl.Runner):
    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device))

    def on_loader_start(self, runner):
        super().on_loader_start(runner)
        self.meters = {
            key: metrics.AdditiveMetric(compute_on_call=False)
            for key in ["loss", "accuracy01", "accuracy03"]
        }

    def handle_batch(self, batch):
        # model train/valid step
        # unpack the batch
        x, y = batch
        # run model forward pass
        logits = self.model(x)
        # compute the loss
        loss = F.cross_entropy(logits, y)
        # compute other metrics of interest
        accuracy01, accuracy03 = metrics.accuracy(logits, y, topk=(1, 3))
        # log metrics
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.meters[key].update(
                self.batch_metrics[key].item(), self.batch_size
            )
        # run model backward pass
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

    def on_loader_end(self, runner):
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.loader_metrics[key] = self.meters[key].compute()[0]
        super().on_loader_end(runner)

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
)
# model inference
for logits in runner.predict_loader(loader=loaders["valid"]):
    assert logits.detach().cpu().numpy().shape[-1] == 10
- property seed: int¶
Experiment’s initial seed value.
- property stages: Iterable[str]¶
Experiment’s stage names (array with one value).
- train(*, loaders: OrderedDict[str, DataLoader], model: torch.nn.modules.module.Module, engine: Union[IEngine, str] = None, trial: catalyst.core.trial.ITrial = None, criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, callbacks: Union[List[Callback], OrderedDict[str, Callback]] = None, loggers: Dict[str, ILogger] = None, seed: int = 42, hparams: Dict[str, Any] = None, num_epochs: int = 1, logdir: str = None, valid_loader: str = None, valid_metric: str = None, minimize_valid_metric: bool = True, verbose: bool = False, timeit: bool = False, check: bool = False, overfit: bool = False, load_best_on_end: bool = False, fp16: bool = False, amp: bool = False, apex: bool = False, ddp: bool = False) None [source]¶
Starts the train stage of the model.
- Parameters
loaders – dictionary with one or several torch.utils.data.DataLoader for training, validation or inference
model – model to train
engine – engine to use for model training
trial – trial to use during model training
criterion – criterion function for training
optimizer – optimizer for training
scheduler – scheduler for training
callbacks – list or dictionary with Catalyst callbacks
loggers – dictionary with Catalyst loggers
seed – experiment’s initial seed value
hparams – hyperparameters for the run
num_epochs – number of training epochs
logdir – path to output directory
valid_loader – loader name used to calculate the metrics and save the checkpoints. For example, you can pass train and then the metrics will be taken from train loader.
valid_metric – the key to the name of the metric by which the checkpoints will be selected.
minimize_valid_metric – flag to indicate whether the valid_metric should be minimized or not (default: True)
verbose – if True, it displays the status of the training to the console.
timeit – if True, computes the execution time of training process and displays it to the console.
check – if True, then only checks that pipeline is working (3 epochs only with 3 batches per loader)
overfit – if True, then takes only one batch per loader for model overfitting; for advanced usage please check BatchOverfitCallback
load_best_on_end – if True, Runner will load the best checkpoint state (model, optimizer, etc.) according to validation metrics. Requires specified logdir.
fp16 – boolean flag to use half-precision training (AMP > APEX)
amp – boolean flag to use amp half-precision
apex – boolean flag to use apex half-precision
ddp – if True will start training in distributed mode. Note: Works only with python scripts. No jupyter support.
Note
Please follow the minimal examples sections for use cases.
Examples:
import os
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32,
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32,
    ),
}

class CustomRunner(dl.Runner):
    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device))

    def on_loader_start(self, runner):
        super().on_loader_start(runner)
        self.meters = {
            key: metrics.AdditiveMetric(compute_on_call=False)
            for key in ["loss", "accuracy01", "accuracy03"]
        }

    def handle_batch(self, batch):
        # model train/valid step
        # unpack the batch
        x, y = batch
        # run model forward pass
        logits = self.model(x)
        # compute the loss
        loss = F.cross_entropy(logits, y)
        # compute other metrics of interest
        accuracy01, accuracy03 = metrics.accuracy(logits, y, topk=(1, 3))
        # log metrics
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.meters[key].update(
                self.batch_metrics[key].item(), self.batch_size
            )
        # run model backward pass
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

    def on_loader_end(self, runner):
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.loader_metrics[key] = self.meters[key].compute()[0]
        super().on_loader_end(runner)

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
)
# model inference
for logits in runner.predict_loader(loader=loaders["valid"]):
    assert logits.detach().cpu().numpy().shape[-1] == 10
SupervisedRunner¶
- class catalyst.runners.runner.SupervisedRunner(model: Optional[Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]]] = None, engine: Optional[catalyst.core.engine.IEngine] = None, input_key: Any = 'features', output_key: Any = 'logits', target_key: str = 'targets', loss_key: str = 'loss')[source]¶
Bases:
catalyst.runners.supervised.ISupervisedRunner, catalyst.runners.runner.Runner
Runner for experiments with a supervised model.
- Parameters
model – Torch model instance
engine – IEngine instance
input_key – key in runner.batch dict mapping for model input
output_key – key for runner.batch to store model output
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
Note
Please follow the minimal examples sections for use cases.
Examples:
import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32,
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32,
    ),
}

runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)

# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3)),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=10
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
    load_best_on_end=True,
)

# model inference
for prediction in runner.predict_loader(loader=loaders["valid"]):
    assert prediction["logits"].detach().cpu().numpy().shape[-1] == 10
- __init__(model: Optional[Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]]] = None, engine: Optional[catalyst.core.engine.IEngine] = None, input_key: Any = 'features', output_key: Any = 'logits', target_key: str = 'targets', loss_key: str = 'loss')[source]¶
Init.
- get_callbacks(stage: str) OrderedDict[str, Callback] [source]¶
Prepares the callbacks for selected stage.
- Parameters
stage – stage name
- Returns
dictionary with stage callbacks
- predict_batch(batch: Mapping[str, Any], **kwargs) Mapping[str, Any] [source]¶
Run model inference on specified data batch.
Warning
You should not override this method. If you need a specific model call, override the forward() method instead.
- Parameters
batch – dictionary with data batch from DataLoader.
**kwargs – additional kwargs to pass to the model
- Returns
model output dictionary
- Return type
Mapping[str, Any]
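A minimal single-batch inference sketch, assuming the runner has already been trained as in the example above and that SupervisedRunner's default batch handling maps a (features, targets) tuple onto input_key/target_key:

batch = next(iter(loaders["valid"]))  # (features, targets) tuple from the MNIST loader
output = runner.predict_batch(batch)
assert output["logits"].shape[-1] == 10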
SelfSupervisedRunner¶
- class catalyst.runners.runner.SelfSupervisedRunner(model: Optional[Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]]] = None, engine: Optional[catalyst.core.engine.IEngine] = None, input_key: str = 'features', target_key: str = 'target', loss_key: str = 'loss', augemention_prefix: str = 'augment', projection_prefix: str = 'projection', embedding_prefix: str = 'embedding', loss_mode_prefix: str = 'projection')[source]¶
Bases:
catalyst.runners.self_supervised.ISelfSupervisedRunner, catalyst.runners.runner.Runner
Runner for experiments with a contrastive model.
- Parameters
input_key – key in runner.batch dict mapping for model input
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
augemention_prefix – key for runner.batch to sample augmentations
projection_prefix – key for runner.batch to store model projection
embedding_prefix – key for runner.batch to store model embeddings
loss_mode_prefix – selector key for loss calculation
Examples:
# 1. loader and transforms
transforms = Compose(
    [
        ToTensor(),
        Normalize((0.1307,), (0.3081,)),
        torchvision.transforms.RandomCrop((28, 28)),
        torchvision.transforms.RandomVerticalFlip(),
        torchvision.transforms.RandomHorizontalFlip(),
    ]
)

mnist = MNIST("./logdir", train=True, download=True, transform=None)
contrastive_mnist = ContrastiveDataset(mnist, transforms=transforms)
train_loader = torch.utils.data.DataLoader(contrastive_mnist, batch_size=BATCH_SIZE)

# 2. model and optimizer
encoder = MnistSimpleNet(out_features=16)
projection_head = nn.Sequential(
    nn.Linear(16, 16, bias=False),
    nn.ReLU(inplace=True),
    nn.Linear(16, 16, bias=True),
)

class ContrastiveModel(torch.nn.Module):
    def __init__(self, model, encoder):
        super(ContrastiveModel, self).__init__()
        self.model = model
        self.encoder = encoder

    def forward(self, x):
        emb = self.encoder(x)
        projection = self.model(emb)
        return emb, projection

model = ContrastiveModel(model=projection_head, encoder=encoder)
optimizer = Adam(model.parameters(), lr=LR)

# 3. criterion with triplets sampling
criterion = NTXentLoss(tau=0.1)

callbacks = [
    dl.ControlFlowCallback(
        dl.CriterionCallback(
            input_key="projection_left", target_key="projection_right", metric_key="loss"
        ),
        loaders="train",
    ),
    dl.SklearnModelCallback(
        feature_key="embedding_left",
        target_key="target",
        train_loader="train",
        valid_loaders="valid",
        model_fn=RandomForestClassifier,
        predict_method="predict_proba",
        predict_key="sklearn_predict",
        random_state=RANDOM_STATE,
        n_estimators=10,
    ),
    dl.ControlFlowCallback(
        dl.AccuracyCallback(
            target_key="target", input_key="sklearn_predict", topk_args=(1, 3)
        ),
        loaders="valid",
    ),
]

runner = dl.ContrastiveRunner()
logdir = "./logdir"

runner.train(
    model=model,
    engine=engine or dl.DeviceEngine(device),
    criterion=criterion,
    optimizer=optimizer,
    callbacks=callbacks,
    loaders={"train": train_loader, "valid": train_loader},
    verbose=True,
    logdir=logdir,
    valid_loader="train",
    valid_metric="loss",
    minimize_valid_metric=True,
    num_epochs=10,
)
- __init__(model: Optional[Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]]] = None, engine: Optional[catalyst.core.engine.IEngine] = None, input_key: str = 'features', target_key: str = 'target', loss_key: str = 'loss', augemention_prefix: str = 'augment', projection_prefix: str = 'projection', embedding_prefix: str = 'embedding', loss_mode_prefix: str = 'projection')[source]¶
Init.
- get_callbacks(stage: str) OrderedDict[str, Callback] [source]¶
Prepares the callbacks for selected stage.
- Parameters
stage – stage name
- Returns
dictionary with stage callbacks
- predict_batch(batch: Mapping[str, Any], **kwargs) Mapping[str, Any] [source]¶
Run model inference on specified data batch.
Warning
You should not override this method. If you need a specific model call, override the forward() method instead.
- Parameters
batch – dictionary with data batch from DataLoader.
**kwargs – additional kwargs to pass to the model
- Returns
model output dictionary
- Return type
Mapping[str, Any]
Config API¶
ConfigRunner¶
- class catalyst.runners.config.ConfigRunner(config: Dict)[source]¶
Bases:
catalyst.core.runner.IRunner
Runner created from a dictionary configuration file. Used for Catalyst Config API.
- Parameters
config – dictionary with parameters
Note
Please follow the minimal examples sections for use cases.
Examples:
dataset = SomeDataset()
runner = SupervisedConfigRunner(
    config={
        "args": {"logdir": logdir},
        "model": {"_target_": "SomeModel", "in_features": 4, "out_features": 2},
        "engine": {"_target_": "DeviceEngine", "device": device},
        "stages": {
            "stage1": {
                "num_epochs": 10,
                "criterion": {"_target_": "MSELoss"},
                "optimizer": {"_target_": "Adam", "lr": 1e-3},
                "loaders": {"batch_size": 4, "num_workers": 0},
                "callbacks": {
                    "criterion": {
                        "_target_": "CriterionCallback",
                        "metric_key": "loss",
                        "input_key": "logits",
                        "target_key": "targets",
                    },
                    "optimizer": {
                        "_target_": "OptimizerCallback",
                        "metric_key": "loss",
                    },
                },
            },
        },
    }
)
runner.get_datasets = lambda *args, **kwargs: {
    "train": dataset,
    "valid": dataset,
}
runner.run()
- get_callbacks(stage: str) OrderedDict[str, Callback] [source]¶
Returns the callbacks for a given stage.
- get_criterion(stage: str) Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] [source]¶
Returns the criterion for a given stage.
- get_datasets(stage: str) OrderedDict[str, Dataset] [source]¶
Returns datasets for a given stage.
- Parameters
stage – stage name
- Returns
datasets objects
- Return type
Dict
- get_engine() catalyst.core.engine.IEngine [source]¶
Returns the engine for the run.
- get_loaders(stage: str) OrderedDict[str, DataLoader] [source]¶
Returns loaders for a given stage.
- Parameters
stage – stage name
- Returns
loaders objects
- Return type
Dict
- get_loggers() Dict[str, catalyst.core.logger.ILogger] [source]¶
Returns the loggers for the run.
- get_model(stage: str) Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] [source]¶
Returns the model for a given stage.
- get_optimizer(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]], stage: str) Union[torch.optim.optimizer.Optimizer, Dict[str, torch.optim.optimizer.Optimizer]] [source]¶
Returns the optimizer for a given stage and epoch.
- Parameters
model – model or a dict of models
stage – current stage name
- Returns
optimizer for selected stage and epoch
- get_samplers(stage: str) OrderedDict[str, Sampler] [source]¶
Returns samplers for a given stage.
- Parameters
stage – stage name
- Returns
Dict of samplers
- get_scheduler(optimizer: Union[torch.optim.optimizer.Optimizer, Dict[str, torch.optim.optimizer.Optimizer]], stage: str) Union[torch.optim.lr_scheduler._LRScheduler, Dict[str, torch.optim.lr_scheduler._LRScheduler]] [source]¶
Returns the scheduler for a given stage.
- get_stage_len(stage: str) int [source]¶
Returns number of epochs for the selected stage.
- Parameters
stage – current stage
- Returns
number of epochs in stage
Example:
>>> runner.get_stage_len("pretraining") 3
- get_trial() catalyst.core.trial.ITrial [source]¶
Returns the trial for the run.
- property hparams: Dict¶
Returns hyperparameters.
- property logdir: str¶
Experiment’s logdir for artefacts and logging.
- property name: str¶
Returns run name for monitoring tools.
- property seed: int¶
Experiment’s seed for reproducibility.
- property stages: List[str]¶
Experiment’s stage names.
SupervisedConfigRunner¶
- class catalyst.runners.config.SupervisedConfigRunner(config: Optional[Dict] = None, input_key: Any = 'features', output_key: Any = 'logits', target_key: str = 'targets', loss_key: str = 'loss')[source]¶
Bases:
catalyst.runners.supervised.ISupervisedRunner, catalyst.runners.config.ConfigRunner
ConfigRunner for supervised tasks
- Parameters
config – dictionary with parameters
input_key – key in runner.batch dict mapping for model input
output_key – key for runner.batch to store model output
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
Note
Please follow the minimal examples sections for use cases.
Examples:
dataset = SomeDataset()
runner = SupervisedConfigRunner(
    config={
        "args": {"logdir": logdir},
        "model": {"_target_": "SomeModel", "in_features": 4, "out_features": 2},
        "engine": {"_target_": "DeviceEngine", "device": device},
        "stages": {
            "stage1": {
                "num_epochs": 10,
                "criterion": {"_target_": "MSELoss"},
                "optimizer": {"_target_": "Adam", "lr": 1e-3},
                "loaders": {
                    "batch_size": 4,
                    "num_workers": 0,
                    "datasets": {
                        "train": {
                            "_target_": "SelfSupervisedDatasetWrapper",
                            "dataset": dataset,
                        },
                        "transforms": ...,
                        "transform_original": ...,
                    },
                },
                "callbacks": {
                    "criterion": {
                        "_target_": "CriterionCallback",
                        "metric_key": "loss",
                        "input_key": "logits",
                        "target_key": "targets",
                    },
                    "optimizer": {
                        "_target_": "OptimizerCallback",
                        "metric_key": "loss",
                    },
                },
            },
        },
    }
)
runner.run()
SelfSupervisedConfigRunner¶
- class catalyst.runners.config.SelfSupervisedConfigRunner(config: Optional[Dict] = None, input_key: str = 'features', target_key: str = 'target', loss_key: str = 'loss', augemention_prefix: str = 'augment', projection_prefix: str = 'projection', embedding_prefix: str = 'embedding')[source]¶
Bases:
catalyst.runners.self_supervised.ISelfSupervisedRunner, catalyst.runners.config.ConfigRunner
ConfigRunner for contrastive tasks
- Parameters
config – dictionary with parameters
input_key – key in runner.batch dict mapping for model input
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
augemention_prefix – key for runner.batch to sample augmentations
projection_prefix – key for runner.batch to store model projection
embedding_prefix – key for runner.batch to store model embeddings
Note
Please follow the minimal examples sections for use cases.
Examples:
dataset = SomeDataset()
runner = SelfSupervisedConfigRunner(
    config={
        "args": {"logdir": logdir},
        "model": {"_target_": "SomeContrastiveModel", ...},
        "engine": {"_target_": "DeviceEngine", "device": device},
        "stages": {
            "stage1": {
                "num_epochs": 10,
                "criterion": {"_target_": "NTXentLoss", "tau": 0.1},
                "optimizer": {"_target_": "Adam", "lr": 1e-3},
                "loaders": {"batch_size": 4, "num_workers": 0},
                "callbacks": {
                    "criterion": {
                        "_target_": "CriterionCallback",
                        "metric_key": "loss",
                        "input_key": "logits",
                        "target_key": "targets",
                    },
                    "optimizer": {
                        "_target_": "OptimizerCallback",
                        "metric_key": "loss",
                    },
                },
            },
        },
    }
)
runner.get_datasets = lambda *args, **kwargs: {
    "train": dataset,
    "valid": dataset,
}
runner.run()
Hydra API¶
HydraRunner¶
- class catalyst.runners.hydra.HydraRunner(cfg: omegaconf.dictconfig.DictConfig)[source]¶
Bases:
catalyst.core.runner.IRunner
Runner created from a Hydra configuration file.
- Parameters
cfg – Hydra dictionary with parameters
Note
Please follow the minimal examples sections for use cases.
- get_callbacks(stage: str) OrderedDict[str, Callback] [source]¶
Returns the callbacks for a given stage.
- get_criterion(stage: str) Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] [source]¶
Returns the criterion for a given stage.
- get_datasets(stage: str) OrderedDict[str, Dataset] [source]¶
Returns datasets for a given stage.
- Parameters
stage – stage name
- Returns
datasets objects
- Return type
Dict
- get_engine() catalyst.core.engine.IEngine [source]¶
Returns the engine for the run.
- get_loaders(stage: str) Dict[str, torch.utils.data.dataloader.DataLoader] [source]¶
Returns loaders for a given stage.
- Parameters
stage – stage name
- Returns
loaders objects
- Return type
Dict
- get_loggers() Dict[str, catalyst.core.logger.ILogger] [source]¶
Returns the loggers for the run.
- get_model(stage: str) Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]] [source]¶
Returns the model for a given stage.
- get_optimizer(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]], stage: str) Union[torch.optim.optimizer.Optimizer, Dict[str, torch.optim.optimizer.Optimizer]] [source]¶
Returns the optimizer for a given stage and epoch.
- Parameters
model – model or a dict of models
stage – current stage name
- Returns
optimizer for selected stage and epoch
- get_samplers(stage: str) OrderedDict[str, Sampler] [source]¶
Returns samplers for a given stage.
- Parameters
stage – stage name
- Returns
Dict of samplers
- get_scheduler(optimizer: Union[torch.optim.optimizer.Optimizer, Dict[str, torch.optim.optimizer.Optimizer]], stage: str) Union[torch.optim.lr_scheduler._LRScheduler, Dict[str, torch.optim.lr_scheduler._LRScheduler]] [source]¶
Returns the schedulers for a given stage.
- get_stage_len(stage: str) int [source]¶
Returns number of epochs for the selected stage.
- Parameters
stage – current stage
- Returns
number of epochs in stage
Example:
>>> runner.get_stage_len("pretraining") 3
- get_transform(params: omegaconf.dictconfig.DictConfig) Callable [source]¶
Returns the data transforms for a dataset.
- Parameters
params – parameters of the transformation
- Returns
Data transformations to use
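An illustrative sketch only; it assumes the "_target_" instantiation convention used by the config examples on this page, and the torchvision transform is an arbitrary choice:

from omegaconf import OmegaConf

transform_params = OmegaConf.create({"_target_": "torchvision.transforms.ToTensor"})
transform = runner.get_transform(transform_params)  # returns a callable transform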
- get_trial() catalyst.core.trial.ITrial [source]¶
Returns the trial for the run.
- property hparams: collections.OrderedDict¶
Hyperparameters
- property logdir: str¶
Experiment’s logdir for artefacts and logging.
- property name: str¶
Returns run name for monitoring tools.
- property seed: int¶
Experiment’s seed for reproducibility.
- property stages: List[str]¶
Experiment’s stage names.
SupervisedHydraRunner¶
- class catalyst.runners.hydra.SupervisedHydraRunner(cfg: Optional[omegaconf.dictconfig.DictConfig] = None, input_key: Any = 'features', output_key: Any = 'logits', target_key: str = 'targets', loss_key: str = 'loss')[source]¶
Bases:
catalyst.runners.supervised.ISupervisedRunner, catalyst.runners.hydra.HydraRunner
HydraRunner for supervised tasks
- Parameters
cfg – Hydra dictionary with parameters
input_key – key in runner.batch dict mapping for model input
output_key – key for runner.batch to store model output
target_key – key in runner.batch dict mapping for target
loss_key – key for runner.batch_metrics to store criterion loss output
Note
Please follow the minimal examples sections for use cases.
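Examples:
No dedicated Hydra example is given on this page; as a hedged sketch, the configuration below simply mirrors the SupervisedConfigRunner dictionary shown earlier, wrapped in an omegaconf DictConfig. The exact schema of your Hydra configs may differ, and SomeModel, logdir, device and dataset are placeholders from that example:

from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "args": {"logdir": logdir},
        "model": {"_target_": "SomeModel", "in_features": 4, "out_features": 2},
        "engine": {"_target_": "DeviceEngine", "device": device},
        "stages": {
            "stage1": {
                "num_epochs": 10,
                "criterion": {"_target_": "MSELoss"},
                "optimizer": {"_target_": "Adam", "lr": 1e-3},
                "loaders": {"batch_size": 4, "num_workers": 0},
                "callbacks": {
                    "criterion": {
                        "_target_": "CriterionCallback",
                        "metric_key": "loss",
                        "input_key": "logits",
                        "target_key": "targets",
                    },
                    "optimizer": {"_target_": "OptimizerCallback", "metric_key": "loss"},
                },
            },
        },
    }
)

runner = SupervisedHydraRunner(cfg=cfg)
runner.get_datasets = lambda *args, **kwargs: {"train": dataset, "valid": dataset}
runner.run()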