Catalyst

Catalyst is a PyTorch framework for Deep Learning research and development. It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop.

Break the cycle - use the Catalyst!

Project manifest. Part of PyTorch Ecosystem. Part of Catalyst Ecosystem:

Alchemy - experiments logging & visualization
Catalyst - accelerated deep learning R&D
Reaction - convenient deep learning models serving

Catalyst at AI Landscape.

Getting started

import os
import torch
from torch.nn import functional as F
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
from catalyst import dl
from catalyst.utils import metrics

model = torch.nn.Linear(28 * 28, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

class CustomRunner(dl.Runner):

    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device).view(batch[0].size(0), -1))

    def _handle_batch(self, batch):
        # model train/valid step
        x, y = batch
        y_hat = self.model(x.view(x.size(0), -1))

        loss = F.cross_entropy(y_hat, y)
        accuracy01, accuracy03 = metrics.accuracy(y_hat, y, topk=(1, 3))
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )

        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    load_best_on_end=True,
)
# model inference
for prediction in runner.predict_loader(loader=loaders["valid"]):
    assert prediction.detach().cpu().numpy().shape[-1] == 10
# model tracing
traced_model = runner.trace(loader=loaders["valid"])
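The `accuracy01` and `accuracy03` values logged in the example are top-1 and top-3 accuracy. As a rough illustration of what a top-k metric measures, here is a plain-Python sketch. It is not the implementation behind `metrics.accuracy`; `topk_accuracy` and the scores below are hypothetical and exist only for this example.

```python
def topk_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k highest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # indices of the k largest scores in this row
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top
    return hits / len(targets)


# made-up class scores for 4 samples and 3 classes
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
]
targets = [1, 1, 0, 2]

print(topk_accuracy(scores, targets, k=1))  # 0.25
print(topk_accuracy(scores, targets, k=2))  # 0.75
```

Top-k accuracy can only grow with k, which is why `accuracy03` is never lower than `accuracy01` on the same batch.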

Step by step guide

Start with Catalyst 101 — Accelerated PyTorch introduction.
Check minimal examples.
Try notebook tutorials with Google Colab.
Read blog posts with use cases and guides (and Config API intro).
Go through advanced classification, detection and segmentation pipelines with Config API. More pipelines are available in the projects section.
Want more? See Alchemy and Reaction packages.
For Catalyst.RL introduction, please follow Catalyst.RL repo.

Overview

Catalyst helps you write compact but full-featured Deep Learning pipelines in a few lines of code. You get a training loop with metrics, early stopping, model checkpointing and other features without the boilerplate.
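To give a sense of the boilerplate in question, here is a plain-Python sketch of the bookkeeping behind early stopping and best-checkpoint tracking. It is an illustration only: `EarlyStopper` and the loss values are hypothetical, and this is not how Catalyst implements these features.

```python
class EarlyStopper:
    """Tracks the best validation loss and signals when to stop."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest change that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, loss):
        """Record one epoch; return True if training should stop."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss, self.best_epoch = loss, epoch
            self.bad_epochs = 0     # a real loop would save a checkpoint here
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


valid_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.64]  # made-up values
stopper = EarlyStopper(patience=2)
stopped_at = None
for epoch, loss in enumerate(valid_losses):
    if stopper.update(epoch, loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best_loss)  # 4 2 0.65
```

Even this small piece needs state, a comparison rule and a stopping condition, and a hand-written train loop has to repeat similar code for every metric and checkpoint policy.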

Installation

Common installation:

pip install -U catalyst

More specific installations with additional requirements:

pip install catalyst[cv]         # installs CV-based catalyst
pip install catalyst[nlp]        # installs NLP-based catalyst
pip install catalyst[ecosystem]  # installs Catalyst.Ecosystem
# installs the development version from the master branch
pip install git+https://github.com/catalyst-team/catalyst@master --upgrade

Catalyst is compatible with Python 3.6+ and PyTorch 1.1+.

Tested on Ubuntu 16.04/18.04/20.04, macOS 10.15, Windows 10 and Windows Subsystem for Linux.
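If you are unsure whether an environment meets the Python 3.6+ and PyTorch 1.1+ requirements, a small check like the following can help. This is a sketch using only the standard library plus an optional `torch` import; `version_tuple` and `check_environment` are hypothetical helpers, not part of Catalyst.

```python
import re
import sys

MIN_PYTHON = (3, 6)  # minimum versions taken from the compatibility note
MIN_TORCH = (1, 1)


def version_tuple(version):
    """Parse the leading numeric parts, e.g. '1.7.0+cu101' -> (1, 7, 0)."""
    parts = []
    for chunk in version.split("+")[0].split("."):
        match = re.match(r"\d+", chunk)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a dict of booleans; torch_ok stays False if torch is missing."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON, "torch_ok": False}
    try:
        import torch
    except ImportError:
        return report
    report["torch_ok"] = version_tuple(torch.__version__) >= MIN_TORCH
    return report


print(check_environment())
```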

Structure

core - framework core with main abstractions - Experiment, Runner and Callback.
data - useful tools and scripts for data processing.
dl - runner for training and inference, all of the classic ML and CV/NLP/RecSys metrics and a variety of callbacks for training, validation and inference of neural networks.
tools - extra tools for Deep Learning research, class-based helpers.
utils - typical utils for Deep Learning research, function-based helpers.
contrib - additional modules contributed by Catalyst users.
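The split between a Runner and its Callbacks can be pictured with a toy event loop: the runner owns the loop and shared state, and callbacks react to events. The sketch below is a simplified illustration in plain Python. `ToyRunner`, `ToyCallback` and the hook names are hypothetical and are not Catalyst's classes or signatures.

```python
class ToyCallback:
    """Base class: hooks do nothing unless overridden."""

    def on_epoch_start(self, state): pass
    def on_batch_end(self, state): pass
    def on_epoch_end(self, state): pass


class LossAverager(ToyCallback):
    """Averages batch losses into one epoch-level value."""

    def on_epoch_start(self, state):
        state["losses"] = []

    def on_batch_end(self, state):
        state["losses"].append(state["batch_loss"])

    def on_epoch_end(self, state):
        state["epoch_loss"] = sum(state["losses"]) / len(state["losses"])


class History(ToyCallback):
    """Stores the epoch-level loss computed by an earlier callback."""

    def __init__(self):
        self.epoch_losses = []

    def on_epoch_end(self, state):
        self.epoch_losses.append(state["epoch_loss"])


class ToyRunner:
    """Owns the loop and shared state; callbacks are called in list order."""

    def __init__(self, callbacks):
        self.callbacks = callbacks
        self.state = {}

    def _fire(self, event):
        for callback in self.callbacks:
            getattr(callback, event)(self.state)

    def run(self, epochs, step_fn):
        for batches in epochs:
            self._fire("on_epoch_start")
            for batch in batches:
                self.state["batch_loss"] = step_fn(batch)
                self._fire("on_batch_end")
            self._fire("on_epoch_end")


history = History()
runner = ToyRunner([LossAverager(), history])
# each "batch" is already a loss value here, so the step function is the identity
runner.run(epochs=[[1.0, 3.0], [2.0, 2.0, 5.0]], step_fn=lambda batch: batch)
print(history.epoch_losses)  # [2.0, 3.0]
```

Keeping logging, metrics and checkpointing in callbacks means the loop itself never changes when a new feature is added.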

Tests

All Catalyst code, features and pipelines are fully tested with our own catalyst-codestyle.

In fact, we train a number of different models for a variety of tasks - image classification, image segmentation, text classification, GAN training and much more. During the tests, we compare their convergence metrics to verify the correctness of the training procedure and its reproducibility.

As a result, Catalyst provides fully tested and reproducible best practices for your deep learning research.
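A convergence test of this kind can be sketched on a toy problem: train with a fixed seed, check that the loss reaches an expected level, then train again with the same seed and check that the metrics match exactly. The example below is a plain-Python illustration under those assumptions, not code from Catalyst's test suite; `train_toy_model` and the threshold are hypothetical.

```python
import random


def train_toy_model(seed, num_epochs=50, lr=0.1):
    """Fit y = w * x with SGD on synthetic data; return per-epoch mean loss."""
    rng = random.Random(seed)  # all randomness comes from this seeded generator
    data = [(x, 3.0 * x) for x in (rng.uniform(-1, 1) for _ in range(32))]
    w = rng.uniform(-1, 1)
    history = []
    for _ in range(num_epochs):
        rng.shuffle(data)
        total = 0.0
        for x, y in data:
            error = w * x - y
            total += error ** 2
            w -= lr * 2 * error * x  # gradient of the squared error w.r.t. w
        history.append(total / len(data))
    return history


run_a = train_toy_model(seed=42)
run_b = train_toy_model(seed=42)

# correctness: the loss goes down and ends below a chosen threshold
assert run_a[-1] < run_a[0]
assert run_a[-1] < 1e-6

# reproducibility: the same seed gives exactly the same metrics
assert run_a == run_b
```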

Indices and tables

API

Core
- Core
- Callbacks
- Registry
- Utils
- Legacy
  - Runner
DL
- Experiment
- Runner
  - Runner
  - SupervisedRunner
- Callbacks
- Metrics
- Utils
  - Torch
  - Trace
  - Wizard
- Registry
Registry
- Registry
- Registries
Data
Tools and Utilities
- Tools
- Meters
- Utils
  - Checkpoint
  - Components
  - Config
  - Distributed
  - Hash
  - Initialization
  - Misc
  - Numpy
  - Parser
  - Scripts
  - Seed
  - Sys
  - Torch
- Metrics
  - Accuracy
  - AUC
  - CMC score
  - Dice
  - F1 score
  - Focal
  - IoU
  - Precision
  - Functional
Contrib
- Datasets
  - MNIST
  - Computer Vision
- DL
  - Callbacks
- Models
- NN
  - Criterion
    - Cross entropy
    - Contrastive
    - Circle
    - Dice
    - Focal
    - GAN
    - Huber
    - IOU
    - Lovasz
    - Margin
    - Triplet
    - Wing
  - Modules
  - Optimizers
    - Lamb
    - Lookahead
    - QHAdamW
    - RAdam
    - Ralamb
  - Schedulers
    - OneCycleLRWithWarmup
- Models
  - Segmentation
    - Unet
    - Linknet
    - FPNnet
    - PSPnet
- Registry
- Tools
  - Tensorboard
- Utilities
  - Argparse
  - Compression
  - Confusion Matrix
  - Dataset
  - Misc
  - Pandas
  - Parallel
  - Plotly
  - Serialization
  - Visualization
- Computer Vision utilities
  - Image
  - Tensor
- Natural Language Processing utilities
  - Text