Catalyst¶
PyTorch framework for Deep Learning R&D.¶
It focuses on reproducibility, rapid experimentation, and codebase reuse so you can create something new rather than write yet another train loop. Break the cycle - use the Catalyst!
Read more about our vision in the Project Manifest. Catalyst is a part of the PyTorch Ecosystem.
- Catalyst Ecosystem consists of:
Getting started¶
import os
from torch import nn, optim
from torch.utils.data import DataLoader
from catalyst import dl, utils
from catalyst.data.transforms import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()), batch_size=32),
    "valid": DataLoader(MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()), batch_size=32),
}

runner = dl.SupervisedRunner(input_key="features", output_key="logits", target_key="targets", loss_key="loss")

# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.AccuracyCallback(input_key="logits", target_key="targets", topk_args=(1, 3, 5)),
        # catalyst[ml] required
        dl.ConfusionMatrixCallback(input_key="logits", target_key="targets", num_classes=10),
    ],
    logdir="./logs",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
    load_best_on_end=True,
)

# model inference
for prediction in runner.predict_loader(loader=loaders["valid"]):
    assert prediction["logits"].detach().cpu().numpy().shape[-1] == 10

features_batch = next(iter(loaders["valid"]))[0]
# model stochastic weight averaging
model.load_state_dict(utils.get_averaged_weights_by_path_mask(logdir="./logs", path_mask="*.pth"))
# model tracing
utils.trace_model(model=runner.model, batch=features_batch)
# model quantization
utils.quantize_model(model=runner.model)
# model pruning
utils.prune_model(model=runner.model, pruning_fn="l1_unstructured", amount=0.8)
# onnx export
utils.onnx_export(model=runner.model, batch=features_batch, file="./logs/mnist.onnx", verbose=True)
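The example above requests accuracy at several cut-offs through topk_args=(1, 3, 5). As a rough illustration of what top-k accuracy measures, here is a dependency-free sketch. The helper name topk_accuracy and the toy scores are assumptions made for this illustration; this is not how Catalyst computes the metric internally.

```python
from typing import Dict, Sequence


def topk_accuracy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    topk: Sequence[int] = (1, 3, 5),
) -> Dict[int, float]:
    """Fraction of samples whose target is among the k highest-scoring classes.

    Hypothetical helper for illustration; expects a non-empty batch.
    """
    hits = {k: 0 for k in topk}
    for scores, target in zip(logits, targets):
        # Class indices sorted by descending score.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        for k in topk:
            if target in ranked[:k]:
                hits[k] += 1
    return {k: hits[k] / len(targets) for k in topk}


# Toy batch: three samples, three classes.
logits = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0
    [0.2, 0.3, 0.5],  # highest score: class 2
]
targets = [1, 1, 0]
result = topk_accuracy(logits, targets, topk=(1, 2))
```

With these toy values only the first sample is correct at k=1, while the second one also counts at k=2, so accuracy can only stay equal or grow as k increases.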
Step by step guide¶
Start with Catalyst 2021–Accelerated PyTorch 2.0 introduction.
Check minimal examples.
Read blogposts with use-cases and guides.
Learn machine learning with our “Deep Learning with Catalyst” course.
If you would like to contribute to the project, follow our contribution guidelines.
If you want to support the project, feel free to donate on our Patreon page or write to us with your proposals.
And do not forget to join our Slack for collaboration.
Overview¶
Catalyst helps you write compact but full-featured Deep Learning pipelines in a few lines of code. You get a training loop with metrics, early stopping, model checkpointing, and other features without the boilerplate.
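To make "boilerplate" concrete, the sketch below shows the kind of early-stopping and best-checkpoint bookkeeping a hand-written loop usually carries. It is plain Python with made-up loss values; run_with_early_stopping and its parameters are hypothetical names for this illustration and are not part of Catalyst.

```python
from typing import Callable, Dict, List, Optional


def run_with_early_stopping(
    validate: Callable[[int], float],
    num_epochs: int,
    patience: int = 2,
) -> Dict[str, object]:
    """Track the best validation loss and stop after `patience` epochs without improvement."""
    best_loss: Optional[float] = None
    best_epoch: Optional[int] = None
    epochs_without_improvement = 0
    history: List[float] = []

    for epoch in range(1, num_epochs + 1):
        loss = validate(epoch)
        history.append(loss)
        if best_loss is None or loss < best_loss:
            # A real loop would save a model checkpoint here.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break

    return {"best_loss": best_loss, "best_epoch": best_epoch, "history": history}


# Made-up validation losses standing in for a real validation pass.
fake_losses = [0.9, 0.6, 0.5, 0.55, 0.58, 0.4]
summary = run_with_early_stopping(
    lambda epoch: fake_losses[epoch - 1], num_epochs=6, patience=2
)
```

Here the loop stops after the fifth epoch and never sees the sixth value, which is the usual trade-off of a small patience setting.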
Installation¶
Common installation:
pip install -U catalyst
More specific installations with additional requirements:
pip install catalyst[ml] # installs ML-based Catalyst
pip install catalyst[cv] # installs CV-based Catalyst
# master version installation
pip install git+https://github.com/catalyst-team/catalyst@master --upgrade
Catalyst is compatible with Python 3.6+ and PyTorch 1.3+.
Tested on Ubuntu 16.04/18.04/20.04, macOS 10.15, Windows 10 and Windows Subsystem for Linux.
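If you want to confirm these minimum versions before installing, a small check along the following lines may help. This is a sketch only: parse_version and meets_minimum are hypothetical helpers written for this example, and the version parsing is deliberately simple (it ignores local suffixes such as "+cu117" and stops at the first non-numeric component).

```python
import sys
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.13.1+cu117' into (1, 13, 1); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...]) -> bool:
    """Compare a dotted version string against a minimum version tuple."""
    return parse_version(version) >= minimum


# Python 3.6+ is required.
python_ok = sys.version_info[:2] >= (3, 6)

# PyTorch 1.3+ is required; treat a missing installation as "not OK".
try:
    import torch

    torch_ok = meets_minimum(str(torch.__version__), (1, 3))
except ImportError:
    torch_ok = False

print(f"Python OK: {python_ok}, PyTorch OK: {torch_ok}")
```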
Features¶
Universal train/inference loop.
Configuration files for model/data hyperparameters.
Reproducibility – all source code and environment variables will be saved.
Callbacks – reusable train/inference pipeline parts with easy customization.
Training stages support.
Deep Learning best practices – SWA, AdamW, Ranger optimizer, OneCycle, and more.
Development best practices – fp16 support, distributed training, Slurm support.
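The idea of callbacks as reusable pipeline parts can be sketched without any framework. The classes and hook names below (Callback, on_epoch_start, on_epoch_end, run_loop) are assumptions chosen for this illustration and do not reproduce Catalyst's actual interfaces; the "training" step is a stand-in formula.

```python
from typing import Dict, List


class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""

    def on_epoch_start(self, state: Dict) -> None:
        pass

    def on_epoch_end(self, state: Dict) -> None:
        pass


class HistoryCallback(Callback):
    """Records the loss after every epoch."""

    def __init__(self) -> None:
        self.losses: List[float] = []

    def on_epoch_end(self, state: Dict) -> None:
        self.losses.append(state["loss"])


class StopOnThresholdCallback(Callback):
    """Asks the loop to stop once the loss reaches a threshold."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def on_epoch_end(self, state: Dict) -> None:
        if state["loss"] <= self.threshold:
            state["need_stop"] = True


def run_loop(num_epochs: int, callbacks: List[Callback]) -> Dict:
    """Generic loop: it knows nothing about what the callbacks do."""
    state = {"epoch": 0, "loss": float("inf"), "need_stop": False}
    for epoch in range(1, num_epochs + 1):
        state["epoch"] = epoch
        for callback in callbacks:
            callback.on_epoch_start(state)
        state["loss"] = 1.0 / epoch  # stand-in for a real training step
        for callback in callbacks:
            callback.on_epoch_end(state)
        if state["need_stop"]:
            break
    return state


history = HistoryCallback()
final_state = run_loop(num_epochs=10, callbacks=[history, StopOnThresholdCallback(0.25)])
```

Because the loop only calls hooks, logging and stopping logic can be swapped or combined without touching the loop itself, which is the reuse the feature list refers to.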
Tests¶
All Catalyst code, features and pipelines are fully tested with our own catalyst-codestyle.
In fact, we train a number of different models for various tasks: image classification, image segmentation, text classification, GAN training, and much more. During the tests, we compare their convergence metrics in order to verify the correctness of the training procedure and its reproducibility.
As a result, Catalyst provides fully tested and reproducible best practices for your deep learning research.
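A reproducibility check of this kind can be pictured with a toy example: train twice with the same seed and require identical metrics. The sketch below uses only the standard library; train_toy_model, the learning rate, and the noise level are assumptions for this illustration and say nothing about how Catalyst's own test suite is organized.

```python
import random
from typing import List


def train_toy_model(seed: int, num_steps: int = 50) -> List[float]:
    """Fit y = 3x with SGD on noisy samples and return the per-step squared error."""
    rng = random.Random(seed)  # local generator, so runs do not affect each other
    weight = 0.0
    losses = []
    for _ in range(num_steps):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + rng.gauss(0.0, 0.01)
        error = weight * x - y
        losses.append(error ** 2)
        # Gradient of the squared error w.r.t. the weight is 2 * error * x.
        weight -= 0.5 * 2 * error * x
    return losses


run_a = train_toy_model(seed=42)
run_b = train_toy_model(seed=42)
run_c = train_toy_model(seed=7)

# Same seed: metrics must match exactly. Different seed: they are expected to differ.
same_seed_matches = run_a == run_b
different_seed_differs = run_a != run_c
```

Real pipelines usually compare metrics within a tolerance rather than exactly, since GPU kernels and parallel data loading can introduce small nondeterministic differences.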
Indices and tables¶
- Callbacks
- Run
- BatchOverfitCallback
- BatchTransformCallback
- CheckpointCallback
- ControlFlowCallback
- CriterionCallback
- Metric – BatchMetricCallback
- Metric – LoaderMetricCallback
- Metric – MetricAggregationCallback
- Misc – CheckRunCallback
- Misc – EarlyStoppingCallback
- Misc – TimerCallback
- Misc – TqdmCallback
- OnnxCallback
- OptimizerCallback
- OptunaPruningCallback
- PeriodicLoaderCallback
- PruningCallback
- QuantizationCallback
- Scheduler – SchedulerCallback
- Scheduler – LRFinder
- Tracing
- Metric
- Accuracy - AccuracyCallback
- Accuracy - MultilabelAccuracyCallback
- AUCCallback
- Classification – PrecisionRecallF1SupportCallback
- Classification – MultilabelPrecisionRecallF1SupportCallback
- CMCScoreCallback
- ReidCMCScoreCallback
- ConfusionMatrixCallback
- FunctionalMetricCallback
- RecSys – HitrateCallback
- RecSys – MAPCallback
- RecSys – MRRCallback
- RecSys – NDCGCallback
- Segmentation – DiceCallback
- Segmentation – IOUCallback
- Segmentation – TrevskyCallback
- Run
- Contrib
- Core
- Data
- Engines
- Loggers
- Metrics
- Metric API
- General Metrics
- Runner Metrics
- Accuracy - AccuracyMetric
- Accuracy - MultilabelAccuracyMetric
- AUCMetric
- Classification – BinaryPrecisionRecallF1Metric
- Classification – MulticlassPrecisionRecallF1SupportMetric
- Classification – MultilabelPrecisionRecallF1SupportMetric
- CMCMetric
- ReidCMCMetric
- RecSys – HitrateMetric
- RecSys – MAPMetric
- RecSys – MRRMetric
- RecSys – NDCGMetric
- Segmentation – RegionBasedMetric
- Segmentation – DiceMetric
- Segmentation – IOUMetric
- Segmentation – TrevskyMetric
- Functional API
- Runners
- Tools
- Utilities