Contrib

Datasets

MNIST

class catalyst.contrib.datasets.mnist.MNIST(root, train=True, transform=None, target_transform=None, download=False)[source]

Bases: torch.utils.data.dataset.Dataset

MNIST Dataset.

__init__(root, train=True, transform=None, target_transform=None, download=False)[source]
Parameters
  • root – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.

  • train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

  • download (bool, optional) – If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • transform (callable, optional) – A function/transform that takes in an image and returns a transformed version.

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

MovieLens

class catalyst.contrib.datasets.movielens.MovieLens(root, train=True, download=False, min_rating=0.0)[source]

Bases: torch.utils.data.dataset.Dataset

MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota.

This data set consists of:
  • 100,000 ratings (1-5) from 943 users on 1682 movies.

  • Each user has rated at least 20 movies.

  • Simple demographic info for the users (age, gender, occupation, zip).

The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. This data has been cleaned up: users who had fewer than 20 ratings or did not have complete demographic information were removed from this data set. Detailed descriptions of the data file can be found at the end of this file.

Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:
  • The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group.

  • The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information).

  • The user may not redistribute the data without separate permission.

  • The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota.

If you have any further questions or comments, please contact GroupLens <grouplens-info@cs.umn.edu>. http://files.grouplens.org/datasets/movielens/ml-100k-README.txt

__init__(root, train=True, download=False, min_rating=0.0)[source]
Parameters
  • root (string) – Root directory of dataset where MovieLens/processed/training.pt and MovieLens/processed/test.pt exist.

  • train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

  • download (bool, optional) – If True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • min_rating (float, optional) – Minimum rating to include in the interaction matrix
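
To illustrate what a minimum rating threshold means for an interaction matrix, here is a minimal pure-Python sketch. The function name build_interactions, the (user, item, rating) tuple layout, and the use of an inclusive comparison (rating >= min_rating) are assumptions made for illustration; Catalyst's own preprocessing may differ.

```python
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating), 0-based


def build_interactions(
    ratings: List[Rating],
    num_users: int,
    num_items: int,
    min_rating: float = 0.0,
) -> List[List[float]]:
    """Hypothetical helper: dense user x item matrix of kept ratings."""
    matrix = [[0.0] * num_items for _ in range(num_users)]
    for user, item, rating in ratings:
        # assumption: ratings below the threshold are simply dropped
        if rating >= min_rating:
            matrix[user][item] = rating
    return matrix


ratings = [(0, 0, 5.0), (0, 1, 2.0), (1, 1, 4.0)]
all_kept = build_interactions(ratings, num_users=2, num_items=2)
filtered = build_interactions(ratings, num_users=2, num_items=2, min_rating=4.0)
```

With min_rating=4.0 the 2.0 rating is left out of the matrix, while the default of 0.0 keeps every rating.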

Computer Vision

Imagenette

class catalyst.contrib.datasets.cv.imagenette.Imagenette(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagenette Dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagenette160

class catalyst.contrib.datasets.cv.imagenette.Imagenette160(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagenette Dataset with images resized so that the shortest size is 160 px.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagenette320

class catalyst.contrib.datasets.cv.imagenette.Imagenette320(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagenette Dataset with images resized so that the shortest size is 320 px.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagewang

class catalyst.contrib.datasets.cv.imagewang.Imagewang(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagewang Dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagewang160

class catalyst.contrib.datasets.cv.imagewang.Imagewang160(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagewang Dataset with images resized so that the shortest size is 160 px.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagewang320

class catalyst.contrib.datasets.cv.imagewang.Imagewang320(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagewang Dataset with images resized so that the shortest size is 320 px.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagewoof

catalyst.contrib.datasets.cv.imagewoof

alias of catalyst.contrib.datasets.cv.imagewoof

Imagewoof160

class catalyst.contrib.datasets.cv.Imagewoof160(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagewoof Dataset with images resized so that the shortest size is 160 px.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

Imagewoof320

class catalyst.contrib.datasets.cv.Imagewoof320(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.cv.misc.ImageClassificationDataset

Imagewoof Dataset with images resized so that the shortest size is 320 px.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs

NN

Extensions for torch.nn

Criterion

CircleLoss

class catalyst.contrib.nn.criterion.circle.CircleLoss(margin: float, gamma: float)[source]

Bases: torch.nn.modules.module.Module

CircleLoss from Circle Loss: A Unified Perspective of Pair Similarity Optimization paper.

Adapted from: https://github.com/TinyZeaMays/CircleLoss

Example

>>> import torch
>>> from torch.nn import functional as F
>>> from catalyst.contrib.nn import CircleLoss
>>>
>>> features = F.normalize(torch.rand(256, 64, requires_grad=True))
>>> labels = torch.randint(high=10, size=(256,))
>>> criterion = CircleLoss(margin=0.25, gamma=256)
>>> criterion(features, labels)
__init__(margin: float, gamma: float) → None[source]
Parameters
  • margin – margin to use

  • gamma – gamma to use

DiceLoss

class catalyst.contrib.nn.criterion.dice.DiceLoss(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The Dice loss.

DiceLoss = 1 - dice score

dice score = 2 * intersection / (intersection + union) = 2 * tp / (2 * tp + fp + fn)

__init__(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', metrics are calculated per class and then averaged over all classes. If mode='weighted', metrics are calculated per class and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
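
As a rough illustration of the formula and the mode options above, the following pure-Python sketch computes the Dice loss from per-class confusion counts. The helper names (dice_score, dice_loss) and the example counts are hypothetical and not part of Catalyst, and the exact placement of eps in the library may differ.

```python
from typing import List, Optional, Tuple

Counts = Tuple[float, float, float]  # (tp, fp, fn) for one class


def dice_score(tp: float, fp: float, fn: float, eps: float = 1e-7) -> float:
    """Dice score 2 * tp / (2 * tp + fp + fn); eps guards against 0 / 0."""
    return 2 * tp / (2 * tp + fp + fn + eps)


def dice_loss(
    counts: List[Counts],
    mode: str = "macro",
    weights: Optional[List[float]] = None,
) -> float:
    """Hypothetical helper: 1 - dice score aggregated over classes."""
    if mode == "micro":
        # classes are ignored: pool the counts, then compute one score
        tp, fp, fn = (sum(c[i] for c in counts) for i in range(3))
        score = dice_score(tp, fp, fn)
    elif mode == "macro":
        # per-class scores, averaged over classes
        score = sum(dice_score(*c) for c in counts) / len(counts)
    elif mode == "weighted":
        # per-class scores, summed with class weights
        if weights is None or len(weights) != len(counts):
            raise ValueError("weighted mode needs one weight per class")
        score = sum(w * dice_score(*c) for w, c in zip(weights, counts))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return 1.0 - score


# two classes: one segmented well, one segmented poorly
counts = [(90.0, 5.0, 5.0), (10.0, 20.0, 20.0)]
macro = dice_loss(counts, mode="macro")
micro = dice_loss(counts, mode="micro")
```

In this example the macro loss is larger than the micro loss, because the poorly segmented small class counts as much as the large one when scores are averaged per class.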

FocalLossBinary

class catalyst.contrib.nn.criterion.focal.FocalLossBinary(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')[source]

Bases: torch.nn.modules.loss._Loss

Compute focal loss for binary classification problem.

It has been proposed in Focal Loss for Dense Object Detection paper.

__init__(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')[source]

@TODO: Docs. Contribution is welcome.
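
For orientation, the focal loss from the cited paper can be sketched for a single prediction in plain Python. This follows the paper's formula, FL = -alpha_t * (1 - p_t)^gamma * log(p_t), and is not Catalyst's implementation: the function name binary_focal_loss is hypothetical, it takes a probability rather than logits, and it ignores the ignore, reduced, threshold and reduction arguments.

```python
import math


def binary_focal_loss(
    p: float, y: int, gamma: float = 2.0, alpha: float = 0.25
) -> float:
    """Focal loss for one prediction.

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(p=0.9, y=1)  # confident and correct
hard = binary_focal_loss(p=0.1, y=1)  # confident and wrong
```

With gamma=0 the expression reduces to a class-weighted cross-entropy; larger gamma shrinks the contribution of well-classified examples more strongly.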

FocalLossMultiClass

class catalyst.contrib.nn.criterion.focal.FocalLossMultiClass(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')[source]

Bases: catalyst.contrib.nn.criterion.focal.FocalLossBinary

Compute focal loss for multiclass problem. Ignores targets having -1 label.

It has been proposed in Focal Loss for Dense Object Detection paper.

__init__(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')

@TODO: Docs. Contribution is welcome.

IoULoss

class catalyst.contrib.nn.criterion.iou.IoULoss(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The intersection over union (Jaccard) loss.

IoULoss = 1 - iou score

iou score = intersection / union = tp / (tp + fp + fn)

__init__(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', metrics are calculated per class and then averaged over all classes. If mode='weighted', metrics are calculated per class and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
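
A minimal sketch of the IoU formula above, computed from confusion counts in plain Python. The helper names are hypothetical and the placement of eps is an assumption; the second helper only serves to show how IoU relates to the Dice score.

```python
def iou_score(tp: float, fp: float, fn: float, eps: float = 1e-7) -> float:
    """IoU (Jaccard) score tp / (tp + fp + fn); eps guards against 0 / 0."""
    return tp / (tp + fp + fn + eps)


def dice_from_iou(iou: float) -> float:
    """Dice score expressed through IoU: dice = 2 * iou / (1 + iou)."""
    return 2 * iou / (1 + iou)


tp, fp, fn = 50.0, 25.0, 25.0
iou = iou_score(tp, fp, fn)
iou_loss = 1.0 - iou
dice = dice_from_iou(iou)
```

For the same prediction the IoU score is never higher than the Dice score, so the IoU loss is the stricter of the two.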

MarginLoss

class catalyst.contrib.nn.criterion.margin.MarginLoss(alpha: float = 0.2, beta: float = 1.0, skip_labels: Union[int, List[int]] = -1)[source]

Bases: torch.nn.modules.module.Module

Margin loss criterion

__init__(alpha: float = 0.2, beta: float = 1.0, skip_labels: Union[int, List[int]] = -1)[source]

Margin loss constructor.

Parameters
  • alpha – alpha

  • beta – beta

  • skip_labels (int or List[int]) – labels to skip

TrevskyLoss

class catalyst.contrib.nn.criterion.trevsky.TrevskyLoss(alpha: float, beta: Optional[float] = None, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The Trevsky (Tversky) loss.

TrevskyIndex = TP / (TP + alpha * FN + beta * FP)

TrevskyLoss = 1 - TrevskyIndex

__init__(alpha: float, beta: Optional[float] = None, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • alpha – false negative coefficient; the bigger alpha, the bigger the penalty for false negatives. Must be in (0, 1)

  • beta – false positive coefficient; the bigger beta, the bigger the penalty for false positives. Must be in (0, 1). If None, beta = (1 - alpha)

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', metrics are calculated separately and then averaged over all classes. If mode='weighted', metrics are calculated separately and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
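
The effect of alpha and beta can be seen in a small pure-Python sketch of the index defined above. The function name tversky_index and the example counts are hypothetical; the eps placement is an assumption.

```python
from typing import Optional


def tversky_index(
    tp: float,
    fp: float,
    fn: float,
    alpha: float,
    beta: Optional[float] = None,
    eps: float = 1e-7,
) -> float:
    """TP / (TP + alpha * FN + beta * FP), with beta = 1 - alpha by default."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    if beta is None:
        beta = 1.0 - alpha
    return tp / (tp + alpha * fn + beta * fp + eps)


# a prediction that misses more than it over-segments (fn > fp)
tp, fp, fn = 80.0, 10.0, 30.0
balanced = tversky_index(tp, fp, fn, alpha=0.5)   # equals the Dice score
fn_heavy = tversky_index(tp, fp, fn, alpha=0.7)   # punishes false negatives
fp_heavy = tversky_index(tp, fp, fn, alpha=0.3)   # punishes false positives
```

With alpha = beta = 0.5 the index coincides with the Dice score; raising alpha lowers the index (and raises the loss) for predictions dominated by false negatives.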

FocalTrevskyLoss

class catalyst.contrib.nn.criterion.trevsky.FocalTrevskyLoss(alpha: float, beta: Optional[float] = None, gamma: float = 1.3333333333333333, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The focal Trevsky (Tversky) loss.

TrevskyIndex = TP / (TP + alpha * FN + beta * FP)

FocalTrevskyLoss = (1 - TrevskyIndex)^gamma

Note: the focal term is applied per image, so the loss pays more attention to complicated images.

__init__(alpha: float, beta: Optional[float] = None, gamma: float = 1.3333333333333333, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • alpha – false negative coefficient; the bigger alpha, the bigger the penalty for false negatives. Must be in (0, 1)

  • beta – false positive coefficient; the bigger beta, the bigger the penalty for false positives. Must be in (0, 1). If None, beta = (1 - alpha)

  • gamma – focal coefficient. It determines how much the weight of simple examples is reduced.

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', metrics are calculated separately and then averaged over all classes. If mode='weighted', metrics are calculated separately and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
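
How gamma reduces the weight of simple examples can be checked numerically. The sketch below applies the exponent from the formula above to two hypothetical per-image index values; the function name focal_term is an assumption and not part of Catalyst.

```python
def focal_term(index: float, gamma: float = 4 / 3) -> float:
    """(1 - index) ** gamma for a per-image Tversky index in [0, 1]."""
    return (1.0 - index) ** gamma


easy_index, hard_index = 0.9, 0.2  # well-segmented vs. badly segmented image

# ratio of focal loss to plain (1 - index) loss: how much each image is damped
easy_damping = focal_term(easy_index) / (1.0 - easy_index)
hard_damping = focal_term(hard_index) / (1.0 - hard_index)
```

Both ratios are below 1 when gamma > 1, but the easy image is damped noticeably more than the hard one, which shifts the relative weight towards complicated images.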

TripletMarginLossWithSampler

class catalyst.contrib.nn.criterion.triplet.TripletMarginLossWithSampler(margin: float, sampler_inbatch: IInbatchTripletSampler)[source]

Bases: torch.nn.modules.module.Module

This class combines in-batch sampling of triplets with the default TripletMarginLoss from PyTorch.

__init__(margin: float, sampler_inbatch: IInbatchTripletSampler)[source]
Parameters
  • margin – margin value

  • sampler_inbatch – sampler for forming triplets inside the batch
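
The combination of in-batch sampling and a triplet margin loss can be sketched in plain Python. The all_triplets function below is a hypothetical stand-in for an in-batch sampler (it simply enumerates every valid triplet), and the loss uses the usual form max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance and mean reduction; none of these names come from Catalyst.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def all_triplets(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Hypothetical sampler: all (anchor, positive, negative) index triples."""
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # the positive must share the anchor's label
        for n in range(len(labels)):
            if labels[n] != labels[a]:
                triplets.append((a, p, n))
    return triplets


def triplet_margin_loss(
    features: Sequence[Sequence[float]], labels: Sequence[int], margin: float
) -> float:
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over sampled triplets."""
    triplets = all_triplets(labels)
    if not triplets:
        return 0.0
    losses = [
        max(
            euclidean(features[a], features[p])
            - euclidean(features[a], features[n])
            + margin,
            0.0,
        )
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses)


features = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [0, 0, 1, 1]
small_margin = triplet_margin_loss(features, labels, margin=0.5)
large_margin = triplet_margin_loss(features, labels, margin=3.0)
```

The two classes here are already separated by more than 0.5, so the small margin yields zero loss, while the larger margin still asks for more separation.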

WingLoss

class catalyst.contrib.nn.criterion.wing.WingLoss(width: int = 5, curvature: float = 0.5, reduction: str = 'mean')[source]

Bases: torch.nn.modules.module.Module

Creates a criterion that optimizes a Wing loss.

It has been proposed in Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks.

Adapted from: https://github.com/BloodAxe/pytorch-toolbelt

__init__(width: int = 5, curvature: float = 0.5, reduction: str = 'mean')[source]
Parameters

@TODO – Docs. Contribution is welcome.
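
Until the parameters are documented, the piecewise definition from the Wing loss paper may help to read the signature. The sketch below assumes that width corresponds to the paper's w and curvature to its epsilon, which is an inference from the parameter names rather than something this page states; the function name wing is hypothetical.

```python
import math


def wing(diff: float, width: float = 5.0, curvature: float = 0.5) -> float:
    """Wing loss for one residual diff = prediction - target.

    Logarithmic for small errors (|diff| < width), linear for large ones.
    """
    abs_diff = abs(diff)
    # constant that makes the two pieces meet at |diff| == width
    c = width - width * math.log(1.0 + width / curvature)
    if abs_diff < width:
        return width * math.log(1.0 + abs_diff / curvature)
    return abs_diff - c


small_error = wing(0.1)
large_error = wing(10.0)
```

Compared with an L1 loss, the logarithmic part gives small and medium errors a steeper gradient, which is the motivation given in the paper for landmark localisation.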

Modules

ArcFace and SubCenterArcFace

class catalyst.contrib.nn.modules.arcface.ArcFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of ArcFace: Additive Angular Margin Loss for Deep Face Recognition.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.5.

  • eps – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.arcface import ArcFace
>>>
>>> layer = ArcFace(5, 10, s=1.31, m=0.5)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection on centroids is returned. Default: None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).
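
The additive angular margin from the ArcFace paper can be sketched for a single sample in plain Python: the cosine to the target class centre is replaced by cos(theta + m) and all logits are scaled by s. The function arcface_logits and the centers argument are hypothetical, and returning unscaled cosines when target is None is an assumption about what "projection on centroids" means; Catalyst's tensor implementation may differ in details such as clamping.

```python
import math
from typing import List, Optional, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_logits(
    embedding: Sequence[float],
    centers: Sequence[Sequence[float]],
    target: Optional[int] = None,
    s: float = 64.0,
    m: float = 0.5,
    eps: float = 1e-6,
) -> List[float]:
    """Hypothetical single-sample version of the ArcFace logits."""
    e = _normalize(embedding)
    cosines = [sum(a * b for a, b in zip(e, _normalize(c))) for c in centers]
    if target is None:
        return cosines  # assumption: plain cosine projection on class centres
    logits = []
    for j, cos in enumerate(cosines):
        if j == target:
            # clamp before acos to stay inside its domain, then add the margin
            theta = math.acos(min(max(cos, -1.0 + eps), 1.0 - eps))
            cos = math.cos(theta + m)
        logits.append(s * cos)
    return logits


centers = [[1.0, 0.0], [0.0, 1.0]]
plain = arcface_logits([2.0, 0.0], centers)
with_margin = arcface_logits([2.0, 0.0], centers, target=0, s=1.0, m=0.5)
```

The margin lowers only the target-class logit, so the embedding has to move closer to its class centre to reach the same loss.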

class catalyst.contrib.nn.modules.arcface.SubCenterArcFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5, k: int = 3, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature, Default: 64.0.

  • m – margin. Default: 0.5.

  • k – number of possible class centroids. Default: 3.

  • eps (float, optional) – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.arcface import SubCenterArcFace
>>>
>>> layer = SubCenterArcFace(5, 10, s=1.31, m=0.35, k=2)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection on centroids is returned. Default: None.

Returns

tensor (logits) with shape BxC, where C is the number of classes.

Arc Margin Product

class catalyst.contrib.nn.modules.arcmargin.ArcMarginProduct(in_features: int, out_features: int)[source]

Bases: torch.nn.modules.module.Module

Implementation of Arc Margin Product.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.arcmargin import ArcMarginProduct
>>>
>>> layer = ArcMarginProduct(5, 10)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor) → torch.Tensor[source]
Parameters

input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).

CosFace and AdaCos

class catalyst.contrib.nn.modules.cosface.CosFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.35)[source]

Bases: torch.nn.modules.module.Module

Implementation of CosFace: Large Margin Cosine Loss for Deep Face Recognition.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.35.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.cosface import CosFace
>>>
>>> layer = CosFace(5, 10, s=1.31, m=0.1)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection on centroids is returned. Default: None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).
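
In the CosFace paper the margin is subtracted from the cosine itself rather than added to the angle. A minimal sketch on precomputed cosines, with the hypothetical helper name cosface_logits (not Catalyst's API):

```python
from typing import List, Sequence


def cosface_logits(
    cosines: Sequence[float], target: int, s: float = 64.0, m: float = 0.35
) -> List[float]:
    """Scale cosines by s after subtracting the margin m from the target class."""
    return [
        s * (cos - m) if j == target else s * cos
        for j, cos in enumerate(cosines)
    ]


# cosine similarity of one embedding to two class centres
logits = cosface_logits([0.8, 0.1], target=0)
```

Only the target-class entry is shifted by the margin; the remaining entries are just the scaled cosines.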

class catalyst.contrib.nn.modules.cosface.AdaCos(in_features: int, out_features: int, dynamical_s: bool = True, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • dynamical_s – option to use dynamical scale parameter. If False, the initial scale is used. Default: True.

  • eps – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.cosface import AdaCos
>>>
>>> layer = AdaCos(5, 10)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection on centroids is returned. Default: None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).

CurricularFace

class catalyst.contrib.nn.modules.curricularface.CurricularFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5)[source]

Bases: torch.nn.modules.module.Module

Implementation of CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition.

Official pytorch implementation.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.5.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.curricularface import CurricularFace
>>>
>>> layer = CurricularFace(5, 10, s=1.31, m=0.5)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor, label: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • label – target classes, expected shape B, where B is the batch dimension. If None, the projection on centroids is returned. Default: None.

Returns

tensor (logits) with shape BxC, where C is the number of classes.

sSE

class catalyst.contrib.nn.modules.se.sSE(in_channels: int)[source]

Bases: torch.nn.modules.module.Module

The sSE (Channel Squeeze and Spatial Excitation) block from the Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks paper.

Adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

Shape:

  • Input: (batch, channels, height, width)

  • Output: (batch, channels, height, width) (same shape as input)

__init__(in_channels: int)[source]
Parameters

in_channels – The number of channels in the feature map of the input.

cSE

class catalyst.contrib.nn.modules.se.cSE(in_channels: int, r: int = 16)[source]

Bases: torch.nn.modules.module.Module

The channel-wise SE (Squeeze and Excitation) block from the Squeeze-and-Excitation Networks paper.

Adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65939 and https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

Shape:

  • Input: (batch, channels, height, width)

  • Output: (batch, channels, height, width) (same shape as input)

__init__(in_channels: int, r: int = 16)[source]
Parameters
  • in_channels – The number of channels in the feature map of the input.

  • r – The reduction ratio of the intermediate channels. Default: 16.

scSE

class catalyst.contrib.nn.modules.se.scSE(in_channels: int, r: int = 16)[source]

Bases: torch.nn.modules.module.Module

The scSE (Concurrent Spatial and Channel Squeeze and Channel Excitation) block from the Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks paper.

Adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

Shape:

  • Input: (batch, channels, height, width)

  • Output: (batch, channels, height, width) (same shape as input)

__init__(in_channels: int, r: int = 16)[source]
Parameters
  • in_channels – The number of channels in the feature map of the input.

  • r – The reduction ratio of the intermediate channels. Default: 16.

SoftMax

class catalyst.contrib.nn.modules.softmax.SoftMax(in_features: int, num_classes: int)[source]

Bases: torch.nn.modules.module.Module

Implementation of Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features.

Parameters
  • in_features – size of each input sample.

  • num_classes – number of classes (size of each output sample).

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = num\_classes\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.nn.modules.softmax import SoftMax
>>>
>>> layer = SoftMax(5, 10)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor) → torch.Tensor[source]
Parameters

input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

Returns

tensor (logits) with shape BxC, where C is the number of classes (num_classes).

Optimizers

AdamP

class catalyst.contrib.nn.optimizers.adamp.AdamP(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]

Bases: torch.optim.optimizer.Optimizer

Implements AdamP algorithm.

The original Adam algorithm was proposed in Adam: A Method for Stochastic Optimization. The AdamP variant was proposed in Slowing Down the Weight Norm Increase in Momentum-based Optimizers.

Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay coefficient (default: 0)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)

  • nesterov (boolean, optional) – enables Nesterov momentum (default: False)

Original source code: https://github.com/clovaai/AdamP

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay coefficient (default: 0)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)

  • nesterov (boolean, optional) – enables Nesterov momentum (default: False)

Lamb

class catalyst.contrib.nn.optimizers.lamb.Lamb(params, lr: Optional[float] = 0.001, betas: Optional[Tuple[float, float]] = (0.9, 0.999), eps: Optional[float] = 1e-06, weight_decay: Optional[float] = 0.0, adam: Optional[bool] = False)[source]

Bases: torch.optim.optimizer.Optimizer

Implements Lamb algorithm.

It has been proposed in Training BERT in 76 minutes.

__init__(params, lr: Optional[float] = 0.001, betas: Optional[Tuple[float, float]] = (0.9, 0.999), eps: Optional[float] = 1e-06, weight_decay: Optional[float] = 0.0, adam: Optional[bool] = False)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-6)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • adam (bool, optional) – always use trust ratio = 1, which turns this into Adam. Useful for comparison purposes.
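
The role of the trust ratio, and why fixing it to 1 recovers an Adam-style step, can be sketched in plain Python. This is a simplified illustration under stated assumptions: lamb_like_step is a hypothetical helper, adam_update stands for an already computed Adam direction (moment estimates are not modelled), and implementations of the algorithm may additionally clamp the norms.

```python
import math
from typing import List, Sequence, Tuple


def l2_norm(values: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in values))


def lamb_like_step(
    weights: Sequence[float],
    adam_update: Sequence[float],
    lr: float = 1e-3,
    weight_decay: float = 0.0,
    adam: bool = False,
) -> Tuple[List[float], float]:
    """Apply one layer-wise update scaled by the trust ratio."""
    # weight decay is added to the update direction before scaling
    update = [u + weight_decay * w for u, w in zip(adam_update, weights)]
    w_norm, u_norm = l2_norm(weights), l2_norm(update)
    if adam or w_norm == 0.0 or u_norm == 0.0:
        trust_ratio = 1.0  # plain Adam-style step
    else:
        trust_ratio = w_norm / u_norm
    new_weights = [w - lr * trust_ratio * u for w, u in zip(weights, update)]
    return new_weights, trust_ratio


weights, direction = [3.0, 4.0], [0.6, 0.8]
lamb_weights, lamb_ratio = lamb_like_step(weights, direction, lr=0.1)
adam_weights, adam_ratio = lamb_like_step(weights, direction, lr=0.1, adam=True)
```

Because the step is scaled by the ratio of the weight norm to the update norm, layers with large weights take proportionally larger steps than under the adam=True setting.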

Lookahead

class catalyst.contrib.nn.optimizers.lookahead.Lookahead(optimizer: torch.optim.optimizer.Optimizer, k: int = 5, alpha: float = 0.5)[source]

Bases: torch.optim.optimizer.Optimizer

Implements Lookahead algorithm.

It has been proposed in Lookahead Optimizer: k steps forward, 1 step back.

Adapted from: https://github.com/alphadl/lookahead.pytorch (MIT License)

__init__(optimizer: torch.optim.optimizer.Optimizer, k: int = 5, alpha: float = 0.5)[source]

@TODO: Docs. Contribution is welcome.
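
As a rough illustration of the idea from the Lookahead paper, the sketch below keeps "fast" weights updated by an inner optimizer and, every k steps, moves the "slow" weights a fraction alpha toward them. It is a scalar toy example, not this class's code: the function lookahead_minimize, the plain gradient-descent inner step, and inner_lr are assumptions for illustration.

```python
def lookahead_minimize(grad_fn, x0, inner_lr=0.1, k=5, alpha=0.5, steps=50):
    """Minimize a scalar function with a Lookahead-style loop (hypothetical helper)."""
    slow = x0
    fast = x0
    for step in range(1, steps + 1):
        # Inner optimizer step on the fast weights (plain gradient descent here).
        fast = fast - inner_lr * grad_fn(fast)
        if step % k == 0:
            # Every k steps: interpolate slow weights toward fast weights,
            # then restart the fast weights from the new slow weights.
            slow = slow + alpha * (fast - slow)
            fast = slow
    return slow, fast


# Toy objective f(x) = x**2 with gradient 2 * x.
slow, fast = lookahead_minimize(lambda x: 2.0 * x, x0=1.0)
```

In this sketch, alpha = 1.0 reduces to the inner optimizer alone, while smaller values make the slow weights move more conservatively. In the class above, the optimizer argument presumably plays the role of the inner optimizer.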

QHAdamW

class catalyst.contrib.nn.optimizers.qhadamw.QHAdamW(params, lr=0.001, betas=(0.995, 0.999), nus=(0.7, 1.0), weight_decay=0.0, eps=1e-08)[source]

Bases: torch.optim.optimizer.Optimizer

Implements QHAdamW algorithm.

Combines the QHAdam algorithm, proposed in Quasi-hyperbolic momentum and Adam for deep learning, with the decoupled weight decay from the Decoupled Weight Decay Regularization paper.

Example

>>> optimizer = QHAdamW(
...     model.parameters(),
...     lr=3e-4, nus=(0.8, 1.0), betas=(0.99, 0.999))
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

Adapted from: https://github.com/iprally/qhadamw-pytorch/blob/master/qhadamw.py (MIT License)

__init__(params, lr=0.001, betas=(0.995, 0.999), nus=(0.7, 1.0), weight_decay=0.0, eps=1e-08)[source]
Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (\(\alpha\) from the paper) (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square (default: (0.995, 0.999))

  • nus (Tuple[float, float], optional) – immediate discount factors used to estimate the gradient and its square (default: (0.7, 1.0))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay (L2 regularization coefficient, times two) (default: 0.0)
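
To show how betas and nus interact, here is a scalar sketch of a quasi-hyperbolic Adam step with decoupled weight decay. It follows the update rule from the QHAdam paper as I understand it; the function qhadamw_step and the exact form of the decay step are assumptions, not the library's code.

```python
import math


def qhadamw_step(theta, g, m, v, t, lr=1e-3, betas=(0.995, 0.999),
                 nus=(0.7, 1.0), weight_decay=0.0, eps=1e-8):
    """One scalar QHAdam-style step with decoupled decay (hypothetical helper)."""
    beta1, beta2 = betas
    nu1, nu2 = nus
    # Running averages of the gradient and its square, with bias correction.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # nus blend the current gradient with the running averages.
    numer = (1.0 - nu1) * g + nu1 * m_hat
    denom = math.sqrt((1.0 - nu2) * g * g + nu2 * v_hat) + eps
    # Decoupled weight decay: shrink the weight directly (assumed form).
    theta = theta - lr * weight_decay * theta
    theta = theta - lr * numer / denom
    return theta, m, v
```

With nus=(1.0, 1.0) the blend uses only the running averages, so the step has the same form as Adam; smaller nu values put more weight on the current gradient.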

RAdam

class catalyst.contrib.nn.optimizers.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)[source]

Bases: torch.optim.optimizer.Optimizer

Implements RAdam algorithm.

It has been proposed in On the Variance of the Adaptive Learning Rate and Beyond.

@TODO: Docs (add Example). Contribution is welcome

Adapted from: https://github.com/LiyuanLucasLiu/RAdam (Apache-2.0 License)

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)[source]

@TODO: Docs. Contribution is welcome.
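
The variance rectification described in the RAdam paper can be sketched as a function of the step number. This is an illustrative sketch, not the class's code: the name radam_rectification and the threshold argument are assumptions (the paper uses a threshold of 4; implementations may use a different cut-off).

```python
import math


def radam_rectification(t, beta2=0.999, threshold=4.0):
    """Return the rectification factor r_t, or None while variance is intractable."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= threshold:
        # Early steps: the adaptive term is skipped and a
        # momentum-only update would be used instead.
        return None
    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
```

The factor is small during the first steps and approaches 1 as training proceeds, which acts like a built-in warm-up of the adaptive learning rate.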

Ralamb

class catalyst.contrib.nn.optimizers.ralamb.Ralamb(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)[source]

Bases: torch.optim.optimizer.Optimizer

RAdam optimizer with LARS/LAMB tricks.

Adapted from: https://github.com/mgrankin/over9000/blob/master/ralamb.py (Apache-2.0 License)

__init__(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

SGDP

class catalyst.contrib.nn.optimizers.sgdp.SGDP(params, lr=<required parameter>, momentum=0, weight_decay=0, dampening=0, nesterov=False, eps=1e-08, delta=0.1, wd_ratio=0.1)[source]

Bases: torch.optim.optimizer.Optimizer

Implements SGDP algorithm.

The SGDP variant was proposed in Slowing Down the Weight Norm Increase in Momentum-based Optimizers.

__init__(params, lr=<required parameter>, momentum=0, weight_decay=0, dampening=0, nesterov=False, eps=1e-08, delta=0.1, wd_ratio=0.1)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr – learning rate

  • momentum (float, optional) – momentum factor (default: 0)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • dampening (float, optional) – dampening for momentum (default: 0)

  • nesterov (bool, optional) – enables Nesterov momentum (default: False)

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)
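
The meaning of delta and wd_ratio can be illustrated with a small sketch of the projection step from the paper. It is an approximation written on plain Python lists, not the library's code: the function name, its return convention (projected update plus a weight decay multiplier), and the exact cosine test are assumptions.

```python
import math


def project_if_scale_invariant(w, grad, perturb, delta=0.1, wd_ratio=0.1, eps=1e-8):
    """Project the update when w looks scale invariant (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(w, grad))
    w_norm = math.sqrt(sum(a * a for a in w))
    g_norm = math.sqrt(sum(b * b for b in grad))
    cosine = abs(dot) / (w_norm * g_norm + eps)
    # A gradient nearly orthogonal to the weights suggests scale invariance.
    if cosine < delta / math.sqrt(len(w)):
        w_unit = [a / (w_norm + eps) for a in w]
        radial = sum(u * p for u, p in zip(w_unit, perturb))
        # Remove the radial component so the update does not grow the norm.
        projected = [p - radial * u for p, u in zip(perturb, w_unit)]
        return projected, wd_ratio
    return perturb, 1.0
```

In this sketch the second return value is the factor by which weight decay would be scaled: wd_ratio for parameters detected as scale invariant, 1.0 otherwise.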

Schedulers

OneCycleLRWithWarmup

class catalyst.contrib.nn.schedulers.onecycle.OneCycleLRWithWarmup(optimizer: torch.optim.optimizer.Optimizer, num_steps: int, lr_range=(1.0, 0.005), init_lr: float = None, warmup_steps: int = 0, warmup_fraction: float = None, decay_steps: int = 0, decay_fraction: float = None, momentum_range=(0.8, 0.99, 0.999), init_momentum: float = None)[source]

Bases: catalyst.contrib.nn.schedulers.base.BatchScheduler

OneCycle scheduler with warm-up & lr decay stages.

The first stage, warm-up, increases lr from init_lr to max_lr and decreases momentum from init_momentum to min_momentum. It takes warmup_steps steps.

The second stage, annealing, decreases lr from max_lr to min_lr and increases momentum from min_momentum to max_momentum.

The third, optional stage is lr decay.

__init__(optimizer: torch.optim.optimizer.Optimizer, num_steps: int, lr_range=(1.0, 0.005), init_lr: float = None, warmup_steps: int = 0, warmup_fraction: float = None, decay_steps: int = 0, decay_fraction: float = None, momentum_range=(0.8, 0.99, 0.999), init_momentum: float = None)[source]
Parameters
  • optimizer – PyTorch optimizer

  • num_steps – total number of steps

  • lr_range – tuple with two or three elements (max_lr, min_lr, [final_lr])

  • init_lr (float, optional) – initial lr

  • warmup_steps – count of steps for warm-up stage

  • warmup_fraction (float, optional) – fraction in [0; 1) to calculate number of warmup steps. Cannot be set together with warmup_steps

  • decay_steps – count of steps for lr decay stage

  • decay_fraction (float, optional) – fraction in [0; 1) to calculate number of decay steps. Cannot be set together with decay_steps

  • momentum_range – tuple with two or three elements (min_momentum, max_momentum, [final_momentum])

  • init_momentum (float, optional) – initial momentum
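
The three stages can be sketched as a list of per-step learning rates. This is a simplified illustration, assuming linear interpolation within each stage and an explicitly given init_lr; the helper names linspace and one_cycle_lrs are hypothetical and are not part of the scheduler's API.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, both included."""
    if num <= 0:
        return []
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def one_cycle_lrs(num_steps, lr_range=(1.0, 0.005), init_lr=0.1,
                  warmup_steps=0, decay_steps=0):
    """Per-step learning rates for warm-up, annealing and decay (sketch)."""
    max_lr, min_lr = lr_range[0], lr_range[1]
    # The optional third element is the lr reached at the end of the decay stage.
    final_lr = lr_range[2] if len(lr_range) == 3 else min_lr
    anneal_steps = num_steps - warmup_steps - decay_steps
    return (
        linspace(init_lr, max_lr, warmup_steps)    # stage 1: warm-up
        + linspace(max_lr, min_lr, anneal_steps)   # stage 2: annealing
        + linspace(min_lr, final_lr, decay_steps)  # stage 3: decay
    )


lrs = one_cycle_lrs(10, lr_range=(1.0, 0.1, 0.01), init_lr=0.0,
                    warmup_steps=3, decay_steps=2)
```

Momentum would follow the mirrored shape: down to min_momentum during warm-up, up to max_momentum during annealing.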

Scripts

You can use contrib scripts with catalyst-contrib in your terminal. For example:

$ catalyst-contrib tag2label --help

Catalyst-contrib scripts.

Examples

1. collect-env outputs relevant system environment info. It diagnoses your system and shows basic information, which helps when writing detailed bug reports.

$ catalyst-contrib collect-env

2. process-images reads raw data and outputs preprocessed resized images

$ catalyst-contrib process-images \
    --in-dir /path/to/raw/data/ \
    --out-dir=./data/dataset \
    --num-workers=6 \
    --max-size=224 \
    --extension=png \
    --clear-exif \
    --grayscale \
    --expand-dims

3. tag2label converts a dataset to JSON like {"class_id": class_column_from_dataset}

$ catalyst-contrib tag2label \
    --in-dir=./data/dataset \
    --out-dataset=./data/dataset_raw.csv \
    --out-labeling=./data/tag2cls.json

4. split-dataframe splits your dataset into train/valid folds

$ catalyst-contrib split-dataframe \
    --in-csv=./data/dataset_raw.csv \
    --tag2class=./data/tag2cls.json \
    --tag-column=tag \
    --class-column=class \
    --n-folds=5 \
    --train-folds=0,1,2,3 \
    --out-csv=./data/dataset.csv
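
The idea behind --n-folds and --train-folds can be sketched in plain Python. This only illustrates fold-based splitting in general; how the script itself assigns folds (for example, whether it stratifies by class) is not shown here, and the helper names assign_folds and split_by_folds are hypothetical.

```python
import random


def assign_folds(rows, n_folds=5, seed=42):
    """Assign each row a fold index in [0, n_folds) after shuffling."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    folds = [0] * len(rows)
    for position, row_index in enumerate(order):
        folds[row_index] = position % n_folds
    return folds


def split_by_folds(rows, folds, train_folds):
    """Rows whose fold is in train_folds go to train, the rest to valid."""
    train = [r for r, f in zip(rows, folds) if f in train_folds]
    valid = [r for r, f in zip(rows, folds) if f not in train_folds]
    return train, valid


rows = list(range(20))
folds = assign_folds(rows, n_folds=5)
train, valid = split_by_folds(rows, folds, train_folds={0, 1, 2, 3})
```

With 5 folds and train folds 0,1,2,3, roughly four fifths of the rows end up in the training part and the remaining fold is used for validation.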