
Contrib

Note: this section is under development; the best contrib modules will be placed here with docs and examples.
If you would like to see your contribution here, please open a pull request or write to us on Slack.

Catalyst contrib modules are supported in a code-as-documentation format. If you are interested in the details, please follow the code of the implementation. If you are interested in contributing to the library, feel free to open a pull request.

Data

InBatchSamplers

InBatchTripletsSampler

class catalyst.contrib.data.sampler_inbatch.InBatchTripletsSampler[source]

Bases: catalyst.contrib.data.sampler_inbatch.IInbatchTripletSampler

Base class for triplet samplers. Child instances of this class are expected to form triplets inside batches. (Note: the set of output features is assumed to be a subset of the features of the samples in the batch.) Each batch must contain at least 2 samples for each class and at least 2 different classes; this can be guaranteed by using catalyst.data.sampler.BatchBalanceClassSampler.

However, you are free to use it in any other way.

sample(features: torch.Tensor, labels: Union[List[int], torch.Tensor]) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]
Parameters
  • features – has the shape of [batch_size, feature_size]

  • labels – labels of the samples in the batch

Returns

the batch of triplets in the following order: (anchor, positive, negative)

Return type

Tuple[torch.Tensor, torch.Tensor, torch.Tensor]

AllTripletsSampler

class catalyst.contrib.data.sampler_inbatch.AllTripletsSampler(max_output_triplets: int = 9223372036854775807)[source]

Bases: catalyst.contrib.data.sampler_inbatch.InBatchTripletsSampler

This sampler selects all the possible triplets for the given labels

__init__(max_output_triplets: int = 9223372036854775807)[source]
Parameters

max_output_triplets – choosing all possible triplets can produce a very large number of them per batch, so only a random subset is sampled; this parameter caps the size of that subset.
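The enumeration described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the helper name all_triplets and its seed argument are hypothetical, indices stand in for feature rows, and each anchor/positive pair is counted once.

```python
import random
from itertools import combinations


def all_triplets(labels, max_output_triplets=None, seed=0):
    """Enumerate (anchor, positive, negative) index triplets for a batch.

    Hypothetical helper for illustration; not part of catalyst.
    """
    n = len(labels)
    triplets = []
    # Anchor and positive share a label; each unordered pair is counted once.
    for anchor, positive in combinations(range(n), 2):
        if labels[anchor] != labels[positive]:
            continue
        # Every sample with a different label is a valid negative.
        for negative in range(n):
            if labels[negative] != labels[anchor]:
                triplets.append((anchor, positive, negative))
    # Keep only a random subset when the cap is exceeded.
    if max_output_triplets is not None and len(triplets) > max_output_triplets:
        triplets = random.Random(seed).sample(triplets, max_output_triplets)
    return triplets


batch_labels = [0, 0, 1, 1]
print(len(all_triplets(batch_labels)))
print(len(all_triplets(batch_labels, max_output_triplets=2)))
```

The number of triplets grows quickly with the batch size, which is why a cap such as max_output_triplets is useful.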

HardTripletsSampler

class catalyst.contrib.data.sampler_inbatch.HardTripletsSampler(norm_required: bool = False)[source]

Bases: catalyst.contrib.data.sampler_inbatch.InBatchTripletsSampler

This sampler selects the hardest triplets based on distances between features: the hardest positive sample has the maximal distance to the anchor sample, and the hardest negative sample has the minimal distance to the anchor sample.

Note that a typical triplet loss chart looks as follows:
1. Falling: the loss decreases to a value equal to the margin.
2. Long plateau: the loss oscillates near the margin.
3. Falling: the loss decreases to zero.

__init__(norm_required: bool = False)[source]
Parameters

norm_required – set True if features normalisation is needed
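A minimal sketch of the hardest-triplet rule, assuming Euclidean distance and plain Python lists instead of tensors. The function names are hypothetical, feature normalisation (norm_required) is not modelled, and the result holds indices rather than feature tensors.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_triplets(features, labels):
    """For every anchor pick the farthest positive and the closest negative.

    Hypothetical helper for illustration; returns index triplets.
    """
    triplets = []
    for anchor, (f_anchor, l_anchor) in enumerate(zip(features, labels)):
        positives = [i for i, l in enumerate(labels) if l == l_anchor and i != anchor]
        negatives = [i for i, l in enumerate(labels) if l != l_anchor]
        if not positives or not negatives:
            continue  # a triplet cannot be formed for this anchor
        hardest_pos = max(positives, key=lambda i: euclidean(f_anchor, features[i]))
        hardest_neg = min(negatives, key=lambda i: euclidean(f_anchor, features[i]))
        triplets.append((anchor, hardest_pos, hardest_neg))
    return triplets


features = [[0.0], [1.0], [3.0], [4.0], [10.0]]
labels = [0, 0, 0, 1, 1]
print(hardest_triplets(features, labels))
```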

HardClusterSampler

class catalyst.contrib.data.sampler_inbatch.HardClusterSampler[source]

Bases: catalyst.contrib.data.sampler_inbatch.IInbatchTripletSampler

This sampler selects the hardest triplets based on distances to mean vectors: the anchor is the mean vector of the features of the i-th class in the batch, the hardest positive sample is the sample of the anchor's class that is most distant from the anchor, and the hardest negative sample is the closest mean vector of another class.

The batch must contain k samples for p classes in it (k > 1, p > 1).

sample(features: torch.Tensor, labels: Union[List[int], torch.Tensor]) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

This method samples the hardest triplets in the batch.

Parameters
  • features – tensor of shape (batch_size, embed_dim) that contains k samples for each of p classes

  • labels – labels of the batch, list or tensor of size (batch_size)

Returns

p triplets of (mean_vector, positive, negative_mean_vector)
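The selection of p triplets can be sketched as follows. This is an illustrative sketch only, assuming Euclidean distance and nested Python lists; the helper names are hypothetical and the library itself works on tensors.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def hard_cluster_triplets(features, labels):
    """Return one (mean_vector, positive, negative_mean_vector) per class."""
    classes = sorted(set(labels))
    members = {c: [f for f, l in zip(features, labels) if l == c] for c in classes}
    means = {c: mean_vector(members[c]) for c in classes}
    triplets = []
    for c in classes:
        anchor = means[c]
        # Hardest positive: the sample of this class farthest from its mean.
        positive = max(members[c], key=lambda f: euclidean(anchor, f))
        # Hardest negative: the closest mean vector of any other class.
        negative = min(
            (means[other] for other in classes if other != c),
            key=lambda m: euclidean(anchor, m),
        )
        triplets.append((anchor, positive, negative))
    return triplets


features = [[0.0], [1.0], [5.0], [10.0], [11.0], [15.0]]
labels = [0, 0, 0, 1, 1, 1]
print(hard_cluster_triplets(features, labels))
```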

Samplers

BalanceBatchSampler

class catalyst.contrib.data.sampler.BalanceBatchSampler(labels: Union[List[int], numpy.ndarray], p: int, k: int)[source]

Bases: torch.utils.data.sampler.Sampler

This kind of sampler can be used for both metric learning and classification tasks.

Warning

Deprecated implementation, kept for backward compatibility. Please use BatchBalanceClassSampler instead.

Sampler with the following strategy for a dataset with C unique classes:
- Select P of the C classes for the 1st batch
- Select K instances for each class for the 1st batch
- Select P of the C - P remaining classes for the 2nd batch
- Select K instances for each class for the 2nd batch
- …

The epoch ends when there are no classes left. So, the batch size is P * K, except for the last one.

Thus, in each epoch, all the classes will be selected once, but this does not mean that all the instances will be selected during the epoch.

One of the purposes of this sampler is to form triplets and pos/neg pairs inside the batch. To guarantee the existence of these pairs in the batch, P and K should be > 1. (1)

Behavior in corner cases:
- If a class does not contain K instances, the choice is made with repetition.
- If C % P == 1, then one of the classes should be dropped; otherwise statement (1) will not be met.

This type of sampling can be found in the classical person re-identification (Re-ID) paper, where P equals 32 and K equals 4: In Defense of the Triplet Loss for Person Re-Identification.
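The P x K strategy and its corner cases can be sketched in plain Python. This is a rough sketch under stated assumptions, not the sampler's actual code: the function name pk_batches and its seed argument are hypothetical, and the dropped class is simply the last one after shuffling.

```python
import random


def pk_batches(labels, p, k, seed=0):
    """Split one epoch into batches of P classes with K instances each.

    Hypothetical helper for illustration; returns lists of dataset indices.
    """
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)

    classes = list(indices_by_class)
    rng.shuffle(classes)
    # Corner case: with C % P == 1 the last batch would hold a single class,
    # so one class is dropped (assumption: the last one after shuffling).
    if len(classes) % p == 1:
        classes.pop()

    batches = []
    for start in range(0, len(classes), p):
        batch = []
        for cls in classes[start:start + p]:
            pool = indices_by_class[cls]
            if len(pool) >= k:
                batch.extend(rng.sample(pool, k))  # K distinct instances
            else:
                batch.extend(rng.choices(pool, k=k))  # choice with repetition
        batches.append(batch)
    return batches


labels = [0] * 5 + [1] * 5 + [2] * 5 + [3] + [4] * 5
for batch in pk_batches(labels, p=2, k=2):
    print([labels[i] for i in batch])
```

Every class appears in at most one batch per epoch, so large classes contribute only K of their instances to that epoch.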

Parameters
  • labels – list of class labels for each element in the dataset

  • p – number of classes in a batch, should be > 1

  • k – number of instances of each class in a batch, should be > 1

__init__(labels: Union[List[int], numpy.ndarray], p: int, k: int)[source]

Sampler initialisation.

property batch_size

Returns: this value should be used in DataLoader as batch size

property batches_in_epoch

Returns: number of batches in an epoch

DynamicBalanceClassSampler

class catalyst.contrib.data.sampler.DynamicBalanceClassSampler(labels: List[Union[str, int]], exp_lambda: float = 0.9, start_epoch: int = 0, max_d: Optional[int] = None, mode: Union[str, int] = 'downsampling', ignore_warning: bool = False)[source]

Bases: torch.utils.data.sampler.Sampler

This kind of sampler can be used for classification tasks with significant class imbalance.

The idea of this sampler is to start with the original class distribution and gradually move to a uniform class distribution, as with downsampling.

Let's define \(D_i = \#C_i / \#C_{min}\), where \(\#C_i\) is the size of class i and \(\#C_{min}\) is the size of the rarest class, so \(D_i\) defines the class distribution. Also define \(g(n_{epoch})\) as an exponential scheduler. On each epoch, the current \(D_i\) is calculated as \(D_i^{current} = D_i^{g(n_{epoch})}\), and the data is then sampled according to this distribution.
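To make the schedule concrete, here is a small sketch of how the per-class sample counts could evolve. It assumes that the exponential scheduler is g(n_epoch) = exp_lambda ** n_epoch and that each class contributes #C_min * D_i ^ g(n_epoch) samples; both are assumptions made for illustration, and the function name is hypothetical.

```python
from collections import Counter


def samples_per_class(labels, epoch, exp_lambda=0.9):
    """Number of samples drawn from each class at the given epoch.

    Hypothetical helper: assumes g(epoch) = exp_lambda ** epoch.
    """
    counts = Counter(labels)
    c_min = min(counts.values())  # size of the rarest class
    g = exp_lambda ** epoch  # decays from 1 towards 0
    # D_i = count / c_min; the current D_i is D_i ** g.
    return {cls: int(c_min * (count / c_min) ** g) for cls, count in counts.items()}


labels = ["a"] * 100 + ["b"] * 10
for epoch in (0, 5, 200):
    print(epoch, samples_per_class(labels, epoch))
```

At epoch 0 the original distribution is kept, and as g approaches 0 every class shrinks towards the size of the rarest class.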

Notes

At the end of training, epochs will contain only min_size_class * n_classes examples, so it may not be necessary to run validation on every epoch. Use ControlFlowCallback for this purpose.

Examples

>>> import torch
>>> import numpy as np
>>> from catalyst.data import DynamicBalanceClassSampler
>>> from torch.utils import data
>>> features = torch.Tensor(np.random.random((200, 100)))
>>> labels = np.random.randint(0, 4, size=(200,))
>>> sampler = DynamicBalanceClassSampler(labels)
>>> labels = torch.LongTensor(labels)
>>> dataset = data.TensorDataset(features, labels)
>>> loader = data.DataLoader(dataset, sampler=sampler, batch_size=8)
>>> for batch in loader:
...     b_features, b_labels = batch

Sampler was inspired by https://arxiv.org/abs/1901.06783

__init__(labels: List[Union[str, int]], exp_lambda: float = 0.9, start_epoch: int = 0, max_d: Optional[int] = None, mode: Union[str, int] = 'downsampling', ignore_warning: bool = False)[source]
Parameters
  • labels – list of labels for each element in the dataset

  • exp_lambda – exponent figure for the schedule

  • start_epoch – start epoch number, can be useful for multistage experiments

  • max_d – if not None, limit on the difference between the most frequent and the rarest classes, heuristic

  • mode – number of samples per class at the end of training. Must be "downsampling" or a number. Before changing it, make sure that you understand how it works

  • ignore_warning – ignore warning about min class size

Transforms

Compose

class catalyst.contrib.data.transforms.Compose(transforms)[source]

Bases: object

Composes several transforms together.

__init__(transforms)[source]
Parameters

transforms – list of transforms to compose.

Example

>>> Compose([ImageToTensor(), NormalizeImage((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

ImageToTensor

class catalyst.contrib.data.transforms.ImageToTensor[source]

Bases: object

Convert a numpy.ndarray to a tensor. Converts a numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the numpy.ndarray has dtype = np.uint8. In all other cases, tensors are returned without scaling.
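The layout change and scaling can be illustrated without any array library. The sketch below uses nested Python lists in place of numpy.ndarray and torch.FloatTensor; the function name and the is_uint8 flag are hypothetical stand-ins for the dtype check.

```python
def image_to_chw(image, is_uint8=True):
    """Reorder an H x W x C nested list to C x H x W and optionally rescale.

    Hypothetical helper: is_uint8 stands in for the dtype == np.uint8 check.
    """
    height, width, channels = len(image), len(image[0]), len(image[0][0])
    scale = 255.0 if is_uint8 else 1.0  # [0, 255] -> [0.0, 1.0] only for uint8
    return [
        [[image[h][w][c] / scale for w in range(width)] for h in range(height)]
        for c in range(channels)
    ]


# One row, two pixels, three channels (H=1, W=2, C=3).
image = [[[0, 255, 51], [255, 0, 102]]]
chw = image_to_chw(image)
print(len(chw), len(chw[0]), len(chw[0][0]))
```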

__init__()

Initialize self. See help(type(self)) for accurate signature.

NormalizeImage

class catalyst.contrib.data.transforms.NormalizeImage(mean, std, inplace=False)[source]

Bases: object

Normalize a tensor image with mean and standard deviation.

Given mean: (mean[1],...,mean[n]) and std: (std[1],...,std[n]) for n channels, this transform will normalize each channel of the input torch.*Tensor, i.e., output[channel] = (input[channel] - mean[channel]) / std[channel]

Note

This transform acts out of place by default, i.e., it does not mutate the input tensor unless inplace is set to True.

__init__(mean, std, inplace=False)[source]
Parameters
  • mean – Sequence of means for each channel.

  • std – Sequence of standard deviations for each channel.

  • inplace (bool, optional) – Bool to make this operation in-place.
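The per-channel formula output[channel] = (input[channel] - mean[channel]) / std[channel] can be checked with a tiny sketch. It uses a C x H x W nested list instead of a tensor and always returns a new object; the function name is hypothetical.

```python
def normalize_image(image, mean, std):
    """Apply (value - mean[c]) / std[c] to every value of channel c.

    Hypothetical helper: image is a C x H x W nested list; the input is
    left untouched and a new nested list is returned.
    """
    return [
        [[(value - m) / s for value in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


image = [[[0.5, 1.0]], [[0.0, 0.25]]]  # two channels, one row, two pixels
print(normalize_image(image, mean=(0.5, 0.25), std=(0.5, 0.25)))
```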

Datasets

CIFAR10

class catalyst.contrib.datasets.cifar.CIFAR10(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)[source]

Bases: catalyst.contrib.datasets.cifar.VisionDataset

CIFAR10 Dataset.

Parameters
  • root (string) – Root directory of dataset where directory cifar-10-batches-py exists or will be saved to if download is set to True.

  • train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.

  • transform (callable, optional) – A function/transform that takes in a PIL image and returns a transformed version. E.g., transforms.RandomCrop

  • target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

__init__(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False) → None[source]

CIFAR100

class catalyst.contrib.datasets.cifar.CIFAR100(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False)[source]

Bases: catalyst.contrib.datasets.cifar.CIFAR10

CIFAR100 Dataset.

This is a subclass of the CIFAR10 Dataset.

__init__(root: str, train: bool = True, transform: Optional[Callable] = None, target_transform: Optional[Callable] = None, download: bool = False) → None

Imagenette

class catalyst.contrib.datasets.imagenette.Imagenette(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagenette Dataset.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagenette160

class catalyst.contrib.datasets.imagenette.Imagenette160(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagenette Dataset with images resized so that the shortest size is 160 px.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagenette320

class catalyst.contrib.datasets.imagenette.Imagenette320(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagenette Dataset with images resized so that the shortest size is 320 px.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagewang

class catalyst.contrib.datasets.imagewang.Imagewang(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagewang Dataset.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagewang160

class catalyst.contrib.datasets.imagewang.Imagewang160(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagewang Dataset with images resized so that the shortest size is 160 px.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagewang320

class catalyst.contrib.datasets.imagewang.Imagewang320(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagewang Dataset with images resized so that the shortest size is 320 px.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagewoof

catalyst.contrib.datasets.imagewoof

alias of catalyst.contrib.datasets.imagewoof

Imagewoof160

class catalyst.contrib.datasets.Imagewoof160(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagewoof Dataset with images resized so that the shortest size is 160 px.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

Imagewoof320

class catalyst.contrib.datasets.Imagewoof320(root: str, train: bool = True, download: bool = False, **kwargs)[source]

Bases: catalyst.contrib.datasets.misc_cv.ImageClassificationDataset

Imagewoof Dataset with images resized so that the shortest size is 320 px.

Note

catalyst[cv] required for this dataset.

__init__(root: str, train: bool = True, download: bool = False, **kwargs)

Constructor method for the ImageClassificationDataset class.

Parameters
  • root – root directory of dataset

  • train – if True, creates dataset from train/ subfolder, otherwise from val/

  • download – if True, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again

  • **kwargs – Keyword-arguments passed to super().__init__ method.

MNIST

class catalyst.contrib.datasets.mnist.MNIST(root: str, train: bool = True, download: bool = True, normalize: tuple = (0.1307, 0.3081), numpy: bool = False)[source]

Bases: torch.utils.data.dataset.Dataset

MNIST Dataset for testing purposes.

Parameters
  • root – Root directory of dataset where MNIST/processed/training.pt and MNIST/processed/test.pt exist.

  • train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • normalize (tuple, optional) – mean and std for the MNIST dataset normalization.

  • numpy (bool, optional) – boolean flag to return an np.ndarray, rather than torch.tensor (default: False).

Raises

RuntimeError – If download is False and the dataset is not found.

__init__(root: str, train: bool = True, download: bool = True, normalize: tuple = (0.1307, 0.3081), numpy: bool = False)[source]

Init.

MovieLens

class catalyst.contrib.datasets.movielens.MovieLens(root, train=True, download=False, min_rating=0.0)[source]

Bases: torch.utils.data.dataset.Dataset

MovieLens data sets were collected by the GroupLens Research Project at the University of Minnesota.

This data set consists of:
* 100,000 ratings (1-5) from 943 users on 1682 movies.
* Each user has rated at least 20 movies.
* Simple demographic info for the users (age, gender, occupation, zip)

The data was collected through the MovieLens web site (movielens.umn.edu) during the seven-month period from September 19th, 1997 through April 22nd, 1998. This data has been cleaned up - users who had less than 20 ratings or did not have complete demographic information were removed from this data set. Detailed descriptions of the data file can be found at the end of this file.

Neither the University of Minnesota nor any of the researchers involved can guarantee the correctness of the data, its suitability for any particular purpose, or the validity of results based on the use of the data set. The data set may be used for any research purposes under the following conditions:
* The user may not state or imply any endorsement from the University of Minnesota or the GroupLens Research Group.
* The user must acknowledge the use of the data set in publications resulting from the use of the data set (see below for citation information).
* The user may not redistribute the data without separate permission.
* The user may not use this information for any commercial or revenue-bearing purposes without first obtaining permission from a faculty member of the GroupLens Research Project at the University of Minnesota.

If you have any further questions or comments, please contact GroupLens <grouplens-info@cs.umn.edu>. http://files.grouplens.org/datasets/movielens/ml-100k-README.txt

Note

catalyst[ml] required for this dataset.

__init__(root, train=True, download=False, min_rating=0.0)[source]
Parameters
  • root (string) – Root directory of dataset where MovieLens/processed/training.pt and MovieLens/processed/test.pt exist.

  • train (bool, optional) – If True, creates dataset from training.pt, otherwise from test.pt.

  • download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.

  • min_rating (float, optional) – Minimum rating to include in the interaction matrix

Raises

RuntimeError – If download is False and the dataset is not found.

Layers

AdaCos

class catalyst.contrib.layers.cosface.AdaCos(in_features: int, out_features: int, dynamical_s: bool = True, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • dynamical_s – option to use a dynamical scale parameter. If False, the initial scale is used. Default: True.

  • eps – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> layer = AdaCos(5, 10)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection onto centroids is returned. Default is None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).

AMSoftmax

class catalyst.contrib.layers.amsoftmax.AMSoftmax(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of AMSoftmax: Additive Margin Softmax for Face Verification.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.5.

  • eps – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> layer = AMSoftmax(5, 10, s=1.31, m=0.5)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection onto centroids is returned. Default is None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).
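As a rough illustration of what the s and m parameters do, the sketch below applies the additive margin rule from the AM-Softmax paper to a single sample: the cosine of the target class is reduced by m and all cosines are scaled by s. It is written in plain Python with hypothetical function names and is not the layer's actual code; returning unscaled cosines when target is None is an assumption based on the parameter description above.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def am_softmax_logits(feature, class_weights, target=None, s=64.0, m=0.5):
    """Additive margin logits for one sample (hypothetical helper)."""
    unit_feature = l2_normalize(feature)
    # Cosine similarity between the sample and every class centroid.
    cosines = [
        sum(a * b for a, b in zip(unit_feature, l2_normalize(weights)))
        for weights in class_weights
    ]
    if target is None:
        return cosines  # assumption: plain projection onto the centroids
    # The margin is subtracted from the target class only.
    return [
        s * (cos - m) if index == target else s * cos
        for index, cos in enumerate(cosines)
    ]


weights = [[2.0, 0.0], [0.0, 3.0]]
print(am_softmax_logits([1.0, 0.0], weights, target=0, s=2.0, m=0.5))
```

The resulting logits are meant to be passed to a standard cross-entropy loss, as in the example above.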

ArcFace

class catalyst.contrib.layers.arcface.ArcFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of ArcFace: Additive Angular Margin Loss for Deep Face Recognition.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.5.

  • eps – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> layer = ArcFace(5, 10, s=1.31, m=0.5)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection onto centroids is returned. Default is None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).
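For comparison, the additive angular margin from the ArcFace paper moves the margin inside the cosine: the target logit becomes s * cos(theta + m), where theta is the angle between the sample and the target centroid. The sketch below starts from precomputed cosines; the function name is hypothetical, and using eps to clamp the cosine before acos is an assumption about how the eps parameter is applied.

```python
import math


def arcface_logits(cosines, target, s=64.0, m=0.5, eps=1e-6):
    """Angular margin logits for one sample (hypothetical helper)."""
    logits = []
    for index, cos in enumerate(cosines):
        if index == target:
            # Clamp to keep acos numerically safe near -1 and 1 (assumption).
            clamped = max(-1.0 + eps, min(1.0 - eps, cos))
            theta = math.acos(clamped)
            logits.append(s * math.cos(theta + m))  # margin added to the angle
        else:
            logits.append(s * cos)
    return logits


print(arcface_logits([0.5, 0.0], target=0, s=1.0, m=math.pi / 3))
```

Adding m to the angle always lowers the target logit while theta + m stays below pi, which forces the embedding closer to its class centroid during training.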

ArcMarginProduct

class catalyst.contrib.layers.arcmargin.ArcMarginProduct(in_features: int, out_features: int)[source]

Bases: torch.nn.modules.module.Module

Implementation of Arc Margin Product.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> layer = ArcMarginProduct(5, 10)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor) → torch.Tensor[source]
Parameters

input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).

cSE

class catalyst.contrib.layers.se.cSE(in_channels: int, r: int = 16)[source]

Bases: torch.nn.modules.module.Module

The channel-wise SE (Squeeze and Excitation) block from the Squeeze-and-Excitation Networks paper.

Adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/65939 and https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

Shape:

  • Input: (batch, channels, height, width)

  • Output: (batch, channels, height, width) (same shape as input)

__init__(in_channels: int, r: int = 16)[source]
Parameters
  • in_channels – The number of channels in the feature map of the input.

  • r – The reduction ratio of the intermediate channels. Default: 16.
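The squeeze, excitation, and scale steps can be sketched on nested lists. This is a simplified illustration and not the module's code: the weight matrices are passed in explicitly, biases are omitted, and the function name is hypothetical. w_reduce maps C channels to C // r hidden units and w_expand maps them back to C.

```python
import math


def cse(feature_map, w_reduce, w_expand):
    """Channel-wise squeeze and excitation on a C x H x W nested list.

    Hypothetical helper: w_reduce is (C // r) x C, w_expand is C x (C // r).
    """
    # Squeeze: global average pooling gives one value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce to C // r units and apply ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(weights, squeezed)))
        for weights in w_reduce
    ]
    # Excitation, step 2: expand back to C channels and apply a sigmoid gate.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(weights, hidden))))
        for weights in w_expand
    ]
    # Scale: every channel is multiplied by its gate; the shape is unchanged.
    return [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]


feature_map = [[[1.0, 3.0]], [[2.0, 2.0]]]  # C=2, H=1, W=2
print(cse(feature_map, w_reduce=[[0.5, 0.5]], w_expand=[[0.0], [0.0]]))
```

With all-zero expansion weights every gate equals sigmoid(0) = 0.5, so the demo simply halves the input.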

CosFace

class catalyst.contrib.layers.cosface.CosFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.35)[source]

Bases: torch.nn.modules.module.Module

Implementation of CosFace: Large Margin Cosine Loss for Deep Face Recognition.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.35.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> layer = CosFace(5, 10, s=1.31, m=0.1)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • target – target classes, expected shape B, where B is the batch dimension. If None, the projection onto centroids is returned. Default is None.

Returns

tensor (logits) with shape BxC, where C is the number of classes (out_features).

CurricularFace

class catalyst.contrib.layers.curricularface.CurricularFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5)[source]

Bases: torch.nn.modules.module.Module

Implementation of CurricularFace: Adaptive Curriculum Learning Loss for Deep Face Recognition.

Official pytorch implementation.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.5.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> layer = CurricularFace(5, 10, s=1.31, m=0.5)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor, label: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shape BxF, where B is the batch dimension and F is the input feature dimension.

  • label – target classes, expected shape B, where B is the batch dimension. If None, the projection onto centroids is returned. Default is None.

Returns

tensor (logits) with shape BxC, where C is the number of classes.

FactorizedLinear

class catalyst.contrib.layers.factorized.FactorizedLinear(nn_linear: torch.nn.modules.linear.Linear, dim_ratio: Union[int, float] = 1.0)[source]

Bases: torch.nn.modules.module.Module

Factorized wrapper for nn.Linear

Parameters
  • nn_linear – torch nn.Linear module

  • dim_ratio – dimension ratio to use after the SVD of the weights

extra_repr() → str[source]

Extra representation log.

forward(x: torch.Tensor)[source]

Forward call.

scSE

class catalyst.contrib.layers.se.scSE(in_channels: int, r: int = 16)[source]

Bases: torch.nn.modules.module.Module

The scSE (Concurrent Spatial and Channel Squeeze and Channel Excitation) block from the Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks paper.

Adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

Shape:

  • Input: (batch, channels, height, width)

  • Output: (batch, channels, height, width) (same shape as input)

__init__(in_channels: int, r: int = 16)[source]
Parameters
  • in_channels – The number of channels in the feature map of the input.

  • r – The reduction ratio of the intermediate channels. Default: 16.

SoftMax

class catalyst.contrib.layers.softmax.SoftMax(in_features: int, num_classes: int)[source]

Bases: torch.nn.modules.module.Module

Implementation of Significance of Softmax-based Features in Comparison to Distance Metric Learning-based Features.

Parameters
  • in_features – size of each input sample.

  • num_classes – size of each output sample.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = num\_classes\).

Example

>>> layer = SoftMax(5, 10)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding)
>>> loss = loss_fn(output, target)
>>> self.engine.backward(loss)
forward(input: torch.Tensor) → torch.Tensor[source]
Parameters

input – input features, expected shapes BxF where B is batch dimension and F is an input feature dimension.

Returns

tensor (logits) with shapes BxC where C is the number of classes (num_classes).

sSE

class catalyst.contrib.layers.se.sSE(in_channels: int)[source]

Bases: torch.nn.modules.module.Module

The sSE (Channel Squeeze and Spatial Excitation) block from the Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks paper.

Adapted from https://www.kaggle.com/c/tgs-salt-identification-challenge/discussion/66178

Shape:

  • Input: (batch, channels, height, width)

  • Output: (batch, channels, height, width) (same shape as input)

__init__(in_channels: int)[source]
Parameters

in_channels – The number of channels in the feature map of the input.

SubCenterArcFace

class catalyst.contrib.layers.arcface.SubCenterArcFace(in_features: int, out_features: int, s: float = 64.0, m: float = 0.5, k: int = 3, eps: float = 1e-06)[source]

Bases: torch.nn.modules.module.Module

Implementation of Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces.

Parameters
  • in_features – size of each input sample.

  • out_features – size of each output sample.

  • s – norm of input feature. Default: 64.0.

  • m – margin. Default: 0.5.

  • k – number of possible class centroids. Default: 3.

  • eps (float, optional) – operation accuracy. Default: 1e-6.

Shape:
  • Input: \((batch, H_{in})\) where \(H_{in} = in\_features\).

  • Output: \((batch, H_{out})\) where \(H_{out} = out\_features\).

Example

>>> import torch
>>> from torch import nn
>>> from catalyst.contrib.layers.arcface import SubCenterArcFace
>>> layer = SubCenterArcFace(5, 10, s=1.31, m=0.35, k=2)
>>> loss_fn = nn.CrossEntropyLoss()
>>> embedding = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(10)
>>> output = layer(embedding, target)
>>> loss = loss_fn(output, target)
>>> loss.backward()
forward(input: torch.Tensor, target: torch.LongTensor = None) → torch.Tensor[source]
Parameters
  • input – input features, expected shapes BxF where B is batch dimension and F is an input feature dimension.

  • target – target classes, expected shapes B where B is batch dimension. If None then will be returned projection on centroids. Default is None.

Returns

tensor (logits) with shapes BxC where C is a number of classes.
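To make the roles of s, m and k concrete, the following dependency-free sketch computes sub-center ArcFace logits for a single sample in the way the paper describes them. It is not the library's implementation: the helper name subcenter_arcface_logits, the nested-list layout of the class centers, and the way eps is used for clamping are assumptions made for illustration.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def subcenter_arcface_logits(feature, centers, target=None,
                             s=64.0, m=0.5, eps=1e-6):
    """Hypothetical helper: logits for one sample.

    centers[c][j] is the j-th sub-center (a vector) of class c.
    """
    x = _normalize(feature)
    logits = []
    for class_idx, sub_centers in enumerate(centers):
        # Cosine to every sub-center of the class; keep the closest one.
        cos = max(
            sum(a * b for a, b in zip(x, _normalize(w))) for w in sub_centers
        )
        if target is None:
            logits.append(cos)  # plain projection on the centroids
            continue
        # Clamp so that acos stays numerically valid (assumed use of eps).
        theta = math.acos(min(max(cos, -1.0 + eps), 1.0 - eps))
        if class_idx == target:
            theta += m  # additive angular margin on the true class only
        logits.append(s * math.cos(theta))
    return logits


# Two classes with k=2 sub-centers each, in a 2-D feature space.
centers = [[[1.0, 0.0], [0.8, 0.6]], [[0.0, 1.0], [-0.6, 0.8]]]
with_margin = subcenter_arcface_logits(
    [1.0, 0.2], centers, target=0, s=1.31, m=0.35
)
projection = subcenter_arcface_logits([1.0, 0.2], centers)
```

The margin lowers only the logit of the target class, which is what forces the embedding closer to one of that class's sub-centers during training.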

Losses

AdaptiveHingeLoss

class catalyst.contrib.losses.recsys.AdaptiveHingeLoss[source]

Bases: catalyst.contrib.losses.recsys.PairwiseLoss

Adaptive hinge loss function.

Takes a set of predictions for implicitly negative items and selects those that are highest, thus sampling the negatives that are closest to violating the ranking implicit in the pattern of user interactions.

Example:

import torch
from catalyst.contrib.losses import recsys

pos_score = torch.randn(3, requires_grad=True)
neg_scores = torch.randn(5, 3, requires_grad=True)

output = recsys.AdaptiveHingeLoss()(pos_score, neg_scores)
output.backward()
forward(positive_score: torch.Tensor, negative_scores: torch.Tensor) → torch.Tensor[source]

Forward propagation method for the adaptive hinge loss.

Parameters
  • positive_score – Tensor containing predictions for known positive items.

  • negative_scores – Iterable of tensors containing predictions for sampled negative items. More tensors increase the likelihood of finding ranking-violating pairs, but risk overfitting.

Returns

computed loss
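The selection step described above can be sketched without PyTorch. The helper name adaptive_hinge and the margin of 1.0 are assumptions for illustration, so treat this as a reading of the description rather than the library's code.

```python
def adaptive_hinge(positive_score, negative_scores, margin=1.0):
    """Hypothetical helper; margin=1.0 is an assumption."""
    # For each item, keep the highest-scoring (hardest) sampled negative.
    hardest = [max(column) for column in zip(*negative_scores)]
    # Standard hinge on the (positive, hardest negative) pairs, averaged.
    losses = [
        max(0.0, margin - (pos - neg))
        for pos, neg in zip(positive_score, hardest)
    ]
    return sum(losses) / len(losses)


pos = [2.0, 0.5, 1.0]
negs = [
    [0.1, 0.4, 1.5],   # first set of sampled negatives
    [0.3, -1.0, 0.2],  # second set of sampled negatives
]
loss = adaptive_hinge(pos, negs)
```

Only the first item is ranked above its hardest negative by at least the margin, so it alone contributes zero to the average.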

BarlowTwinsLoss

class catalyst.contrib.losses.contrastive.BarlowTwinsLoss(offdiag_lambda=1.0, eps=1e-12)[source]

Bases: torch.nn.modules.module.Module

The Contrastive embedding loss.

It has been proposed in Barlow Twins: Self-Supervised Learning via Redundancy Reduction.

Example:

import torch
from torch.nn import functional as F
from catalyst.contrib.losses.contrastive import BarlowTwinsLoss

# two batches of L2-normalized embeddings to be compared
embeddings_left = F.normalize(torch.rand(256, 64, requires_grad=True))
embeddings_right = F.normalize(torch.rand(256, 64, requires_grad=True))
criterion = BarlowTwinsLoss(offdiag_lambda=1)
criterion(embeddings_left, embeddings_right)
__init__(offdiag_lambda=1.0, eps=1e-12)[source]
Parameters
  • offdiag_lambda – trade-off parameter

  • eps – shift for the variance (var + eps)

BPRLoss

class catalyst.contrib.losses.recsys.BPRLoss(gamma=1e-10)[source]

Bases: catalyst.contrib.losses.recsys.PairwiseLoss

Bayesian Personalised Ranking loss function.

It has been proposed in BPR: Bayesian Personalized Ranking from Implicit Feedback.

Parameters

gamma (float) – Small value to avoid division by zero. Default: 1e-10.

Example:

import torch
from catalyst.contrib.losses import recsys

pos_score = torch.randn(3, requires_grad=True)
neg_score = torch.randn(3, requires_grad=True)

output = recsys.BPRLoss()(pos_score, neg_score)
output.backward()
forward(positive_score: torch.Tensor, negative_score: torch.Tensor) → torch.Tensor[source]

Forward propagation method for the BPR loss.

Parameters
  • positive_score – Tensor containing predictions for known positive items.

  • negative_score – Tensor containing predictions for sampled negative items.

Returns

computed loss
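As a rough illustration of the pairwise objective, the sketch below averages -log(gamma + sigmoid(pos - neg)) over the pairs. The helper name bpr_loss and the placement of gamma inside the logarithm are assumptions, not a statement about the library's internals.

```python
import math


def bpr_loss(positive_score, negative_score, gamma=1e-10):
    """Hypothetical helper: mean of -log(gamma + sigmoid(pos - neg))."""
    total = 0.0
    for pos, neg in zip(positive_score, negative_score):
        # Probability that the positive item is ranked above the negative.
        prob = 1.0 / (1.0 + math.exp(-(pos - neg)))
        total += -math.log(gamma + prob)
    return total / len(positive_score)


well_ranked = bpr_loss([3.0, 2.0], [0.0, -1.0])
badly_ranked = bpr_loss([0.0, -1.0], [3.0, 2.0])
```

The loss is small when positives score well above negatives and grows roughly linearly with the score gap when the order is reversed.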

CircleLoss

class catalyst.contrib.losses.circle.CircleLoss(margin: float, gamma: float)[source]

Bases: torch.nn.modules.module.Module

CircleLoss from Circle Loss: A Unified Perspective of Pair Similarity Optimization paper.

Adapted from: https://github.com/TinyZeaMays/CircleLoss

Example

>>> import torch
>>> from torch.nn import functional as F
>>> from catalyst.contrib.losses import CircleLoss
>>>
>>> features = F.normalize(torch.rand(256, 64, requires_grad=True))
>>> labels = torch.randint(high=10, size=(256,))
>>> criterion = CircleLoss(margin=0.25, gamma=256)
>>> criterion(features, labels)
__init__(margin: float, gamma: float) → None[source]
Parameters
  • margin – relaxation margin (m in the paper)

  • gamma – scale factor (gamma in the paper)
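To show how margin and gamma enter the loss, here is a plain-Python sketch of the Circle Loss formula from the paper, applied to precomputed similarity lists. The helper name circle_loss and its list-based inputs are assumptions; the library works on feature and label tensors instead.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def circle_loss(sim_pos, sim_neg, margin=0.25, gamma=256.0):
    """Hypothetical helper over within-class (sim_pos) and
    between-class (sim_neg) cosine similarities."""
    # Self-paced weights: pairs far from their optimum get larger weights.
    alpha_pos = [max(0.0, 1.0 + margin - s) for s in sim_pos]
    alpha_neg = [max(0.0, s + margin) for s in sim_neg]
    # Decision margins for positive and negative pairs.
    delta_pos, delta_neg = 1.0 - margin, margin
    logit_pos = [
        -gamma * a * (s - delta_pos) for a, s in zip(alpha_pos, sim_pos)
    ]
    logit_neg = [
        gamma * a * (s - delta_neg) for a, s in zip(alpha_neg, sim_neg)
    ]
    z = _logsumexp(logit_pos) + _logsumexp(logit_neg)
    # softplus(z) = log(1 + exp(z)), written to avoid overflow.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


separated = circle_loss(sim_pos=[0.9, 0.95], sim_neg=[0.05, 0.1])
confused = circle_loss(sim_pos=[0.3, 0.4], sim_neg=[0.7, 0.8])
```

With well-separated similarities the loss is close to zero, while overlapping positives and negatives are penalized heavily because gamma scales every logit.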

DiceLoss

class catalyst.contrib.losses.dice.DiceLoss(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The Dice loss.

DiceLoss = 1 - dice score

dice score = 2 * intersection / (intersection + union) = 2 * tp / (2 * tp + fp + fn)

__init__(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
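The difference between the three modes is easiest to see on per-class counts. The sketch below is an illustration only: the helper name dice_loss, the count-based inputs, and the exact placement of eps are assumptions rather than the library's implementation, which operates on tensors.

```python
def dice_loss(tp, fp, fn, mode="macro", weights=None, eps=1e-7):
    """Hypothetical helper working on per-class counts.

    tp, fp and fn are lists with one entry per class.
    """
    if mode == "micro":
        # Pool the counts first, so class boundaries are ignored.
        score = 2 * sum(tp) / (2 * sum(tp) + sum(fp) + sum(fn) + eps)
        return 1.0 - score
    per_class = [
        2 * t / (2 * t + p + n + eps) for t, p, n in zip(tp, fp, fn)
    ]
    if mode == "macro":
        score = sum(per_class) / len(per_class)
    elif mode == "weighted":
        score = sum(w * d for w, d in zip(weights, per_class))
    else:
        raise ValueError("mode must be 'micro', 'macro' or 'weighted'")
    return 1.0 - score


tp, fp, fn = [8, 1], [2, 1], [0, 3]
macro = dice_loss(tp, fp, fn, mode="macro")
micro = dice_loss(tp, fp, fn, mode="micro")
weighted = dice_loss(tp, fp, fn, mode="weighted", weights=[0.75, 0.25])
```

Here the poorly segmented second class pulls the macro loss up, while micro averaging lets the larger first class dominate.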

FocalLossBinary

class catalyst.contrib.losses.focal.FocalLossBinary(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')[source]

Bases: torch.nn.modules.loss._Loss

Compute focal loss for binary classification problem.

It has been proposed in Focal Loss for Dense Object Detection paper.

__init__(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')[source]

@TODO: Docs. Contribution is welcome.

FocalLossMultiClass

class catalyst.contrib.losses.focal.FocalLossMultiClass(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')[source]

Bases: catalyst.contrib.losses.focal.FocalLossBinary

Compute focal loss for multiclass problem. Ignores targets having -1 label.

It has been proposed in Focal Loss for Dense Object Detection paper.

__init__(ignore: int = None, reduced: bool = False, gamma: float = 2.0, alpha: float = 0.25, threshold: float = 0.5, reduction: str = 'mean')

@TODO: Docs. Contribution is welcome.

FocalTrevskyLoss

class catalyst.contrib.losses.trevsky.FocalTrevskyLoss(alpha: float, beta: Optional[float] = None, gamma: float = 1.3333333333333333, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The focal trevsky loss.

TrevskyIndex = TP / (TP + alpha * FN + beta * FP)

FocalTrevskyLoss = (1 - TrevskyIndex)^gamma

Note: the focal term is applied per image, so the loss pays more attention to complicated images.

__init__(alpha: float, beta: Optional[float] = None, gamma: float = 1.3333333333333333, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • alpha – false negative coefficient; a bigger alpha gives a bigger penalty for false negatives. Must be in (0, 1)

  • beta – false positive coefficient; a bigger beta gives a bigger penalty for false positives. Must be in (0, 1). If None, beta = (1 - alpha)

  • gamma – focal coefficient. It determines how much the weight of simple examples is reduced.

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', the metric is calculated separately per class and then averaged over all classes. If mode='weighted', the metric is calculated separately per class and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
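A small numeric sketch may help when choosing alpha, beta and gamma. The helper names tversky_index and focal_tversky_loss are hypothetical, and the code works on raw counts for a single image or class, which is a simplification of what the loss does on tensors.

```python
def tversky_index(tp, fn, fp, alpha, beta=None, eps=1e-7):
    """Hypothetical helper: TP / (TP + alpha * FN + beta * FP)."""
    if beta is None:
        beta = 1.0 - alpha  # documented default
    return tp / (tp + alpha * fn + beta * fp + eps)


def focal_tversky_loss(tp, fn, fp, alpha, beta=None, gamma=4 / 3):
    """(1 - index) ** gamma for a single image or class."""
    return (1.0 - tversky_index(tp, fn, fp, alpha, beta)) ** gamma


# alpha = beta = 0.5 turns the index into the Dice score.
dice_like = tversky_index(tp=8, fn=4, fp=0, alpha=0.5)
# A larger alpha punishes the same false negatives harder.
fn_heavy = tversky_index(tp=8, fn=4, fp=0, alpha=0.9)
# With gamma > 1, easy cases (index near 1) are down-weighted the most.
easy = focal_tversky_loss(tp=9, fn=1, fp=0, alpha=0.7)
hard = focal_tversky_loss(tp=2, fn=8, fp=0, alpha=0.7)
```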

HingeLoss

class catalyst.contrib.losses.recsys.HingeLoss[source]

Bases: catalyst.contrib.losses.recsys.PairwiseLoss

Hinge loss function.

Example:

import torch
from catalyst.contrib.losses import recsys

pos_score = torch.randn(3, requires_grad=True)
neg_score = torch.randn(3, requires_grad=True)

output = recsys.HingeLoss()(pos_score, neg_score)
output.backward()
forward(positive_score: torch.Tensor, negative_score: torch.Tensor) → torch.Tensor[source]

Forward propagation method for the hinge loss.

Parameters
  • positive_score – Tensor containing predictions for known positive items.

  • negative_score – Tensor containing predictions for sampled negative items.

Returns

computed loss

HuberLossV0

class catalyst.contrib.losses.regression.HuberLossV0(clip_delta=1.0, reduction='mean')[source]

Bases: torch.nn.modules.module.Module

@TODO: Docs. Contribution is welcome.

__init__(clip_delta=1.0, reduction='mean')[source]

@TODO: Docs. Contribution is welcome.

forward(output: torch.Tensor, target: torch.Tensor, weights=None) → torch.Tensor[source]

@TODO: Docs. Contribution is welcome.

IoULoss

class catalyst.contrib.losses.iou.IoULoss(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The intersection over union (Jaccard) loss.

IoULoss = 1 - iou score

iou score = intersection / union = tp / (tp + fp + fn)

__init__(class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division
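For a quick sanity check of the formula, the count-based sketch below computes the IoU score and relates it to the Dice score through the identity dice = 2 * iou / (1 + iou). The helper names are hypothetical and the placement of eps is an assumption.

```python
def iou_score(tp, fp, fn, eps=1e-7):
    """Hypothetical helper: intersection / union from counts."""
    return tp / (tp + fp + fn + eps)


def dice_from_iou(iou):
    """Dice and IoU are monotonically related: dice = 2 * iou / (1 + iou)."""
    return 2 * iou / (1 + iou)


iou = iou_score(tp=8, fp=2, fn=2)
iou_loss = 1.0 - iou
dice = dice_from_iou(iou)
```

Because the two scores are monotonically related, they rank predictions identically, but the IoU loss penalizes the same mistakes more strongly than the Dice loss.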

LogisticLoss

class catalyst.contrib.losses.recsys.LogisticLoss[source]

Bases: catalyst.contrib.losses.recsys.PairwiseLoss

Logistic loss function.

Example:

import torch
from catalyst.contrib.losses import recsys

pos_score = torch.randn(3, requires_grad=True)
neg_score = torch.randn(3, requires_grad=True)

output = recsys.LogisticLoss()(pos_score, neg_score)
output.backward()
forward(positive_score: torch.Tensor, negative_score: torch.Tensor) → torch.Tensor[source]

Forward propagation method for the logistic loss.

Parameters
  • positive_score – Tensor containing predictions for known positive items.

  • negative_score – Tensor containing predictions for sampled negative items.

Returns

computed loss

NTXentLoss

class catalyst.contrib.losses.ntxent.NTXentLoss(tau: float, reduction: str = 'mean')[source]

Bases: torch.nn.modules.module.Module

A Contrastive embedding loss.

It has been proposed in A Simple Framework for Contrastive Learning of Visual Representations.

Example:

import torch
from torch.nn import functional as F
from catalyst.contrib.losses.ntxent import NTXentLoss

# two batches of L2-normalized embeddings to be contrasted
embeddings_left = F.normalize(torch.rand(256, 64, requires_grad=True))
embeddings_right = F.normalize(torch.rand(256, 64, requires_grad=True))
criterion = NTXentLoss(tau=0.1)
criterion(embeddings_left, embeddings_right)
__init__(tau: float, reduction: str = 'mean') → None[source]
Parameters
  • tau – temperature

  • reduction (string, optional) – specifies the reduction to apply to the output: "none" | "mean" | "sum". "none": no reduction will be applied, "mean": the sum of the output will be divided by the number of positive pairs in the output, "sum": the output will be summed.

Raises

ValueError – if reduction is not mean, sum or none

RocStarLoss

class catalyst.contrib.losses.recsys.RocStarLoss(delta: float = 1.0, sample_size: int = 100, sample_size_gamma: int = 1000, update_gamma_each: int = 50)[source]

Bases: catalyst.contrib.losses.recsys.PairwiseLoss

Roc-star loss function.

Smooth approximation for ROC-AUC. It has been proposed in Roc-star: An objective function for ROC-AUC that actually works.

Adapted from: https://github.com/iridiumblue/roc-star/issues/2

Parameters
  • delta – Param from the article. Default: 1.0.

  • sample_size – Number of examples to take for ROC AUC approximation. Default: 100.

  • sample_size_gamma – Number of examples to take for Gamma parameter approximation. Default: 1000.

  • update_gamma_each – Number of steps after which to recompute gamma value. Default: 50.

Example

import torch
from catalyst.contrib.losses import recsys

outputs = torch.rand(5, 1, requires_grad=True)  # predictions in [0, 1]
targets = torch.randint(0, 2, (5, 1)).float()  # binary labels in {0, 1}

output = recsys.RocStarLoss()(outputs, targets)
output.backward()
forward(outputs: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]

Forward propagation method for the roc-star loss.

Parameters
  • outputs – Tensor of model predictions in [0, 1] range. Shape (B x 1).

  • targets – Tensor of true labels in {0, 1}. Shape (B x 1).

Returns

computed loss

RSquareLoss

class catalyst.contrib.losses.regression.RSquareLoss[source]

Bases: torch.nn.modules.module.Module

forward(outputs: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]

Compute the loss.

Parameters
  • outputs (torch.Tensor) – model outputs

  • targets (torch.Tensor) – targets

Returns

computed loss

Return type

torch.Tensor

SupervisedContrastiveLoss

class catalyst.contrib.losses.supervised_contrastive.SupervisedContrastiveLoss(tau: float, reduction: str = 'mean', pos_aggregation='in')[source]

Bases: torch.nn.modules.module.Module

A Contrastive embedding loss that uses targets.

It has been proposed in Supervised Contrastive Learning.

__init__(tau: float, reduction: str = 'mean', pos_aggregation='in') → None[source]
Parameters
  • tau – temperature

  • reduction – specifies the reduction to apply to the output: "none" | "mean" | "sum". "none": no reduction will be applied, "mean": the sum of the output will be divided by the number of positive pairs in the output, "sum": the output will be summed.

  • pos_aggregation – specifies where positive pairs are aggregated: "in" | "out". "in": maximization of log(average exponentiated positive similarity), "out": maximization of average positive similarity.

Raises
  • ValueError – if reduction is not mean, sum or none

  • ValueError – if positive aggregation is not in or out

TrevskyLoss

class catalyst.contrib.losses.trevsky.TrevskyLoss(alpha: float, beta: Optional[float] = None, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]

Bases: torch.nn.modules.module.Module

The trevsky loss.

TrevskyIndex = TP / (TP + alpha * FN + beta * FP)

TrevskyLoss = 1 - TrevskyIndex

__init__(alpha: float, beta: Optional[float] = None, class_dim: int = 1, mode: str = 'macro', weights: List[float] = None, eps: float = 1e-07)[source]
Parameters
  • alpha – false negative coefficient; a bigger alpha gives a bigger penalty for false negatives. Must be in (0, 1)

  • beta – false positive coefficient; a bigger beta gives a bigger penalty for false positives. Must be in (0, 1). If None, beta = (1 - alpha)

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted']. If mode='micro', classes are ignored and the metric is calculated globally. If mode='macro', the metric is calculated separately per class and then averaged over all classes. If mode='weighted', the metric is calculated separately per class and then summed over all classes with weights.

  • weights – class weights (for mode='weighted')

  • eps – epsilon to avoid zero division

TripletMarginLossWithSampler

class catalyst.contrib.losses.triplet.TripletMarginLossWithSampler(margin: float, sampler_inbatch: IInbatchTripletSampler)[source]

Bases: torch.nn.modules.module.Module

This class combines in-batch sampling of triplets with the default TripletMarginLoss from PyTorch.

__init__(margin: float, sampler_inbatch: IInbatchTripletSampler)[source]
Parameters
  • margin – margin value

  • sampler_inbatch – sampler for forming triplets inside the batch
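The two ingredients, in-batch triplet mining and the margin loss, can be sketched in plain Python as follows. The helper names all_triplets and triplet_margin_loss are hypothetical; the miner mimics the "all possible triplets" strategy, and the loss uses Euclidean distance with a mean reduction, matching PyTorch's defaults.

```python
import math
from itertools import permutations


def all_triplets(labels):
    """Hypothetical miner: every (anchor, positive, negative) index triple."""
    triplets = []
    for anchor, positive in permutations(range(len(labels)), 2):
        if labels[anchor] != labels[positive]:
            continue
        for negative, label in enumerate(labels):
            if label != labels[anchor]:
                triplets.append((anchor, positive, negative))
    return triplets


def triplet_margin_loss(features, triplets, margin):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) with Euclidean d."""
    losses = [
        max(
            math.dist(features[a], features[p])
            - math.dist(features[a], features[n])
            + margin,
            0.0,
        )
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses)


# Two classes with two samples each, as the samplers require.
features = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
labels = [0, 0, 1, 1]
triplets = all_triplets(labels)
loss = triplet_margin_loss(features, triplets, margin=0.5)
```

The number of triplets grows quickly with batch size, which is why a cap such as max_output_triplets is useful in practice.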

WARPLoss

class catalyst.contrib.losses.recsys.WARPLoss(max_num_trials: Optional[int] = None)[source]

Bases: catalyst.contrib.losses.recsys.ListwiseLoss

Weighted Approximate-Rank Pairwise (WARP) loss function.

It has been proposed in WSABIE: Scaling Up To Large Vocabulary Image Annotation paper.

WARP loss randomly samples output labels of a model until it finds a pair that it knows is wrongly ordered, and then applies an update only to these two incorrectly ordered examples.

Adapted from: https://github.com/gabrieltseng/datascience-projects/blob/master/misc/warp.py

Parameters

max_num_trials – Number of attempts allowed to find a violating negative example. In practice it means that we optimize for ranks 1 to max_num_trials-1.

Example:

import torch
from catalyst.contrib.losses import recsys

outputs = torch.randn(5, 3, requires_grad=True)
targets = torch.randn(5, 3, requires_grad=True)

output = recsys.WARPLoss()(outputs, targets)
output.backward()
forward(outputs: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]

Forward propagation method for the WARP loss.

Parameters
  • outputs – Iterable of tensors containing predictions for all items.

  • targets – Iterable of tensors containing true labels for all items.

Returns

computed loss
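The sampling procedure can be illustrated for a single positive item. This sketch follows the rank estimate and harmonic weighting from the WSABIE paper; the helper name warp_loss_for_positive, the margin of 1, and the treatment of every other item as a negative are assumptions, and the library's weighting may differ.

```python
import random


def warp_loss_for_positive(scores, positive, max_num_trials=None, rng=None):
    """Hypothetical helper: WARP loss for one positive item.

    scores[i] is the model score for item i; `positive` is a relevant item.
    """
    rng = rng or random.Random(0)
    negatives = [i for i in range(len(scores)) if i != positive]
    if max_num_trials is None:
        max_num_trials = len(negatives)
    for trial in range(1, max_num_trials + 1):
        candidate = rng.choice(negatives)
        violation = 1.0 - scores[positive] + scores[candidate]
        if violation > 0:
            # Few trials needed -> many violators -> positive is ranked low.
            estimated_rank = len(negatives) // trial
            weight = sum(1.0 / j for j in range(1, estimated_rank + 1))
            return weight * violation
    return 0.0  # no violating negative found within the budget


clean = warp_loss_for_positive([5.0, 0.1, 0.2, 0.3], positive=0)
violated = warp_loss_for_positive([0.0, 2.0, 2.0, 2.0], positive=0)
```

A violator found on the first trial implies a poor estimated rank and therefore a large weight, so badly ranked positives receive the strongest updates.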

Optimizers

AdamP

class catalyst.contrib.optimizers.adamp.AdamP(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]

Bases: torch.optim.optimizer.Optimizer

Implements AdamP algorithm.

The original Adam algorithm was proposed in Adam: A Method for Stochastic Optimization. The AdamP variant was proposed in Slowing Down the Weight Norm Increase in Momentum-based Optimizers.

Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay coefficient (default: 0)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)

  • nesterov (boolean, optional) – enables Nesterov momentum (default: False)

Original source code: https://github.com/clovaai/AdamP

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, delta=0.1, wd_ratio=0.1, nesterov=False)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay coefficient (default: 0)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)

  • nesterov (boolean, optional) – enables Nesterov momentum (default: False)

Lamb

class catalyst.contrib.optimizers.lamb.Lamb(params, lr: Optional[float] = 0.001, betas: Optional[Tuple[float, float]] = (0.9, 0.999), eps: Optional[float] = 1e-06, weight_decay: Optional[float] = 0.0, adam: Optional[bool] = False)[source]

Bases: torch.optim.optimizer.Optimizer

Implements Lamb algorithm.

It has been proposed in Training BERT in 76 minutes.

__init__(params, lr: Optional[float] = 0.001, betas: Optional[Tuple[float, float]] = (0.9, 0.999), eps: Optional[float] = 1e-06, weight_decay: Optional[float] = 0.0, adam: Optional[bool] = False)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-6)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • adam (bool, optional) – always use trust ratio = 1, which turns this into Adam. Useful for comparison purposes.

Raises

ValueError – if invalid learning rate, epsilon value or betas.

Lookahead

class catalyst.contrib.optimizers.lookahead.Lookahead(optimizer: torch.optim.optimizer.Optimizer, k: int = 5, alpha: float = 0.5)[source]

Bases: torch.optim.optimizer.Optimizer

Implements Lookahead algorithm.

It has been proposed in Lookahead Optimizer: k steps forward, 1 step back.

Adapted from: https://github.com/alphadl/lookahead.pytorch (MIT License)

__init__(optimizer: torch.optim.optimizer.Optimizer, k: int = 5, alpha: float = 0.5)[source]

@TODO: Docs. Contribution is welcome.
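Since the parameters k and alpha are not documented here, a toy sketch of the "k steps forward, 1 step back" rule from the paper may help. The helper lookahead_sgd is hypothetical: it wraps plain SGD on a single scalar and is not the library's optimizer.

```python
def lookahead_sgd(grad_fn, weight, lr=0.1, k=5, alpha=0.5, steps=20):
    """Hypothetical helper: Lookahead wrapped around plain SGD on a scalar."""
    slow = weight  # slow weights, updated once every k steps
    fast = weight  # fast weights, updated by the inner optimizer
    for step in range(1, steps + 1):
        fast -= lr * grad_fn(fast)  # one inner (fast) step forward
        if step % k == 0:
            # One step back: move slow weights a fraction alpha toward fast.
            slow += alpha * (fast - slow)
            fast = slow  # restart the fast weights from the slow ones
    return slow, fast


# Minimize f(w) = w ** 2, whose gradient is 2 * w.
slow, fast = lookahead_sgd(lambda w: 2.0 * w, weight=1.0)
```

Under this reading, k is the number of inner optimizer steps between synchronizations and alpha is the interpolation factor for the slow weights.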

QHAdamW

class catalyst.contrib.optimizers.qhadamw.QHAdamW(params, lr=0.001, betas=(0.995, 0.999), nus=(0.7, 1.0), weight_decay=0.0, eps=1e-08)[source]

Bases: torch.optim.optimizer.Optimizer

Implements QHAdamW algorithm.

Combines QHAdam algorithm that was proposed in Quasi-hyperbolic momentum and Adam for deep learning with weight decay decoupling from Decoupled Weight Decay Regularization paper.

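Example

>>> import torch
>>> from catalyst.contrib.optimizers.qhadamw import QHAdamW
>>> model = torch.nn.Linear(4, 2)  # placeholder model
>>> input, target = torch.randn(8, 4), torch.randn(8, 2)
>>> loss_fn = torch.nn.MSELoss()
>>> optimizer = QHAdamW(
...     model.parameters(),
...     lr=3e-4, nus=(0.8, 1.0), betas=(0.99, 0.999))
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()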

Adapted from: https://github.com/iprally/qhadamw-pytorch/blob/master/qhadamw.py (MIT License)

__init__(params, lr=0.001, betas=(0.995, 0.999), nus=(0.7, 1.0), weight_decay=0.0, eps=1e-08)[source]
Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (\(\alpha\) from the paper) (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of the gradient and its square (default: (0.995, 0.999))

  • nus (Tuple[float, float], optional) – immediate discount factors used to estimate the gradient and its square (default: (0.7, 1.0))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay (L2 regularization coefficient, times two) (default: 0.0)

Raises

ValueError – if invalid learning rate, epsilon value, betas or weight_decay value.
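To clarify how betas and nus interact, the sketch below applies one quasi-hyperbolic Adam update with decoupled weight decay to a scalar parameter, following the update rule from the QHAdam paper. The helper qhadamw_step and its state dictionary are hypothetical, and the scaling of weight_decay here is an assumption that may not match the library's convention.

```python
import math


def qhadamw_step(param, grad, state, lr=1e-3, betas=(0.995, 0.999),
                 nus=(0.7, 1.0), weight_decay=0.0, eps=1e-8):
    """Hypothetical helper: one QHAdamW-style update of a scalar parameter."""
    beta1, beta2 = betas
    nu1, nu2 = nus
    state["t"] += 1
    # Exponential moving averages of the gradient and its square.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # Quasi-hyperbolic mix of the raw gradient and the moving averages.
    numerator = (1 - nu1) * grad + nu1 * m_hat
    denominator = math.sqrt((1 - nu2) * grad * grad + nu2 * v_hat) + eps
    # Decoupled weight decay: shrink the parameter directly (assumed scaling).
    param -= lr * weight_decay * param
    return param - lr * numerator / denominator


state = {"t": 0, "m": 0.0, "v": 0.0}
updated = qhadamw_step(1.0, grad=0.5, state=state)
```

Setting nus=(1.0, 1.0) removes the raw-gradient terms and recovers an Adam-style update, which is a convenient way to compare the two.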

RAdam

class catalyst.contrib.optimizers.radam.RAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)[source]

Bases: torch.optim.optimizer.Optimizer

Implements RAdam algorithm.

It has been proposed in On the Variance of the Adaptive Learning Rate and Beyond.

@TODO: Docs (add Example). Contribution is welcome

Adapted from: https://github.com/LiyuanLucasLiu/RAdam (Apache-2.0 License)

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)[source]

@TODO: Docs. Contribution is welcome.

Ralamb

class catalyst.contrib.optimizers.ralamb.Ralamb(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)[source]

Bases: torch.optim.optimizer.Optimizer

RAdam optimizer with LARS/LAMB tricks.

Adapted from: https://github.com/mgrankin/over9000/blob/master/ralamb.py (Apache-2.0 License)

__init__(params: Iterable, lr: float = 0.001, betas: Tuple[float, float] = (0.9, 0.999), eps: float = 1e-08, weight_decay: float = 0)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • betas (Tuple[float, float], optional) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999))

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

SGDP

class catalyst.contrib.optimizers.sgdp.SGDP(params, lr=<required parameter>, momentum=0, weight_decay=0, dampening=0, nesterov=False, eps=1e-08, delta=0.1, wd_ratio=0.1)[source]

Implements SGDP algorithm.

The SGDP variant was proposed in Slowing Down the Weight Norm Increase in Momentum-based Optimizers.

Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr – learning rate

  • momentum (float, optional) – momentum factor (default: 0)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • dampening (float, optional) – dampening for momentum (default: 0)

  • nesterov (bool, optional) – enables Nesterov momentum (default: False)

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)

__init__(params, lr=<required parameter>, momentum=0, weight_decay=0, dampening=0, nesterov=False, eps=1e-08, delta=0.1, wd_ratio=0.1)[source]
Parameters
  • params – iterable of parameters to optimize or dicts defining parameter groups

  • lr – learning rate

  • momentum (float, optional) – momentum factor (default: 0)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)

  • dampening (float, optional) – dampening for momentum (default: 0)

  • nesterov (bool, optional) – enables Nesterov momentum (default: False)

  • eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

  • delta – threshold that determines whether a set of parameters is scale invariant or not (default: 0.1)

  • wd_ratio – relative weight decay applied on scale-invariant parameters compared to that applied on scale-variant parameters (default: 0.1)

Schedulers

OneCycleLRWithWarmup

class catalyst.contrib.schedulers.onecycle.OneCycleLRWithWarmup(optimizer: torch.optim.optimizer.Optimizer, num_steps: int, lr_range=(1.0, 0.005), init_lr: float = None, warmup_steps: int = 0, warmup_fraction: float = None, decay_steps: int = 0, decay_fraction: float = None, momentum_range=(0.8, 0.99, 0.999), init_momentum: float = None)[source]

Bases: catalyst.contrib.schedulers.base.BatchScheduler

OneCycle scheduler with warm-up & lr decay stages.

The first stage, warm-up, increases lr from init_lr to max_lr and decreases momentum from init_momentum to min_momentum. It takes warmup_steps steps.

The second stage, annealing, decreases lr from max_lr to min_lr and increases momentum from min_momentum to max_momentum.

The third, optional stage is lr decay.

__init__(optimizer: torch.optim.optimizer.Optimizer, num_steps: int, lr_range=(1.0, 0.005), init_lr: float = None, warmup_steps: int = 0, warmup_fraction: float = None, decay_steps: int = 0, decay_fraction: float = None, momentum_range=(0.8, 0.99, 0.999), init_momentum: float = None)[source]
Parameters
  • optimizer – PyTorch optimizer

  • num_steps – total number of steps

  • lr_range – tuple with two or three elements (max_lr, min_lr, [final_lr])

  • init_lr (float, optional) – initial lr

  • warmup_steps – count of steps for warm-up stage

  • warmup_fraction (float, optional) – fraction in [0; 1) to calculate number of warmup steps. Cannot be set together with warmup_steps

  • decay_steps – count of steps for lr decay stage

  • decay_fraction (float, optional) – fraction in [0; 1) to calculate number of decay steps. Cannot be set together with decay_steps

  • momentum_range – tuple with two or three elements (min_momentum, max_momentum, [final_momentum])

  • init_momentum (float, optional) – initial momentum
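The three stages can be visualized with a small schedule function. This is a sketch under stated assumptions: the helper one_cycle_lr is hypothetical, every stage uses linear interpolation, and init_lr falls back to max_lr when it is not given, none of which is confirmed by the documentation above. Momentum would follow the mirrored pattern using momentum_range.

```python
def one_cycle_lr(step, num_steps, lr_range=(1.0, 0.005), init_lr=None,
                 warmup_steps=0, decay_steps=0):
    """Hypothetical helper: learning rate at `step` (0-based).

    Assumes linear interpolation inside every stage.
    """
    max_lr, min_lr = lr_range[0], lr_range[1]
    final_lr = lr_range[2] if len(lr_range) == 3 else min_lr
    init_lr = max_lr if init_lr is None else init_lr  # assumed default
    anneal_steps = num_steps - warmup_steps - decay_steps

    def lerp(start, end, position, length):
        return start + (end - start) * position / max(length, 1)

    if step < warmup_steps:  # stage 1: warm-up
        return lerp(init_lr, max_lr, step, warmup_steps)
    step -= warmup_steps
    if step < anneal_steps:  # stage 2: annealing
        return lerp(max_lr, min_lr, step, anneal_steps)
    step -= anneal_steps  # stage 3: optional decay
    return lerp(min_lr, final_lr, step, decay_steps)


schedule = [
    one_cycle_lr(s, num_steps=10, lr_range=(1.0, 0.1, 0.01),
                 init_lr=0.2, warmup_steps=2, decay_steps=3)
    for s in range(10)
]
```

With these settings the rate climbs from 0.2 to 1.0 over two steps, anneals toward 0.1 over five steps, and then decays toward 0.01 over the last three.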