Metrics

Metric API

IMetric

class catalyst.metrics._metric.IMetric(compute_on_call: bool = True)[source]

Bases: abc.ABC

Interface for all Metrics.

Parameters

compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

__init__(compute_on_call: bool = True)[source]

Interface for all Metrics.

abstract compute() → Any[source]

Computes the metric based on its accumulated state.

By default, this is called at the end of each loader (on_loader_end event).

Returns

computed value; it is better to return the value in key-value format

Return type

Any

abstract reset() → None[source]

Resets the metric to its initial state.

By default, this is called at the start of each loader (on_loader_start event).

abstract update(*args, **kwargs) → Any[source]

Updates the metrics state using the passed data.

By default, this is called at the end of each batch (on_batch_end event).

Parameters
  • *args – positional arguments to update the metric with

  • **kwargs – keyword arguments to update the metric with
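
For illustration, a minimal sketch of a custom metric implementing this interface is shown below; the class name and the tracked statistic are hypothetical and not part of the Catalyst API.

import numpy as np
from catalyst import metrics

class MeanValueMetric(metrics.IMetric):
    """Hypothetical metric that tracks the mean of a scalar value."""

    def __init__(self, compute_on_call: bool = True):
        super().__init__(compute_on_call=compute_on_call)
        self.values = []

    def reset(self) -> None:
        # called at the start of each loader (on_loader_start)
        self.values = []

    def update(self, value: float) -> float:
        # called at the end of each batch (on_batch_end)
        self.values.append(value)
        return value

    def compute(self) -> float:
        # called at the end of each loader (on_loader_end)
        return float(np.mean(self.values))

metric = MeanValueMetric()
metric.reset()
for value in [1.0, 2.0, 3.0]:
    metric.update(value)
metric.compute()  # 2.0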

ICallbackBatchMetric

class catalyst.metrics._metric.ICallbackBatchMetric(compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.IMetric

Interface for all batch-based Metrics.

__init__(compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Init

abstract compute_key_value() → Dict[str, float][source]

Computes the metric based on its accumulated state.

By default, this is called at the end of each loader (on_loader_end event).

Returns

computed value in key-value format

Return type

Dict

abstract update_key_value(*args, **kwargs) → Dict[str, float][source]

Updates the metric with new input.

By default, this is called at the end of each batch (on_batch_end event).

Parameters
  • *args – some args

  • **kwargs – some kwargs

Returns

computed value in key-value format

Return type

Dict
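
As a usage sketch: any ICallbackBatchMetric follows the same calling pattern, with update_key_value invoked on every batch for per-batch logging and compute_key_value at the end of the loader; the resulting keys are wrapped with the optional prefix and suffix. The snippet below uses AccuracyMetric (documented later on this page); the prefix/suffix values and the key names in the comments are illustrative.

import torch
from catalyst import metrics

# AccuracyMetric implements the ICallbackBatchMetric interface
metric = metrics.AccuracyMetric(topk_args=(1,), prefix="my_", suffix="/batch")
metric.reset()

outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2]])
targets = torch.tensor([1, 0])

# per-batch logging: returns a key-value dict for the loggers
metric.update_key_value(outputs, targets)
# e.g. {'my_accuracy01/batch': 1.0, ...}

# loader-level aggregation of the accumulated state
metric.compute_key_value()
# e.g. {'my_accuracy01/batch': 1.0, 'my_accuracy01/batch/std': 0.0, ...}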

ICallbackLoaderMetric

class catalyst.metrics._metric.ICallbackLoaderMetric(compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.IMetric

Interface for all loader-based Metrics.

Parameters
  • compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

  • prefix – metrics prefix

  • suffix – metrics suffix

__init__(compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Init.

abstract compute_key_value() → Dict[str, float][source]

Computes the metric based on its accumulated state.

By default, this is called at the end of each loader (on_loader_end event).

Returns

computed value in key-value format

Return type

Dict

abstract reset(num_batches: int, num_samples: int) → None[source]

Resets the metric to its initial state.

By default, this is called at the start of each loader (on_loader_start event).

Parameters
  • num_batches – number of expected batches.

  • num_samples – number of expected samples.

abstract update(*args, **kwargs) → None[source]

Updates the metrics state using the passed data.

By default, this is called at the end of each batch (on_batch_end event).

Parameters
  • *args – positional arguments to update the metric with

  • **kwargs – keyword arguments to update the metric with
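
A minimal sketch of a custom loader-based metric is shown below; the class is hypothetical and not part of Catalyst. The important difference from batch-based metrics is that reset receives the expected loader sizes (so storage can be pre-allocated), while compute and compute_key_value are only meaningful at the end of the loader.

import torch
from catalyst import metrics

class LoaderMeanMetric(metrics.ICallbackLoaderMetric):
    """Hypothetical metric: mean of a tensor accumulated over the whole loader."""

    def __init__(self, compute_on_call: bool = True, prefix: str = None, suffix: str = None):
        super().__init__(compute_on_call=compute_on_call, prefix=prefix, suffix=suffix)
        self.metric_name = f"{prefix or ''}loader_mean{suffix or ''}"
        self.values = []

    def reset(self, num_batches: int, num_samples: int) -> None:
        # called on on_loader_start with the expected loader sizes
        self.values = []

    def update(self, values: torch.Tensor) -> None:
        # called on on_batch_end
        self.values.append(values.detach().cpu())

    def compute(self) -> torch.Tensor:
        return torch.cat(self.values).mean()

    def compute_key_value(self) -> dict:
        # called on on_loader_end
        return {self.metric_name: self.compute().item()}

metric = LoaderMeanMetric()
metric.reset(num_batches=2, num_samples=6)
metric.update(torch.tensor([1.0, 2.0, 3.0]))
metric.update(torch.tensor([4.0, 5.0, 6.0]))
metric.compute_key_value()  # {'loader_mean': 3.5}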

AccumulationMetric

class catalyst.metrics._metric.AccumulationMetric(accumulative_fields: Optional[Iterable[str]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackLoaderMetric

This metric accumulates all the input data along the loader.

Parameters
  • accumulative_fields – list of keys to accumulate data from batch

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix

__init__(accumulative_fields: Optional[Iterable[str]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None) → None[source]

Init AccumulationMetric

compute() → Dict[str, torch.Tensor][source]

Return accumulated data

Returns

dict of accumulated data

compute_key_value() → Dict[str, torch.Tensor][source]

Return accumulated data

Returns

dict of accumulated data

reset(num_batches: int, num_samples: int) → None[source]

Reset metrics fields

Parameters
  • num_batches – expected number of batches

  • num_samples – expected number of samples to accumulate

update(**kwargs) → None[source]

Update accumulated data with new batch

Parameters

**kwargs – tensors that should be accumulated
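
A brief usage sketch is shown below, assuming AccumulationMetric is importable from catalyst.metrics like the other metrics on this page; the tensor shapes in the comment are illustrative.

import torch
from catalyst import metrics

metric = metrics.AccumulationMetric(accumulative_fields=["logits", "targets"])
# reserve storage for the expected loader size
metric.reset(num_batches=2, num_samples=8)

# accumulate two batches of data, passed by key
metric.update(logits=torch.rand(4, 3), targets=torch.arange(4))
metric.update(logits=torch.rand(4, 3), targets=torch.arange(4))

metric.compute()
# {'logits': <tensor of shape [8, 3]>, 'targets': <tensor of shape [8]>}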

General Metrics

AdditiveValueMetric

class catalyst.metrics._additive.AdditiveValueMetric(compute_on_call: bool = True, mode: str = 'numpy')[source]

Bases: catalyst.metrics._metric.IMetric

This metric computes mean and std values of input data.

Parameters
  • compute_on_call – if True, computes and returns metric value during metric call

  • mode – expected dtype returned by the metric, "numpy" or "torch"

Raises

ValueError – if mode is not supported

Examples:

import numpy as np
from catalyst import metrics

values = [1, 2, 3, 4, 5]
num_samples_list = [1, 2, 3, 4, 5]
true_values = [1, 1.666667, 2.333333, 3, 3.666667]

metric = metrics.AdditiveValueMetric()
for value, num_samples, true_value in zip(values, num_samples_list, true_values):
    metric.update(value=value, num_samples=num_samples)
    mean, _ = metric.compute()
    assert np.isclose(mean, true_value)
import os
from torch import nn, optim
from torch.nn import functional as F
from torch.utils.data import DataLoader
from catalyst import dl, metrics
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32
    ),
}

class CustomRunner(dl.Runner):
    def predict_batch(self, batch):
        # model inference step
        return self.model(batch[0].to(self.device))

    def on_loader_start(self, runner):
        super().on_loader_start(runner)
        self.meters = {
            key: metrics.AdditiveValueMetric(compute_on_call=False)
            for key in ["loss", "accuracy01", "accuracy03"]
        }

    def handle_batch(self, batch):
        # model train/valid step
        # unpack the batch
        x, y = batch
        # run model forward pass
        logits = self.model(x)
        # compute the loss
        loss = F.cross_entropy(logits, y)
        # compute other metrics of interest
        accuracy01, accuracy03 = metrics.accuracy(logits, y, topk=(1, 3))
        # log metrics
        self.batch_metrics.update(
            {"loss": loss, "accuracy01": accuracy01, "accuracy03": accuracy03}
        )
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.meters[key].update(self.batch_metrics[key].item(), self.batch_size)
        # run model backward pass
        if self.is_train_loader:
            loss.backward()
            self.optimizer.step()
            self.optimizer.zero_grad()

    def on_loader_end(self, runner):
        for key in ["loss", "accuracy01", "accuracy03"]:
            self.loader_metrics[key] = self.meters[key].compute()[0]
        super().on_loader_end(runner)

runner = CustomRunner()
# model training
runner.train(
    model=model,
    optimizer=optimizer,
    loaders=loaders,
    logdir="./logs",
    num_epochs=5,
    verbose=True,
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
)

Note

Please follow the minimal examples sections for more use cases.

__init__(compute_on_call: bool = True, mode: str = 'numpy')[source]

Init AdditiveValueMetric

compute() → Tuple[float, float][source]

Returns mean and std values of all the input data

Returns

tuple of mean and std values

reset() → None[source]

Reset all fields

update(value: float, num_samples: int) → float[source]

Update mean metric value and std with new value.

Parameters
  • value – value to update mean and std with

  • num_samples – number of value samples that metrics should be updated with

Returns

last value

ConfusionMatrixMetric

class catalyst.metrics._confusion_matrix.ConfusionMatrixMetric(num_classes: int, normalized: bool = False, compute_on_call: bool = True)[source]

Bases: catalyst.metrics._metric.IMetric

Constructs a confusion matrix for multiclass classification problems.

Parameters
  • num_classes – number of classes in the classification problem

  • normalized – determines whether the confusion matrix is normalized

  • compute_on_call – Boolean flag to compute and return the confusion matrix during __call__. default: True

Examples:

import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples,) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy03",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AccuracyCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
        dl.ConfusionMatrixCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
    ],
)
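
In addition to the callback-based training example above, the metric can be used directly; a small sketch follows (the confusion matrix values in the comment are illustrative for this input).

import torch
from catalyst import metrics

metric = metrics.ConfusionMatrixMetric(num_classes=3)
metric.reset()

predictions = torch.tensor([
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
targets = torch.tensor([0, 1, 2])

metric.update(predictions, targets)
metric.compute()
# 3x3 confusion matrix, e.g.
# [[1, 0, 0],
#  [0, 1, 0],
#  [0, 0, 1]]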

Note

Please follow the minimal examples sections for more use cases.

FunctionalBatchMetric

class catalyst.metrics._functional_metric.FunctionalBatchMetric(metric_fn: Callable, metric_key: str, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

Class for custom batch-based metrics in a functional way.

Parameters
  • metric_fn – metric function that gets outputs and targets and returns the score as a torch.Tensor

  • metric_key – metric name

  • compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

  • prefix – metric prefix

  • suffix – metric suffix

Note

Loader metrics are calculated as the average over all batch metrics.

Examples:

import torch
from catalyst import metrics
import sklearn.metrics

outputs = torch.tensor([1, 0, 2, 1])
targets = torch.tensor([3, 0, 2, 2])

metric = metrics.FunctionalBatchMetric(
    metric_fn=sklearn.metrics.accuracy_score,
    metric_key="sk_accuracy",
)
metric.reset()

metric.update(batch_size=len(outputs), y_pred=outputs, y_true=targets)
metric.compute()
# (0.5, 0.0)  # mean, std

metric.compute_key_value()
# {'sk_accuracy': 0.5, 'sk_accuracy/mean': 0.5, 'sk_accuracy/std': 0.0}

FunctionalLoaderMetric

class catalyst.metrics._functional_metric.FunctionalLoaderMetric(metric_fn: Callable, metric_key: str, accumulative_fields: Optional[Iterable[str]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackLoaderMetric

Class for custom loader-based metrics in a functional way.

Parameters
  • metric_fn – metric function that gets outputs and targets and returns the score as a torch.Tensor

  • metric_key – metric name

  • accumulative_fields – list of keys to accumulate data from batch

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix

Note

Metrics are calculated over all samples.

Examples:

from functools import partial
import torch
from catalyst import metrics
import sklearn.metrics

targets = torch.tensor([3, 0, 2, 2, 1])
outputs = torch.rand((len(targets), targets.max()+1)).softmax(1)

metric = metrics.FunctionalLoaderMetric(
    metric_fn=partial(
        sklearn.metrics.roc_auc_score, average="macro", multi_class="ovr"
    ),
    metric_key="sk_auc",
    accumulative_fields=['y_score','y_true'],

)
metric.reset(len(outputs), len(outputs))

metric.update(y_score=outputs, y_true=targets)
metric.compute()
# ...

metric.compute_key_value()
# {'sk_auc': ...}

Runner Metrics

Accuracy - AccuracyMetric

class catalyst.metrics._accuracy.AccuracyMetric(topk_args: Optional[List[int]] = None, num_classes: Optional[int] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

This metric computes accuracy for the multiclass classification case. It computes the mean value of accuracy and its approximate std value (note that this is not the real accuracy std, but the std of accuracy over batch mean values).

Parameters
  • topk_args – list of topk for accuracy@topk computing

  • num_classes – number of classes

  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.tensor([
    [0.2, 0.5, 0.0, 0.3],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.6, 0.3],
    [0.0, 0.8, 0.2, 0.0],
])
targets = torch.tensor([3, 0, 2, 2])
metric = metrics.AccuracyMetric(topk_args=(1, 3))

metric.reset()
metric.update(outputs, targets)
metric.compute()
# (
#     (0.5, 1.0),  # top1, top3 mean
#     (0.0, 0.0),  # top1, top3 std
# )

metric.compute_key_value()
# {
#     'accuracy': 0.5,
#     'accuracy/std': 0.0,
#     'accuracy01': 0.5,
#     'accuracy01/std': 0.0,
#     'accuracy03': 1.0,
#     'accuracy03/std': 0.0,
# }

metric.reset()
metric(outputs, targets)
# (
#     (0.5, 1.0),  # top1, top3 mean
#     (0.0, 0.0),  # top1, top3 std
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples,) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy03",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AccuracyCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
    ],
)

Note

Please follow the minimal examples sections for more use cases.

Accuracy - MultilabelAccuracyMetric

class catalyst.metrics._accuracy.MultilabelAccuracyMetric(threshold: Union[float, torch.Tensor] = 0.5, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._additive.AdditiveValueMetric, catalyst.metrics._metric.ICallbackBatchMetric

This metric computes accuracy for the multilabel classification case. It computes the mean value of accuracy and its approximate std value (note that this is not the real accuracy std, but the std of accuracy over batch mean values).

Parameters
  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

  • threshold – thresholds for model scores

Examples:

import torch
from catalyst import metrics

outputs = torch.tensor([
    [0.1, 0.9, 0.0, 0.8],
    [0.96, 0.01, 0.85, 0.2],
    [0.98, 0.4, 0.2, 0.1],
    [0.1, 0.89, 0.2, 0.0],
])
targets = torch.tensor([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
])
metric = metrics.MultilabelAccuracyMetric(threshold=0.6)

metric.reset()
metric.update(outputs, targets)
metric.compute()
# (0.75, 0.0)  # mean, std

metric.compute_key_value()
# {
#     'accuracy': 0.75,
#     'accuracy/std': 0.0,
# }

metric.reset()
metric(outputs, targets)
# (0.75, 0.0)  # mean, std
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, num_classes) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AUCCallback(input_key="logits", target_key="targets"),
        dl.MultilabelAccuracyCallback(
            input_key="logits", target_key="targets", threshold=0.5
        )
    ]
)

Note

Please follow the minimal examples sections for more use cases.

AUCMetric

class catalyst.metrics._auc.AUCMetric(compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackLoaderMetric

AUC metric.

Parameters
  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

Warning

This metric is under API improvement.

Examples:

import torch
from catalyst import metrics

scores = torch.tensor([
    [0.9, 0.1],
    [0.1, 0.9],
])
targets = torch.tensor([
    [1, 0],
    [0, 1],
])
metric = metrics.AUCMetric()

# for efficient statistics storage
metric.reset(num_batches=1, num_samples=len(scores))
metric.update(scores, targets)
metric.compute()
# (
#     tensor([1., 1.])  # per class
#     1.0,              # micro
#     1.0,              # macro
#     1.0               # weighted
# )

metric.compute_key_value()
# {
#     'auc': 1.0,
#     'auc/_micro': 1.0,
#     'auc/_macro': 1.0,
#     'auc/_weighted': 1.0
#     'auc/class_00': 1.0,
#     'auc/class_01': 1.0,
# }

metric.reset(num_batches=1, num_samples=len(scores))
metric(scores, targets)
# (
#     tensor([1., 1.])  # per class
#     1.0,              # micro
#     1.0,              # macro
#     1.0               # weighted
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples,) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy03",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AccuracyCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
    ],
)

Note

Please follow the minimal examples sections for more use cases.

Classification – BinaryPrecisionRecallF1Metric

class catalyst.metrics._classification.BinaryPrecisionRecallF1Metric(zero_division: int = 0, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._classification.StatisticsMetric

Precision, recall, f1_score and support metrics for binary classification.

Parameters
  • zero_division – value to set in case of zero division during metrics (precision, recall) computation; should be one of 0 or 1

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix
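
No standalone example is listed for this metric, so a small hedged usage sketch follows, assuming BinaryPrecisionRecallF1Metric is exported from catalyst.metrics like the other classification metrics on this page; the output key names and values in the comment are illustrative rather than exact.

import torch
from catalyst import metrics

outputs = torch.tensor([1, 0, 1, 1])
targets = torch.tensor([1, 0, 0, 1])

metric = metrics.BinaryPrecisionRecallF1Metric(zero_division=0)
metric.reset()

metric.update(outputs=outputs, targets=targets)
metric.compute_key_value()
# e.g. {'precision': 0.666..., 'recall': 1.0, 'f1': 0.8, ...}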

Classification – MulticlassPrecisionRecallF1SupportMetric

class catalyst.metrics._classification.MulticlassPrecisionRecallF1SupportMetric(num_classes: Optional[int] = None, zero_division: int = 0, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._classification.PrecisionRecallF1SupportMetric

Precision, recall, f1_score and support metrics for multiclass classification. Computes metrics with macro, micro, and weighted averaging.

Parameters
  • num_classes – number of classes in loader’s dataset

  • zero_division – value to set in case of zero division during metrics (precision, recall) computation; should be one of 0 or 1

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

num_classes = 4
zero_division = 0
outputs_list = [torch.tensor([0, 1, 2]), torch.tensor([2, 3]), torch.tensor([0, 1, 3])]
targets_list = [torch.tensor([0, 1, 1]), torch.tensor([2, 3]), torch.tensor([0, 1, 2])]

metric = metrics.MulticlassPrecisionRecallF1SupportMetric(
    num_classes=num_classes, zero_division=zero_division
)
metric.reset()

for outputs, targets in zip(outputs_list, targets_list):
    metric.update(outputs=outputs, targets=targets)

metric.compute()
# (
#     # per class precision, recall, f1, support
#     (
#         array([1. , 1. , 0.5, 0.5]),
#         array([1.        , 0.66666667, 0.5       , 1.        ]),
#         array([0.999995  , 0.7999952 , 0.499995  , 0.66666222]),
#         array([2., 3., 2., 1.]),
#     ),
#     # micro precision, recall, f1, support
#     (0.75, 0.75, 0.7499950000333331, None),
#     # macro precision, recall, f1, support
#     (0.75, 0.7916666666666667, 0.7416618555889127, None),
#     # weighted precision, recall, f1, support
#     (0.8125, 0.75, 0.7583284778110313, None)
# )

metric.compute_key_value()
# {
#     'f1/_macro': 0.7416618555889127,
#     'f1/_micro': 0.7499950000333331,
#     'f1/_weighted': 0.7583284778110313,
#     'f1/class_00': 0.9999950000249999,
#     'f1/class_01': 0.7999952000287999,
#     'f1/class_02': 0.49999500004999947,
#     'f1/class_03': 0.6666622222518517,
#     'precision/_macro': 0.75,
#     'precision/_micro': 0.75,
#     'precision/_weighted': 0.8125,
#     'precision/class_00': 1.0,
#     'precision/class_01': 1.0,
#     'precision/class_02': 0.5,
#     'precision/class_03': 0.5,
#     'recall/_macro': 0.7916666666666667,
#     'recall/_micro': 0.75,
#     'recall/_weighted': 0.75,
#     'recall/class_00': 1.0,
#     'recall/class_01': 0.6666666666666667,
#     'recall/class_02': 0.5,
#     'recall/class_03': 1.0,
#     'support/class_00': 2.0,
#     'support/class_01': 3.0,
#     'support/class_02': 2.0,
#     'support/class_03': 1.0
# }

metric.reset()
metric(outputs_list[0], targets_list[0])
# (
#     # per class precision, recall, f1, support
#     (
#         array([1., 1., 0., 0.]),
#         array([1. , 0.5, 0. , 0. ]),
#         array([0.999995  , 0.66666222, 0.        , 0.        ]),
#         array([1., 2., 0., 0.]),
#     ),
#     # micro precision, recall, f1, support
#     (0.6666666666666667, 0.6666666666666667, 0.6666616667041664, None),
#     # macro precision, recall, f1, support
#     (0.5, 0.375, 0.41666430556921286, None),
#     # weighted precision, recall, f1, support
#     (1.0, 0.6666666666666666, 0.7777731481762343, None)
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples,) * num_classes).to(torch.int64)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy03",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.AccuracyCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.PrecisionRecallF1SupportCallback(
            input_key="logits", target_key="targets", num_classes=num_classes
        ),
        dl.AUCCallback(input_key="logits", target_key="targets"),
    ],
)

Note

Please follow the minimal examples sections for more use cases.

Classification – MultilabelPrecisionRecallF1SupportMetric

class catalyst.metrics._classification.MultilabelPrecisionRecallF1SupportMetric(num_classes: Optional[int] = None, zero_division: int = 0, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._classification.PrecisionRecallF1SupportMetric

Precision, recall, f1_score and support metrics for multilabel classification. Computes metrics with macro, micro, and weighted averaging.

Parameters
  • num_classes – number of classes in loader’s dataset

  • zero_division – value to set in case of zero division during metrics (precision, recall) computation; should be one of 0 or 1

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

num_classes = 4
zero_division = 0
outputs_list = [
    torch.tensor([[0, 1, 0, 1], [0, 0, 0, 0], [0, 1, 1, 0]]),
    torch.tensor([[0, 1, 1, 1], [0, 0, 0, 1], [0, 1, 0, 1]]),
    torch.tensor([[0, 1, 0, 0], [0, 1, 0, 1]]),
]
targets_list = [
    torch.tensor([[0, 1, 1, 1], [0, 0, 0, 0], [0, 1, 0, 1]]),
    torch.tensor([[0, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]),
    torch.tensor([[0, 1, 0, 0], [0, 0, 1, 0]]),
]

metric = metrics.MultilabelPrecisionRecallF1SupportMetric(
    num_classes=num_classes, zero_division=zero_division
)
metric.reset()

for outputs, targets in zip(outputs_list, targets_list):
    metric.update(outputs=outputs, targets=targets)

metric.compute()
# (
#     # per class precision, recall, f1, support
#     (
#         array([0.        , 0.66666667, 0.        , 0.4       ]),
#         array([0.        , 1.        , 0.        , 0.66666667]),
#         array([0.        , 0.7999952 , 0.        , 0.49999531]),
#         array([1., 4., 4., 3.])
#     ),
#     # micro precision, recall, f1, support
#     (0.46153846153846156, 0.5, 0.4799950080519163, None),
#     # macro precision, recall, f1, support
#     (0.2666666666666667, 0.4166666666666667, 0.32499762814318617, None),
#     # weighted precision, recall, f1, support
#     (0.32222222222222224, 0.5, 0.39166389481225283, None)
# )

metric.compute_key_value()
# {
#     'f1/_macro': 0.32499762814318617,
#     'f1/_micro': 0.4799950080519163,
#     'f1/_weighted': 0.39166389481225283,
#     'f1/class_00': 0.0,
#     'f1/class_01': 0.7999952000287999,
#     'f1/class_02': 0.0,
#     'f1/class_03': 0.49999531254394486,
#     'precision/_macro': 0.2666666666666667,
#     'precision/_micro': 0.46153846153846156,
#     'precision/_weighted': 0.32222222222222224,
#     'precision/class_00': 0.0,
#     'precision/class_01': 0.6666666666666667,
#     'precision/class_02': 0.0,
#     'precision/class_03': 0.4,
#     'recall/_macro': 0.4166666666666667,
#     'recall/_micro': 0.5,
#     'recall/_weighted': 0.5,
#     'recall/class_00': 0.0,
#     'recall/class_01': 1.0,
#     'recall/class_02': 0.0,
#     'recall/class_03': 0.6666666666666667,
#     'support/class_00': 1.0,
#     'support/class_01': 4.0,
#     'support/class_02': 4.0,
#     'support/class_03': 3.0
# }


metric.reset()
metric(outputs_list[0], targets_list[0])
# (
#     # per class precision, recall, f1, support
#     (
#         array([0., 1., 0., 1.]),
#         array([0. , 1. , 0. , 0.5]),
#         array([0.        , 0.999995  , 0.        , 0.66666222]),
#         array([0., 2., 1., 2.])
#     ),
#     # micro precision, recall, f1, support
#     (0.75, 0.6, 0.6666617284316411, None),
#     # macro precision, recall, f1, support
#     (0.5, 0.375, 0.41666430556921286, None),
#     # weighted precision, recall, f1, support
#     (0.8, 0.6000000000000001, 0.6666628889107407, None)
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_samples, num_features, num_classes = int(1e4), int(1e1), 4
X = torch.rand(num_samples, num_features)
y = (torch.rand(num_samples, num_classes) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_classes)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    logdir="./logdir",
    num_epochs=3,
    valid_loader="valid",
    valid_metric="accuracy",
    minimize_valid_metric=False,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.AUCCallback(input_key="scores", target_key="targets"),
        dl.MultilabelAccuracyCallback(
            input_key="scores", target_key="targets", threshold=0.5
        ),
        dl.MultilabelPrecisionRecallF1SupportCallback(
            input_key="scores", target_key="targets", threshold=0.5
        ),
    ]
)

Note

Please follow the minimal examples sections for more use cases.

CMCMetric

class catalyst.metrics._cmc_score.CMCMetric(embeddings_key: str, labels_key: str, is_query_key: str, topk_args: Optional[Iterable[int]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.AccumulationMetric

Cumulative Matching Characteristics

Parameters
  • embeddings_key – key of embedding tensor in batch

  • labels_key – key of label tensor in batch

  • is_query_key – key of query flag tensor in batch

  • topk_args – list of k, specifies which cmc@k should be calculated

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

batch = {
    "embeddings": torch.tensor(
        [
            [1, 1, 0, 0],
            [1, 0, 1, 1],
            [0, 1, 1, 1],
            [0, 0, 1, 1],
            [1, 1, 1, 0],
            [1, 1, 1, 1],
            [0, 1, 1, 0],
        ]
    ).float(),
    "labels": torch.tensor([0, 0, 1, 1, 0, 1, 1]),
    "is_query": torch.tensor([1, 1, 1, 1, 0, 0, 0]).bool(),
}
topk = (1, 3)

metric = metrics.CMCMetric(
    embeddings_key="embeddings",
    labels_key="labels",
    is_query_key="is_query",
    topk_args=topk,
)
metric.reset(num_batches=1, num_samples=len(batch["embeddings"]))

metric.update(**batch)
metric.compute()
# [0.75, 1.0]  # CMC@01, CMC@03

metric.compute_key_value()
# {'cmc01': 0.75, 'cmc03': 1.0}
import os
from torch.optim import Adam
from torch.utils.data import DataLoader
from catalyst import data, dl
from catalyst.contrib import datasets, models, nn
from catalyst.data.transforms import Compose, Normalize, ToTensor


# 1. train and valid loaders
transforms = Compose([ToTensor(), Normalize((0.1307,), (0.3081,))])

train_dataset = datasets.MnistMLDataset(
    root=os.getcwd(), download=True, transform=transforms
)
sampler = data.BalanceBatchSampler(labels=train_dataset.get_labels(), p=5, k=10)
train_loader = DataLoader(
    dataset=train_dataset, sampler=sampler, batch_size=sampler.batch_size
)

valid_dataset = datasets.MnistQGDataset(
    root=os.getcwd(), transform=transforms, gallery_fraq=0.2
)
valid_loader = DataLoader(dataset=valid_dataset, batch_size=1024)

# 2. model and optimizer
model = models.MnistSimpleNet(out_features=16)
optimizer = Adam(model.parameters(), lr=0.001)

# 3. criterion with triplets sampling
sampler_inbatch = data.HardTripletsSampler(norm_required=False)
criterion = nn.TripletMarginLossWithSampler(margin=0.5, sampler_inbatch=sampler_inbatch)

# 4. training with catalyst Runner
class CustomRunner(dl.SupervisedRunner):
    def handle_batch(self, batch) -> None:
        if self.is_train_loader:
            images, targets = batch["features"].float(), batch["targets"].long()
            features = self.model(images)
            self.batch = {"embeddings": features, "targets": targets,}
        else:
            images, targets, is_query = (
                batch["features"].float(),
                batch["targets"].long(),
                batch["is_query"].bool()
            )
            features = self.model(images)
            self.batch = {
                "embeddings": features, "targets": targets, "is_query": is_query
            }

callbacks = [
    dl.ControlFlowCallback(
        dl.CriterionCallback(
            input_key="embeddings", target_key="targets", metric_key="loss"
        ),
        loaders="train",
    ),
    dl.ControlFlowCallback(
        dl.CMCScoreCallback(
            embeddings_key="embeddings",
            labels_key="targets",
            is_query_key="is_query",
            topk_args=[1],
        ),
        loaders="valid",
    ),
    dl.PeriodicLoaderCallback(
        valid_loader_key="valid", valid_metric_key="cmc01", minimize=False, valid=2
    ),
]

runner = CustomRunner(input_key="features", output_key="embeddings")
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    callbacks=callbacks,
    loaders={"train": train_loader, "valid": valid_loader},
    verbose=False,
    logdir="./logs",
    valid_loader="valid",
    valid_metric="cmc01",
    minimize_valid_metric=False,
    num_epochs=10,
)

Note

Please follow the minimal examples sections for more use cases.

ReidCMCMetric

class catalyst.metrics._cmc_score.ReidCMCMetric(embeddings_key: str, pids_key: str, cids_key: str, is_query_key: str, topk_args: Optional[Iterable[int]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.AccumulationMetric

Cumulative Matching Characteristics for the re-identification (ReID) case.

Parameters
  • embeddings_key – key of embedding tensor in batch

  • pids_key – key of pids tensor in batch

  • cids_key – key of cids tensor in batch

  • is_query_key – key of query flag tensor in batch

  • topk_args – list of k, specifies which cmc@k should be calculated

  • compute_on_call – if True, allows compute metric’s value on call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst.metrics import ReidCMCMetric

batch = {
    "embeddings": torch.tensor(
        [
            [1, 1, 0, 0],
            [1, 0, 0, 0],
            [0, 1, 1, 1],
            [0, 0, 1, 1],
            [1, 1, 1, 0],
            [1, 1, 1, 1],
            [0, 1, 1, 0],
        ]
    ).float(),
    "pids": torch.Tensor([0, 0, 1, 1, 0, 1, 1]).long(),
    "cids": torch.Tensor([0, 1, 1, 2, 0, 1, 3]).long(),
    "is_query": torch.Tensor([1, 1, 1, 1, 0, 0, 0]).bool(),
}
topk = (1, 3)

metric = ReidCMCMetric(
    embeddings_key="embeddings",
    pids_key="pids",
    cids_key="cids",
    is_query_key="is_query",
    topk_args=topk,
)
metric.reset(num_batches=1, num_samples=len(batch["embeddings"]))

metric.update(**batch)
metric.compute()
# [0.75, 1.0]  # CMC@01, CMC@03

metric.compute_key_value()
# {'cmc01': 0.75, 'cmc03': 1.0}

RecSys – HitrateMetric

class catalyst.metrics._hitrate.HitrateMetric(topk_args: Optional[List[int]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

Calculates the hitrate.

Parameters
  • topk_args – list of topk for hitrate@topk computing

  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

Computes the mean value of hitrate and its approximate std value.

Examples:

import torch
from catalyst import metrics

outputs = torch.Tensor([[4.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0]])
targets = torch.Tensor([[0, 0, 1.0, 1.0], [0, 0, 0.0, 0.0]])
metric = metrics.HitrateMetric(topk_args=[1, 2, 3, 4])
metric.reset()

metric.update(outputs, targets)
metric.compute()
# (
#     (0.0, 0.25, 0.25, 0.5),  # mean for @01, @02, @03, @04
#     (0.0, 0.0, 0.0, 0.0)     # std for @01, @02, @03, @04
# )

metric.compute_key_value()
# {
#     'hitrate': 0.0,
#     'hitrate/std': 0.0,
#     'hitrate01': 0.0,
#     'hitrate01/std': 0.0,
#     'hitrate02': 0.25,
#     'hitrate02/std': 0.0,
#     'hitrate03': 0.25,
#     'hitrate03/std': 0.0,
#     'hitrate04': 0.5,
#     'hitrate04/std': 0.0
# }

metric.reset()
metric(outputs, targets)
# (
#     (0.0, 0.25, 0.25, 0.5),  # mean for @01, @02, @03, @04
#     (0.0, 0.0, 0.0, 0.0)     # std for @01, @02, @03, @04
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_users, num_features, num_items = int(1e4), int(1e1), 10
X = torch.rand(num_users, num_features)
y = (torch.rand(num_users, num_items) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_items)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=3,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.CriterionCallback(
            input_key="logits", target_key="targets", metric_key="loss"
        ),
        dl.AUCCallback(input_key="scores", target_key="targets"),
        dl.HitrateCallback(
            input_key="scores", target_key="targets", topk_args=(1, 3, 5)
        ),
        dl.MRRCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.MAPCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.NDCGCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        dl.CheckpointCallback(
            logdir="./logs", loader_key="valid", metric_key="loss", minimize=True
        ),
    ]
)

Note

Please follow the minimal examples sections for more use cases.

RecSys – MAPMetric

class catalyst.metrics._map.MAPMetric(topk_args: Optional[List[int]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

Calculates the Mean Average Precision (MAP) for RecSys. The precision metric summarizes the fraction of relevant items in the whole recommendation list. Computes the mean value of MAP and its approximate std value.

Parameters
  • topk_args – list of topk for map@topk computing

  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.tensor([
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
])
targets = torch.tensor([
    [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
])
metric = metrics.MAPMetric(topk_args=[1, 3, 5, 10])
metric.reset()

metric.update(outputs, targets)
metric.compute()
# (
#     # mean for @01, @03, @05, @10
#     (0.5, 0.6666666865348816, 0.6416666507720947, 0.5325397253036499),
#     # std for @01, @03, @05, @10
#     (0.0, 0.0, 0.0, 0.0)
# )

metric.compute_key_value()
# {
#     'map': 0.5,
#     'map/std': 0.0,
#     'map01': 0.5,
#     'map01/std': 0.0,
#     'map03': 0.6666666865348816,
#     'map03/std': 0.0,
#     'map05': 0.6416666507720947,
#     'map05/std': 0.0,
#     'map10': 0.5325397253036499,
#     'map10/std': 0.0
# }

metric.reset()
metric(outputs, targets)
# (
#     # mean for @01, @03, @05, @10
#     (0.5, 0.6666666865348816, 0.6416666507720947, 0.5325397253036499),
#     # std for @01, @03, @05, @10
#     (0.0, 0.0, 0.0, 0.0)
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_users, num_features, num_items = int(1e4), int(1e1), 10
X = torch.rand(num_users, num_features)
y = (torch.rand(num_users, num_items) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_items)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=3,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.CriterionCallback(
            input_key="logits", target_key="targets", metric_key="loss"
        ),
        dl.AUCCallback(input_key="scores", target_key="targets"),
        dl.HitrateCallback(
            input_key="scores", target_key="targets", topk_args=(1, 3, 5)
        ),
        dl.MRRCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.MAPCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.NDCGCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        dl.CheckpointCallback(
            logdir="./logs", loader_key="valid", metric_key="loss", minimize=True
        ),
    ]
)

Note

Please follow the minimal examples sections for more use cases.

RecSys – MRRMetric

class catalyst.metrics._mrr.MRRMetric(topk_args: Optional[List[int]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

Calculates the Mean Reciprocal Rank (MRR) score given model outputs and targets. Computes the mean value of MRR and its approximate std value.

Parameters
  • topk_args – list of topk for mrr@topk computing

  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.Tensor([
    [4.0, 2.0, 3.0, 1.0],
    [1.0, 2.0, 3.0, 4.0],
])
targets = torch.tensor([
    [0, 0, 1.0, 1.0],
    [0, 0, 1.0, 1.0],
])
metric = metrics.MRRMetric(topk_args=[1, 3])
metric.reset()

metric.update(outputs, targets)
metric.compute()
# ((0.5, 0.75), (0.0, 0.0))  # mean, std for @01, @03

metric.compute_key_value()
# {
#     'mrr01': 0.5,
#     'mrr03': 0.75,
#     'mrr': 0.5,
#     'mrr01/std': 0.0,
#     'mrr03/std': 0.0,
#     'mrr/std': 0.0
# }

metric.reset()
metric(outputs, targets)
# ((0.5, 0.75), (0.0, 0.0))  # mean, std for @01, @03
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_users, num_features, num_items = int(1e4), int(1e1), 10
X = torch.rand(num_users, num_features)
y = (torch.rand(num_users, num_items) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_items)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=3,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.CriterionCallback(
            input_key="logits", target_key="targets", metric_key="loss"
        ),
        dl.AUCCallback(input_key="scores", target_key="targets"),
        dl.HitrateCallback(
            input_key="scores", target_key="targets", topk_args=(1, 3, 5)
        ),
        dl.MRRCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.MAPCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.NDCGCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        dl.CheckpointCallback(
            logdir="./logs", loader_key="valid", metric_key="loss", minimize=True
        ),
    ]
)

Note

Please follow the minimal examples sections for more use cases.

RecSys – NDCGMetric

class catalyst.metrics._ndcg.NDCGMetric(topk_args: Optional[List[int]] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

Calculates the Normalized Discounted Cumulative Gain (NDCG) score given model outputs and targets. Computes the mean value of NDCG and its approximate std value.

Parameters
  • topk_args – list of topk for ndcg@topk computing

  • compute_on_call – if True, computes and returns metric value during metric call

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.Tensor([
    [0.5, 0.2, 0.1],
    [0.5, 0.2, 0.1],
])
targets = torch.tensor([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
])
metric = metrics.NDCGMetric(topk_args=[1, 2])
metric.reset()

metric.update(outputs, targets)
metric.compute()
# (
#     (1.0, 0.6131471991539001),  # mean for @01, @02
#     (0.0, 0.0)                  # std for @01, @02
# )

metric.compute_key_value()
# {
#     'ndcg01': 1.0,
#     'ndcg02': 0.6131471991539001,
#     'ndcg': 1.0,
#     'ndcg01/std': 0.0,
#     'ndcg02/std': 0.0,
#     'ndcg/std': 0.0
# }

metric.reset()
metric(outputs, targets)
# (
#     (1.0, 0.6131471991539001),  # mean for @01, @02
#     (0.0, 0.0)                  # std for @01, @02
# )
import torch
from torch.utils.data import DataLoader, TensorDataset
from catalyst import dl

# sample data
num_users, num_features, num_items = int(1e4), int(1e1), 10
X = torch.rand(num_users, num_features)
y = (torch.rand(num_users, num_items) > 0.5).to(torch.float32)

# pytorch loaders
dataset = TensorDataset(X, y)
loader = DataLoader(dataset, batch_size=32, num_workers=1)
loaders = {"train": loader, "valid": loader}

# model, criterion, optimizer, scheduler
model = torch.nn.Linear(num_features, num_items)
criterion = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters())
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, [2])

# model training
runner = dl.SupervisedRunner(
    input_key="features", output_key="logits", target_key="targets", loss_key="loss"
)
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    num_epochs=3,
    verbose=True,
    callbacks=[
        dl.BatchTransformCallback(
            transform=torch.sigmoid,
            scope="on_batch_end",
            input_key="logits",
            output_key="scores"
        ),
        dl.CriterionCallback(
            input_key="logits", target_key="targets", metric_key="loss"
        ),
        dl.AUCCallback(input_key="scores", target_key="targets"),
        dl.HitrateCallback(
            input_key="scores", target_key="targets", topk_args=(1, 3, 5)
        ),
        dl.MRRCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.MAPCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.NDCGCallback(input_key="scores", target_key="targets", topk_args=(1, 3, 5)),
        dl.OptimizerCallback(metric_key="loss"),
        dl.SchedulerCallback(),
        dl.CheckpointCallback(
            logdir="./logs", loader_key="valid", metric_key="loss", minimize=True
        ),
    ]
)

Note

Please follow the minimal examples sections for more use cases.

Segmentation – RegionBasedMetric

class catalyst.metrics._segmentation.RegionBasedMetric(metric_fn: Callable, metric_name: str, class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = 0.5, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._metric.ICallbackBatchMetric

Base logic class for all region-based metrics, such as IoU, Dice, and Trevsky.

Parameters
  • metric_fn – metric function that gets statistics and returns the score

  • metric_name – name of the metric

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • weights – class weights

  • class_names – class names

  • threshold – threshold for outputs binarization

  • compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

  • prefix – metric prefix

  • suffix – metric suffix

This is an interface class; please check out the implementations below for more details:

Segmentation – DiceMetric

class catalyst.metrics._segmentation.DiceMetric(class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = None, eps: float = 1e-07, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._segmentation.RegionBasedMetric

Dice Metric, dice score = 2 * intersection / (intersection + union) = 2 * tp / (2 * tp + fp + fn)

Parameters
  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • weights – class weights

  • class_names – class names

  • threshold – threshold for outputs binarization

  • eps – epsilon to avoid zero division

  • compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.tensor([[[[0.8, 0.1, 0], [0, 0.4, 0.3], [0, 0, 1]]]])
targets = torch.tensor([[[[1.0, 0, 0], [0, 1, 0], [1, 1, 0]]]])
metric = metrics.DiceMetric()
metric.reset()

metric.compute()
# per_class, micro, macro, weighted
# ([tensor(0.3636)], tensor(0.3636), tensor(0.3636), None)

metric.update_key_value(outputs, targets)
metric.compute_key_value()
# {
#     'dice': tensor(0.3636),
#     'dice/_macro': tensor(0.3636),
#     'dice/_micro': tensor(0.3636),
#     'dice/class_00': tensor(0.3636),
# }
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.nn import IoULoss


model = nn.Sequential(
    nn.Conv2d(1, 1, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(1, 1, 3, 1, 1), nn.Sigmoid(),
)
criterion = IoULoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32
    ),
}

class CustomRunner(dl.SupervisedRunner):
    def handle_batch(self, batch):
        x = batch[self._input_key]
        x_noise = (x + torch.rand_like(x)).clamp_(0, 1)
        x_ = self.model(x_noise)
        self.batch = {self._input_key: x, self._output_key: x_, self._target_key: x}

runner = CustomRunner(
    input_key="features", output_key="scores", target_key="targets", loss_key="loss"
)
# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.IOUCallback(input_key="scores", target_key="targets"),
        dl.DiceCallback(input_key="scores", target_key="targets"),
        dl.TrevskyCallback(input_key="scores", target_key="targets", alpha=0.2),
    ],
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
)

Note

Please follow the minimal examples sections for more use cases.

Segmentation – IOUMetric

class catalyst.metrics._segmentation.IOUMetric(class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = None, eps: float = 1e-07, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._segmentation.RegionBasedMetric

IoU Metric, iou score = intersection / union = tp / (tp + fp + fn).

Parameters
  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • weights – class weights

  • class_names – class names

  • threshold – threshold for outputs binarization

  • eps – epsilon to avoid zero division

  • compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.tensor([[[[0.8, 0.1, 0], [0, 0.4, 0.3], [0, 0, 1]]]])
targets = torch.tensor([[[[1.0, 0, 0], [0, 1, 0], [1, 1, 0]]]])
metric = metrics.IOUMetric()
metric.reset()

metric.update(outputs, targets)
metric.compute()
# per_class, micro, macro, weighted
# ([tensor(0.2222)], tensor(0.2222), tensor(0.2222), None)

metric.update_key_value(outputs, targets)
metric.compute_key_value()
# {
#     'iou': tensor(0.2222),
#     'iou/_macro': tensor(0.2222),
#     'iou/_micro': tensor(0.2222),
#     'iou/class_00': tensor(0.2222),
# }
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.nn import IoULoss


model = nn.Sequential(
    nn.Conv2d(1, 1, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(1, 1, 3, 1, 1), nn.Sigmoid(),
)
criterion = IoULoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32
    ),
}

class CustomRunner(dl.SupervisedRunner):
    def handle_batch(self, batch):
        x = batch[self._input_key]
        x_noise = (x + torch.rand_like(x)).clamp_(0, 1)
        x_ = self.model(x_noise)
        self.batch = {self._input_key: x, self._output_key: x_, self._target_key: x}

runner = CustomRunner(
    input_key="features", output_key="scores", target_key="targets", loss_key="loss"
)
# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.IOUCallback(input_key="scores", target_key="targets"),
        dl.DiceCallback(input_key="scores", target_key="targets"),
        dl.TrevskyCallback(input_key="scores", target_key="targets", alpha=0.2),
    ],
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
)

Note

Please follow the minimal examples sections for more use cases.

Segmentation – TrevskyMetric

class catalyst.metrics._segmentation.TrevskyMetric(alpha: float, beta: Optional[float] = None, class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = None, eps: float = 1e-07, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]

Bases: catalyst.metrics._segmentation.RegionBasedMetric

Trevsky Metric, trevsky score = tp / (tp + fp * beta + fn * alpha)

Parameters
  • alpha – false negative coefficient; the bigger the alpha, the bigger the penalty for false negatives. If beta is None, alpha must be in (0, 1)

  • beta – false positive coefficient; the bigger the beta, the bigger the penalty for false positives. Must be in (0, 1); if None, beta = (1 - alpha)

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • weights – class weights

  • class_names – class names

  • threshold – threshold for outputs binarization

  • eps – epsilon to avoid zero division

  • compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True

  • prefix – metric prefix

  • suffix – metric suffix

Examples:

import torch
from catalyst import metrics

outputs = torch.tensor([[[[0.8, 0.1, 0], [0, 0.4, 0.3], [0, 0, 1]]]])
targets = torch.tensor([[[[1.0, 0, 0], [0, 1, 0], [1, 1, 0]]]])
metric = metrics.TrevskyMetric(alpha=0.2)
metric.reset()

metric.update(outputs, targets)
metric.compute()
# per_class, micro, macro, weighted
# ([tensor(0.4167)], tensor(0.4167), tensor(0.4167), None)

metric.update_key_value(outputs, targets)
metric.compute_key_value()
# {
#     'trevsky': tensor(0.4167),
#     'trevsky/_macro': tensor(0.4167),
#     'trevsky/_micro': tensor(0.4167),
#     'trevsky/class_00': tensor(0.4167),
# }
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from catalyst import dl
from catalyst.data import ToTensor
from catalyst.contrib.datasets import MNIST
from catalyst.contrib.nn import IoULoss


model = nn.Sequential(
    nn.Conv2d(1, 1, 3, 1, 1), nn.ReLU(),
    nn.Conv2d(1, 1, 3, 1, 1), nn.Sigmoid(),
)
criterion = IoULoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.02)

loaders = {
    "train": DataLoader(
        MNIST(os.getcwd(), train=True, download=True, transform=ToTensor()),
        batch_size=32
    ),
    "valid": DataLoader(
        MNIST(os.getcwd(), train=False, download=True, transform=ToTensor()),
        batch_size=32
    ),
}

class CustomRunner(dl.SupervisedRunner):
    def handle_batch(self, batch):
        x = batch[self._input_key]
        x_noise = (x + torch.rand_like(x)).clamp_(0, 1)
        x_ = self.model(x_noise)
        self.batch = {self._input_key: x, self._output_key: x_, self._target_key: x}

runner = CustomRunner(
    input_key="features", output_key="scores", target_key="targets", loss_key="loss"
)
# model training
runner.train(
    model=model,
    criterion=criterion,
    optimizer=optimizer,
    loaders=loaders,
    num_epochs=1,
    callbacks=[
        dl.IOUCallback(input_key="scores", target_key="targets"),
        dl.DiceCallback(input_key="scores", target_key="targets"),
        dl.TrevskyCallback(input_key="scores", target_key="targets", alpha=0.2),
    ],
    logdir="./logdir",
    valid_loader="valid",
    valid_metric="loss",
    minimize_valid_metric=True,
    verbose=True,
)

Note

Please follow the minimal examples sections for more use cases.

Functional API

Accuracy

catalyst.metrics.functional._accuracy.accuracy(outputs: torch.Tensor, targets: torch.Tensor, topk: Sequence[int] = (1,))Sequence[torch.Tensor][source]

Computes multiclass accuracy@topk for the specified values of topk.

Parameters
  • outputs – model outputs, logits with shape [bs; num_classes]

  • targets – ground truth, labels with shape [bs; 1]

  • topk – topk for accuracy@topk computing

Returns

list with computed accuracy@topk

Examples:

import torch
from catalyst import metrics
metrics.accuracy(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
)
# [tensor([1.])]
import torch
from catalyst import metrics
metrics.accuracy(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 1, 0],
    ]),
    targets=torch.tensor([0, 1, 2]),
)
# [tensor([0.6667])]
import torch
from catalyst import metrics
metrics.accuracy(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
    topk=[1, 3],
)
# [tensor([1.]), tensor([1.])]
import torch
from catalyst import metrics
metrics.accuracy(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 1, 0],
    ]),
    targets=torch.tensor([0, 1, 2]),
    topk=[1, 3],
)
# [tensor([0.6667]), tensor([1.])]
catalyst.metrics.functional._accuracy.multilabel_accuracy(outputs: torch.Tensor, targets: torch.Tensor, threshold: Union[float, torch.Tensor])torch.Tensor[source]

Computes multilabel accuracy for the specified activation and threshold.

Parameters
  • outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (e.g. a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)

  • threshold – threshold for model output

Returns

computed multilabel accuracy

Examples:

import torch
from catalyst import metrics
metrics.multilabel_accuracy(
    outputs=torch.tensor([
        [1, 0],
        [0, 1],
    ]),
    targets=torch.tensor([
        [1, 0],
        [0, 1],
    ]),
    threshold=0.5,
)
# tensor([1.])
import torch
from catalyst import metrics
metrics.multilabel_accuracy(
    outputs=torch.tensor([
        [1.0, 0.0],
        [0.6, 1.0],
    ]),
    targets=torch.tensor([
        [1, 0],
        [0, 1],
    ]),
    threshold=0.5,
)
# tensor(0.7500)
import torch
from catalyst import metrics
metrics.multilabel_accuracy(
    outputs=torch.tensor([
        [1.0, 0.0],
        [0.4, 1.0],
    ]),
    targets=torch.tensor([
        [1, 0],
        [0, 1],
    ]),
    threshold=0.5,
)
# tensor(1.0)

AUC

catalyst.metrics.functional._auc.auc(scores: torch.Tensor, targets: torch.Tensor)torch.Tensor[source]

Computes ROC-AUC.

Parameters
  • scores – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (e.g. a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)

Returns

Tensor with [num_classes] shape of per-class-aucs

Return type

torch.Tensor

Examples:

import torch
from catalyst import metrics
metrics.auc(
    scores=torch.tensor([
        [0.9, 0.1],
        [0.1, 0.9],
    ]),
    targets=torch.tensor([
        [1, 0],
        [0, 1],
    ]),
)
# tensor([1., 1.])
from catalyst import metrics
metrics.auc(
    scores=torch.tensor([
        [0.9],
        [0.8],
        [0.7],
        [0.6],
        [0.5],
        [0.4],
        [0.3],
        [0.2],
        [0.1],
        [0.0],
    ]),
    targets=torch.tensor([
        [0],
        [1],
        [1],
        [1],
        [1],
        [1],
        [1],
        [0],
        [0],
        [0],
    ]),
)
# tensor([0.7500])

Warning

This metric is under API improvement.

catalyst.metrics.functional._auc.binary_auc(scores: torch.Tensor, targets: torch.Tensor)Tuple[float, numpy.ndarray, numpy.ndarray][source]

Binary AUC computation.

Parameters
  • scores – estimated scores from a model.

  • targets – ground truth (correct) target values.

Returns

measured roc-auc, true positive rate, false positive rate

Return type

Tuple[float, np.ndarray, np.ndarray]

Warning

This metric is under API improvement.

Example:

import torch
from catalyst import metrics
metrics.binary_auc(
    scores=torch.tensor([
        0.9,
        0.8,
        0.7,
        0.6,
        0.5,
        0.4,
        0.3,
        0.2,
        0.1,
        0.0,
    ]),
    targets=torch.tensor([
        0,
        1,
        1,
        1,
        1,
        1,
        1,
        0,
        0,
        0,
    ]),
)
# 0.7500,
# [0.  , 0.  , 0.16, 0.33, 0.5 , 0.66, 0.83, 0.83, 1. , 1.  , 1.  ],
# [0.  , 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.5, 0.75, 1.  ]

Average Precision

catalyst.metrics.functional._average_precision.average_precision(outputs: torch.Tensor, targets: torch.Tensor, k: int)torch.Tensor[source]

Calculate the Average Precision for RecSys. The precision metric summarizes the fraction of relevant items out of the whole recommendation list.

To compute the precision at k, set the rank threshold to k and compute the percentage of relevant items in the top k, ignoring the documents ranked lower than k.

The average precision at k (AP at k) summarizes the average precision for relevant items up to the k-th one. Wikipedia entry for the Average precision

<https://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=793358396#Average_precision>

If a relevant document never gets retrieved, we assume the precision corresponding to that relevant document to be zero.

Parameters
  • outputs (torch.Tensor) – Tensor with predicted score size: [batch_size, slate_length] model outputs, logits

  • targets (torch.Tensor) – Binary tensor with ground truth. 1 means the item is relevant and 0 means not relevant. size: [batch_size, slate_length] ground truth, labels

  • k – Parameter for evaluation on top-k items

Returns

The AP score for each sample in the batch. size: [batch_size, 1]

Return type

ap_score (torch.Tensor)

Example:

import torch
from catalyst import metrics
metrics.average_precision(
    outputs=torch.tensor([
        [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
        [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
    ]),
    targets=torch.tensor([
        [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
        [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    ]),
    k=10,
)
# tensor([0.6222, 0.4429])
catalyst.metrics.functional._average_precision.binary_average_precision(outputs: torch.Tensor, targets: torch.Tensor, weights: Optional[torch.Tensor] = None)torch.Tensor[source]

Computes the binary average precision.

Parameters
  • outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (e.g. a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)

  • weights – importance for each sample

Returns

tensor of [K; ] shape, with average precision for K classes

Return type

torch.Tensor

Example:

import torch
from catalyst import metrics
metrics.binary_average_precision(
    outputs=torch.Tensor([0.1, 0.4, 0.35, 0.8]),
    targets=torch.Tensor([0, 0, 1, 1]),
)
# tensor([0.8333])
catalyst.metrics.functional._average_precision.mean_average_precision(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int])List[torch.Tensor][source]

Calculate the mean average precision (MAP) for RecSys. The metric calculates the mean of the AP across all samples in the batch.

MAP amplifies the interest in finding many relevant items for each query

Parameters
  • outputs (torch.Tensor) – Tensor with predicted score size: [batch_size, slate_length] model outputs, logits

  • targets (torch.Tensor) – Binary tensor with ground truth. 1 means the item is relevant and 0 means not relevant. size: [batch_size, slate_length] ground truth, labels

  • topk (List[int]) – list of parameters for evaluation on top-k items

Returns

The map score for every k. size: len(top_k)

Return type

map_at_k (Tuple[float])

Example:

import torch
from catalyst import metrics
metrics.mean_average_precision(
    outputs=torch.tensor([
        [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
        [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
    ]),
    targets=torch.tensor([
        [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
        [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    ]),
    topk=[1, 3, 5, 10],
)
# [tensor(0.5000), tensor(0.6667), tensor(0.6417), tensor(0.5325)]

Classification

catalyst.metrics.functional._classification.f1score(precision_value, recall_value, eps=1e-05)[source]

Calculating F1-score from precision and recall to reduce computation redundancy.

Parameters
  • precision_value – precision (0-1)

  • recall_value – recall (0-1)

  • eps – epsilon to use

Returns

F1 score (0-1)
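
Example (a minimal usage sketch, not taken from the library docs; the value in the comment follows from the standard F1 formula f1 = 2 * precision * recall / (precision + recall), up to the eps term):

from catalyst.metrics.functional._classification import f1score

# precision = 0.75, recall = 0.6
# f1 = 2 * 0.75 * 0.6 / (0.75 + 0.6 + eps) ~= 0.6667
f1score(precision_value=0.75, recall_value=0.6)
# ~0.6667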

catalyst.metrics.functional._classification.get_aggregated_metrics(tp: numpy.array, fp: numpy.array, fn: numpy.array, support: numpy.array, zero_division: int = 0)Tuple[numpy.array, numpy.array, numpy.array, numpy.array][source]

Counts precision, recall, and F1 scores per class and with macro, weighted, and micro averaging, given the per-class statistics.

Parameters
  • tp – array of shape (num_classes, ) of true positive statistics per class

  • fp – array of shape (num_classes, ) of false positive statistics per class

  • fn – array of shape (num_classes, ) of false negative statistics per class

  • support – array of shape (num_classes, ) of samples count per class

  • zero_division – int value, should be one of 0 or 1; used for precision and recall computation

Returns

per-class, micro, macro, weighted averaging

Return type

arrays of metrics
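
Example (a hedged sketch; the per-class statistics below are made up, and the exact layout of the returned arrays is only described loosely above, so no concrete values are asserted):

import numpy as np
from catalyst.metrics.functional._classification import get_aggregated_metrics

# made-up per-class true positive / false positive / false negative / support counts
tp = np.array([10.0, 5.0, 0.0])
fp = np.array([2.0, 3.0, 0.0])
fn = np.array([1.0, 4.0, 2.0])
support = np.array([11.0, 9.0, 2.0])

per_class, micro, macro, weighted = get_aggregated_metrics(
    tp=tp, fp=fp, fn=fn, support=support, zero_division=0
)
# each returned array holds precision/recall/f1 statistics under the
# corresponding averaging scheme, as described above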

catalyst.metrics.functional._classification.get_binary_metrics(tp: int, fp: int, fn: int, zero_division: int)Tuple[float, float, float][source]

Get precision, recall, and F1 score metrics from true positive, false positive, and false negative statistics for binary classification.

Parameters
  • tp – true positive

  • fp – false positive

  • fn – false negative

  • zero_division – int value, should be 0 or 1

Returns

precision, recall, f1 scores
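
Example (a minimal sketch; the commented values are analytic expectations from the definitions precision = tp / (tp + fp) and recall = tp / (tp + fn), not captured output):

from catalyst.metrics.functional._classification import get_binary_metrics

precision_value, recall_value, f1_value = get_binary_metrics(
    tp=3, fp=1, fn=2, zero_division=0
)
# precision = 3 / (3 + 1) = 0.75
# recall    = 3 / (3 + 2) = 0.60
# f1       ~= 2 * 0.75 * 0.60 / (0.75 + 0.60) ~= 0.6667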

catalyst.metrics.functional._classification.precision(tp: int, fp: int, zero_division: int = 0)float[source]

Calculates precision (a.k.a. positive predictive value) for binary classification and segmentation.

Parameters
  • tp – number of true positives

  • fp – number of false positives

  • zero_division – int value, should be one of 0 or 1; if both tp==0 and fp==0, return this value as a result

Returns

precision value (0-1)
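
Example (a minimal sketch; the values follow directly from precision = tp / (tp + fp) and the zero_division rule above):

from catalyst.metrics.functional._classification import precision

precision(tp=3, fp=1)
# 0.75
precision(tp=0, fp=0, zero_division=1)
# 1  (both counts are zero, so the zero_division value is returned)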

catalyst.metrics.functional._classification.precision_recall_fbeta_support(outputs: torch.Tensor, targets: torch.Tensor, beta: float = 1, eps: float = 1e-06, argmax_dim: int = - 1, num_classes: Optional[int] = None, zero_division: int = 0)Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Counts precision_val, recall, fbeta_score.

Parameters
  • outputs – A list of predicted elements

  • targets – A list of elements that are to be predicted

  • beta – beta param for f_score

  • eps – epsilon to avoid zero division

  • argmax_dim – int, that specifies dimension for argmax transformation in case of scores/probabilities in outputs

  • num_classes – int that specifies the number of classes, if known.

  • zero_division – int value, should be one of 0 or 1; used for precision_val and recall computation

Returns

tuple of precision_val, recall, fbeta_score

Examples:

import torch
from catalyst import metrics
metrics.precision_recall_fbeta_support(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
    beta=1,
)
# (
#     tensor([1., 1., 1.]),  # per class precision
#     tensor([1., 1., 1.]),  # per class recall
#     tensor([1., 1., 1.]),  # per class fbeta
#     tensor([1., 1., 1.]),  # per class support
# )
import torch
from catalyst import metrics
metrics.precision_recall_fbeta_support(
    outputs=torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]]),
    targets=torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]]),
    beta=1,
)
# (
#     tensor([0.5000, 0.5000]),  # per class precision
#     tensor([0.5000, 0.5000]),  # per class recall
#     tensor([0.5000, 0.5000]),  # per class fbeta
#     tensor([4., 4.]),          # per class support
# )
catalyst.metrics.functional._classification.recall(tp: int, fn: int, zero_division: int = 0)float[source]

Calculates recall (a.k.a. true positive rate) for binary classification and segmentation.

Parameters
  • tp – number of true positives

  • fn – number of false negatives

  • zero_division – int value, should be one of 0 or 1; if both tp==0 and fn==0, return this value as a result

Returns

recall value (0-1)
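
Example (a minimal sketch; the values follow directly from recall = tp / (tp + fn) and the zero_division rule above):

from catalyst.metrics.functional._classification import recall

recall(tp=3, fn=1)
# 0.75
recall(tp=0, fn=0, zero_division=1)
# 1  (both counts are zero, so the zero_division value is returned)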

CMC Score

catalyst.metrics.functional._cmc_score.cmc_score(query_embeddings: torch.Tensor, gallery_embeddings: torch.Tensor, conformity_matrix: torch.Tensor, topk: int = 1)float[source]

Function to count CMC score from query and gallery embeddings.

Parameters
  • query_embeddings – tensor shape of (n_embeddings, embedding_dim) embeddings of the objects in query

  • gallery_embeddings – tensor shape of (n_embeddings, embedding_dim) embeddings of the objects in gallery

  • conformity_matrix – binary matrix with 1 on same label pos and 0 otherwise

  • topk – number of top examples for cumulative score counting

Returns

cmc score

Example:

import torch
from catalyst import metrics
metrics.cmc_score(
    query_embeddings=torch.tensor([
        [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1],
    ]).float(),
    gallery_embeddings=torch.tensor([
        [1, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 0],
    ]).float(),
    conformity_matrix=torch.tensor([
        [True, False, False],
        [True, False, False],
        [False, True, True],
        [False, True, True],
    ]),
    topk=1,
)
# 1.0
catalyst.metrics.functional._cmc_score.cmc_score_count(distances: torch.Tensor, conformity_matrix: torch.Tensor, topk: int = 1)float[source]

Function to count CMC from distance matrix and conformity matrix.

Parameters
  • distances – distance matrix shape of (n_embeddings_x, n_embeddings_y)

  • conformity_matrix – binary matrix with 1 on same label pos and 0 otherwise

  • topk – number of top examples for cumulative score counting

Returns

cmc score

Examples:

import torch
from catalyst import metrics
metrics.cmc_score_count(
    distances=torch.tensor([[1, 2], [2, 1]]),
    conformity_matrix=torch.tensor([[0, 1], [1, 0]]),
    topk=1,
)
# 0.0
import torch
from catalyst import metrics
metrics.cmc_score_count(
    distances=torch.tensor([[1, 0.5, 0.2], [2, 3, 4], [0.4, 3, 4]]),
    conformity_matrix=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
    topk=2,
)
# 0.33
catalyst.metrics.functional._cmc_score.masked_cmc_score(query_embeddings: torch.Tensor, gallery_embeddings: torch.Tensor, conformity_matrix: torch.Tensor, available_samples: torch.Tensor, topk: int = 1)float[source]
Parameters
  • query_embeddings – tensor shape of (n_embeddings, embedding_dim) embeddings of the objects in query

  • gallery_embeddings – tensor shape of (n_embeddings, embedding_dim) embeddings of the objects in gallery

  • conformity_matrix – binary matrix with 1 on same label pos and 0 otherwise

  • available_samples – tensor of shape (query_size, gallery_size), available_samples[i][j] == 1 means that j-th element of gallery should be used while scoring i-th query one

  • topk – number of top examples for cumulative score counting

Returns

cmc score with mask

Raises

ValueError – if there are items that have different labels and are unavailable for each other according to availability matrix

Example:

import torch
from catalyst import metrics
metrics.masked_cmc_score(
    query_embeddings=torch.tensor([
        [1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1],
    ]).float(),
    gallery_embeddings=torch.tensor([
        [1, 1, 1, 0], [1, 1, 1, 1], [0, 1, 1, 0],
    ]).float(),
    conformity_matrix=torch.tensor([
        [True, False, False],
        [True, False, False],
        [False, True, True],
        [False, True, True],
    ]),
    available_samples=torch.tensor([
        [False, True, True],
        [True, True, True],
        [True, False, True],
        [True, True, True],
    ]),
    topk=1,
)
# 0.75

F1 score

catalyst.metrics.functional._f1_score.f1_score(outputs: torch.Tensor, targets: torch.Tensor, eps: float = 1e-07, argmax_dim: int = - 1, num_classes: Optional[int] = None)Union[float, torch.Tensor][source]

Fbeta_score with beta=1.

Parameters
  • outputs – A list of predicted elements

  • targets – A list of elements that are to be predicted

  • eps – epsilon to avoid zero division

  • argmax_dim – int, that specifies dimension for argmax transformation in case of scores/probabilities in outputs

  • num_classes – int that specifies the number of classes, if known

Returns

F_1 score

Return type

float

Example:

import torch
from catalyst import metrics
metrics.f1_score(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
)
# tensor([1., 1., 1.]),  # per class fbeta
catalyst.metrics.functional._f1_score.fbeta_score(outputs: torch.Tensor, targets: torch.Tensor, beta: float = 1.0, eps: float = 1e-07, argmax_dim: int = - 1, num_classes: Optional[int] = None)Union[float, torch.Tensor][source]

Counts fbeta score for given outputs and targets.

Parameters
  • outputs – A list of predicted elements

  • targets – A list of elements that are to be predicted

  • beta – beta param for f_score

  • eps – epsilon to avoid zero division

  • argmax_dim – int, that specifies dimension for argmax transformation in case of scores/probabilities in outputs

  • num_classes – int that specifies the number of classes, if known

Raises

ValueError – If beta is a negative number.

Returns

F_beta score.

Return type

float

Example:

import torch
from catalyst import metrics
metrics.fbeta_score(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
    beta=1,
)
# tensor([1., 1., 1.]),  # per class fbeta

Focal

catalyst.metrics.functional._focal.reduced_focal_loss(outputs: torch.Tensor, targets: torch.Tensor, threshold: float = 0.5, gamma: float = 2.0, reduction='mean')torch.Tensor[source]

Compute reduced focal loss between target and output logits.

It has been proposed in Reduced Focal Loss: 1st Place Solution to xView object detection in Satellite Imagery paper.

Note

size_average and reduce params are in the process of being deprecated, and in the meantime, specifying either of those two args will override reduction.

Source: https://github.com/BloodAxe/pytorch-toolbelt

Parameters
  • outputs – tensor of arbitrary shape

  • targets – tensor of the same shape as input

  • threshold – threshold for focal reduction

  • gamma – gamma for focal reduction

  • reduction – specifies the reduction to apply to the output: "none" | "mean" | "sum" | "batchwise_mean". "none": no reduction will be applied, "mean": the sum of the output will be divided by the number of elements in the output, "sum": the output will be summed. "batchwise_mean" computes mean loss per sample in batch. Default: “mean”

Returns: # noqa: DAR201

torch.Tensor: computed loss
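
Example (a hedged usage sketch with random tensors, so no specific loss value is asserted; the import uses the module path documented above):

import torch
from catalyst.metrics.functional._focal import reduced_focal_loss

outputs = torch.randn(4, 1, 8, 8)                    # raw logits of arbitrary shape
targets = torch.randint(0, 2, (4, 1, 8, 8)).float()  # binary targets of the same shape

loss = reduced_focal_loss(outputs, targets, threshold=0.5, gamma=2.0, reduction="mean")
# scalar tensor when reduction="mean"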

catalyst.metrics.functional._focal.sigmoid_focal_loss(outputs: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0, alpha: float = 0.25, reduction: str = 'mean')[source]

Compute binary focal loss between target and output logits.

Parameters
  • outputs – tensor of arbitrary shape

  • targets – tensor of the same shape as input

  • gamma – gamma for focal loss

  • alpha – alpha for focal loss

  • reduction (string, optional) – specifies the reduction to apply to the output: "none" | "mean" | "sum" | "batchwise_mean". "none": no reduction will be applied, "mean": the sum of the output will be divided by the number of elements in the output, "sum": the output will be summed.

Returns

computed loss

Source: https://github.com/BloodAxe/pytorch-toolbelt
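
Example (a hedged usage sketch with random tensors, so no specific loss value is asserted; the import uses the module path documented above):

import torch
from catalyst.metrics.functional._focal import sigmoid_focal_loss

outputs = torch.randn(4, 3)                    # raw logits
targets = torch.randint(0, 2, (4, 3)).float()  # binary targets of the same shape

loss = sigmoid_focal_loss(outputs, targets, gamma=2.0, alpha=0.25, reduction="mean")
# scalar tensor when reduction="mean"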

Hitrate

catalyst.metrics.functional._hitrate.hitrate(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int], zero_division: int = 0)List[torch.Tensor][source]

Calculate the hit rate (a.k.a. recall) score given model outputs and targets. Hit rate is a metric for evaluating ranking systems: generate top-N recommendations, and if one of the recommendations is an item the user has actually rated, count it as a hit. By "rate" we mean any explicit form of the user's interaction. Add up all of the hits for all users and then divide by the number of users.

Compute top-N recommendations for each user in the training stage and intentionally remove one of these items from the training data.

Parameters
  • outputs (torch.Tensor) – Tensor with predicted score size: [batch_size, slate_length] model outputs, logits

  • targets (torch.Tensor) – Binary tensor with ground truth. 1 means the item is relevant for the user and 0 not relevant size: [batch_size, slate_length] ground truth, labels

  • topk (List[int]) – parameters for evaluation on top-k items

  • zero_division (int) – value returned in the case of division by zero; should be one of 0 or 1

Returns

the hitrate score

Return type

hitrate_at_k (List[torch.Tensor])

Example:

import torch
from catalyst import metrics
metrics.hitrate(
    outputs=torch.Tensor([[4.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0]]),
    targets=torch.Tensor([[0, 0, 1.0, 1.0], [0, 0, 0.0, 0.0]]),
    topk=[1, 2, 3, 4],
)
# [tensor(0.), tensor(0.2500), tensor(0.2500), tensor(0.5000)]

MRR

catalyst.metrics.functional._mrr.mrr(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int])List[torch.Tensor][source]

Calculate the Mean Reciprocal Rank (MRR) score given model outputs and targets. Data is aggregated in batches.

MRR@k is the mean, over the batch, of the reciprocal rank, that is, the inverse rank of the highest-ranked relevant item if one appears in the top k, and 0 otherwise. https://en.wikipedia.org/wiki/Mean_reciprocal_rank

Parameters
  • outputs – Tensor with predicted scores size: [batch_size, slate_length] model outputs, logits

  • targets – Binary tensor with ground truth. 1 means the item is relevant and 0 means not relevant. size: [batch_size, slate_length] ground truth, labels

  • topk – parameters for evaluation on top-k items

Returns

MRR score

Example:

import torch
from catalyst import metrics
metrics.mrr(
    outputs=torch.Tensor([
        [4.0, 2.0, 3.0, 1.0],
        [1.0, 2.0, 3.0, 4.0],
    ]),
    targets=torch.Tensor([
        [0, 0, 1.0, 1.0],
        [0, 0, 1.0, 1.0],
    ]),
    topk=[1, 3],
)
# [tensor(0.5000), tensor(0.7500)]
catalyst.metrics.functional._mrr.reciprocal_rank(outputs: torch.Tensor, targets: torch.Tensor, k: int)torch.Tensor[source]

Calculate the Reciprocal Rank score given model outputs and targets. Data is aggregated in batches.

Parameters
  • outputs – Tensor with predicted scores size: [batch_size, slate_length] model outputs, logits

  • targets – Binary tensor with ground truth. 1 means the item is relevant and 0 means not relevant. size: [batch_size, slate_length] ground truth, labels

  • k – Parameter for evaluation on top-k items

Returns

MRR score

Examples:

import torch
from catalyst import metrics
metrics.reciprocal_rank(
    outputs=torch.Tensor([
        [4.0, 2.0, 3.0, 1.0],
        [1.0, 2.0, 3.0, 4.0],
    ]),
    targets=torch.Tensor([
        [0, 0, 1.0, 1.0],
        [0, 0, 1.0, 1.0],
    ]),
    k=1,
)
# tensor([[0.], [1.]])
import torch
from catalyst import metrics
metrics.reciprocal_rank(
    outputs=torch.Tensor([
        [4.0, 2.0, 3.0, 1.0],
        [1.0, 2.0, 3.0, 4.0],
    ]),
    targets=torch.Tensor([
        [0, 0, 1.0, 1.0],
        [0, 0, 1.0, 1.0],
    ]),
    k=3,
)
# tensor([[0.5000], [1.0000]])

NDCG

catalyst.metrics.functional._ndcg.dcg(outputs: torch.Tensor, targets: torch.Tensor, gain_function='exp_rank')torch.Tensor[source]

Computes Discounted Cumulative Gain (DCG@topk) for the specified values of topk. Graded relevance is used as a measure of usefulness, or gain, from examining a set of items; the gain may be reduced at lower ranks. Reference: https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Parameters
  • outputs – model outputs, logits with shape [batch_size; slate_length]

  • targets – ground truth, labels with shape [batch_size; slate_length]

  • gain_function – string that indicates the gain function for the ground truth labels. Two options are available: exp_rank (torch.pow(2, x) - 1) and linear_rank (x). By default, exp_rank is used to emphasize retrieving the relevant documents.

Returns

The discounted gains tensor

Return type

dcg_score (torch.Tensor)

Raises

ValueError – if the gain function is not one of the supported options

Examples:

from catalyst import metrics
metrics.dcg(
    outputs = torch.tensor([
        [3, 2, 1, 0],
    ]),
    targets = torch.Tensor([
        [2.0, 2.0, 1.0, 0.0],
    ]),
    gain_function="linear_rank",
)
# tensor([[2.0000, 2.0000, 0.6309, 0.0000]])
from catalyst import metrics
metrics.dcg(
    outputs = torch.tensor([
        [3, 2, 1, 0],
    ]),
    targets = torch.Tensor([
        [2.0, 2.0, 1.0, 0.0],
    ]),
    gain_function="linear_rank",
).sum()
# tensor(4.6309)
from catalyst import metrics
metrics.dcg(
    outputs = torch.tensor([
        [3, 2, 1, 0],
    ]),
    targets = torch.Tensor([
        [2.0, 2.0, 1.0, 0.0],
    ]),
    gain_function="exp_rank",
)
# tensor([[3.0000, 1.8928, 0.5000, 0.0000]])
from catalyst import metrics
metrics.dcg(
    outputs = torch.tensor([
        [3, 2, 1, 0],
    ]),
    targets = torch.Tensor([
        [2.0, 2.0, 1.0, 0.0],
    ]),
    gain_function="exp_rank",
).sum()
# tensor(5.3928)
catalyst.metrics.functional._ndcg.ndcg(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int], gain_function='exp_rank')List[torch.Tensor][source]

Computes nDCG@topk for the specified values of topk.

Parameters
  • outputs (torch.Tensor) – model outputs, logits with shape [batch_size; slate_size]

  • targets (torch.Tensor) – ground truth, labels with shape [batch_size; slate_size]

  • gain_function – string that indicates the gain function for the ground truth labels. Two options are available: exp_rank (torch.pow(2, x) - 1) and linear_rank (x). By default, exp_rank is used to emphasize retrieving the relevant documents.

  • topk (List[int]) – parameters for evaluation on top-k items

Returns

tuple with computed ndcg@topk

Return type

results (Tuple[float])

Examples:

import torch
from catalyst import metrics
metrics.ndcg(
    outputs = torch.tensor([
        [0.5, 0.2, 0.1],
        [0.5, 0.2, 0.1],
    ]),
    targets = torch.Tensor([
        [1.0, 0.0, 1.0],
        [1.0, 0.0, 1.0],
    ]),
    topk=[2],
    gain_function="exp_rank",
)
# [tensor(0.6131)]
import torch
from catalyst import metrics
metrics.ndcg(
    outputs = torch.tensor([
        [0.5, 0.2, 0.1],
        [0.5, 0.2, 0.1],
    ]),
    targets = torch.Tensor([
        [1.0, 0.0, 1.0],
        [1.0, 0.0, 1.0],
    ]),
    topk=[2],
    gain_function="exp_rank",
)
# [tensor(0.5000)]

Precision

catalyst.metrics.functional._precision.precision(outputs: torch.Tensor, targets: torch.Tensor, argmax_dim: int = - 1, eps: float = 1e-07, num_classes: Optional[int] = None)Union[float, torch.Tensor][source]

Multiclass precision score.

Parameters
  • outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]

  • targets – ground truth (correct) target values with shape [bs; …, 1]

  • argmax_dim – int, that specifies dimension for argmax transformation in case of scores/probabilities in outputs

  • eps – float. Epsilon to avoid zero division.

  • num_classes – int that specifies the number of classes, if known

Returns

precision for every class

Return type

Tensor

Examples:

import torch
from catalyst import metrics
metrics.precision(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
)
# tensor([1., 1., 1.])
import torch
from catalyst import metrics
metrics.precision(
    outputs=torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]]),
    targets=torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]]),
)
# tensor([0.5000, 0.5000])

Recall

catalyst.metrics.functional._recall.recall(outputs: torch.Tensor, targets: torch.Tensor, argmax_dim: int = - 1, eps: float = 1e-07, num_classes: Optional[int] = None)Union[float, torch.Tensor][source]

Multiclass recall score.

Parameters
  • outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]

  • targets – ground truth (correct) target values with shape [bs; …, 1]

  • argmax_dim – int, that specifies dimension for argmax transformation in case of scores/probabilities in outputs

  • eps – float. Epsilon to avoid zero division.

  • num_classes – int that specifies the number of classes, if known

Returns

recall for every class

Return type

Tensor

Examples:

import torch
from catalyst import metrics
metrics.recall(
    outputs=torch.tensor([
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
    ]),
    targets=torch.tensor([0, 1, 2]),
)
# tensor([1., 1., 1.])
import torch
from catalyst import metrics
metrics.recall(
    outputs=torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]]),
    targets=torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]]),
)
# tensor([0.5000, 0.5000])

Segmentation

catalyst.metrics.functional._segmentation.dice(outputs: torch.Tensor, targets: torch.Tensor, class_dim: int = 1, threshold: Optional[float] = None, mode: str = 'per-class', weights: Optional[List[float]] = None, eps: float = 1e-07)torch.Tensor[source]

Computes the dice score: dice score = 2 * intersection / (intersection + union) = 2 * tp / (2 * tp + fp + fn)

Parameters
  • outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1); ignored if mode = "micro"

  • threshold – threshold for outputs binarization

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted', 'per-class']. If mode='micro', classes are ignored and the metric is calculated over all elements. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights. If mode='per-class', the metric is calculated separately for each class

  • weights – class weights (for mode="weighted")

  • eps – epsilon to avoid zero division

Returns

Dice score for each class (if mode='per-class') or the aggregated Dice score

Example:

import torch
from catalyst import metrics

size = 4
half_size = size // 2
shape = (1, 1, size, size)
empty = torch.zeros(shape)
full = torch.ones(shape)
left = torch.ones(shape)
left[:, :, :, half_size:] = 0
right = torch.ones(shape)
right[:, :, :, :half_size] = 0
top_left = torch.zeros(shape)
top_left[:, :, :half_size, :half_size] = 1
pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
targets = torch.cat([full, right, empty, full, left, left], dim=1)

metrics.dice(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
    mode="per-class"
)
# tensor([0.0000, 0.0000, 1.0000, 1.0000, 1.0000, 0.6667])

metrics.dice(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
    mode="macro"
)
# tensor(0.6111)

metrics.dice(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
    mode="micro"
)
# tensor(0.6087)
catalyst.metrics.functional._segmentation.get_segmentation_statistics(outputs: torch.Tensor, targets: torch.Tensor, class_dim: int = 1, threshold: Optional[float] = None)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

Computes true positive, false positive, false negative for a multilabel segmentation problem.

Parameters
  • outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • threshold – threshold for outputs binarization

Returns

Segmentation stats

Example:

import torch
from catalyst import metrics

size = 4
half_size = size // 2
shape = (1, 1, size, size)
empty = torch.zeros(shape)
full = torch.ones(shape)
left = torch.ones(shape)
left[:, :, :, half_size:] = 0
right = torch.ones(shape)
right[:, :, :, :half_size] = 0
top_left = torch.zeros(shape)
top_left[:, :, :half_size, :half_size] = 1
pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
targets = torch.cat([full, right, empty, full, left, left], dim=1)

metrics.get_segmentation_statistics(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
)
# (
#     tensor([ 0.,  0.,  0., 16.,  8.,  4.]),  # per class TP
#     tensor([0., 8., 0., 0., 0., 0.]),        # per class FP
#     tensor([16.,  8.,  0.,  0.,  0.,  4.]),  # per class FN
# )
catalyst.metrics.functional._segmentation.iou(outputs: torch.Tensor, targets: torch.Tensor, class_dim: int = 1, threshold: Optional[float] = None, mode: str = 'per-class', weights: Optional[List[float]] = None, eps: float = 1e-07)torch.Tensor[source]

Computes the iou/jaccard score, iou score = intersection / union = tp / (tp + fp + fn)

Parameters
  • outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1); ignored if mode = "micro"

  • threshold – threshold for outputs binarization

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted', 'per-class']. If mode='micro', classes are ignored and the metric is calculated over all elements. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights. If mode='per-class', the metric is calculated separately for each class

  • weights – class weights (for mode="weighted")

  • eps – epsilon to avoid zero division

Returns

IoU (Jaccard) score for each class (if mode='per-class') or the aggregated IoU score

Example:

import torch
from catalyst import metrics

size = 4
half_size = size // 2
shape = (1, 1, size, size)
empty = torch.zeros(shape)
full = torch.ones(shape)
left = torch.ones(shape)
left[:, :, :, half_size:] = 0
right = torch.ones(shape)
right[:, :, :, :half_size] = 0
top_left = torch.zeros(shape)
top_left[:, :, :half_size, :half_size] = 1
pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
targets = torch.cat([full, right, empty, full, left, left], dim=1)

metrics.iou(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
    mode="per-class"
)
# tensor([0.0000, 0.0000, 1.0000, 1.0000, 1.0000, 0.5])

metrics.iou(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
    mode="macro"
)
# tensor(0.5833)

metrics.iou(
    outputs=pred,
    targets=targets,
    class_dim=1,
    threshold=0.5,
    mode="micro"
)
# tensor(0.4375)
catalyst.metrics.functional._segmentation.trevsky(outputs: torch.Tensor, targets: torch.Tensor, alpha: float, beta: Optional[float] = None, class_dim: int = 1, threshold: Optional[float] = None, mode: str = 'per-class', weights: Optional[List[float]] = None, eps: float = 1e-07)torch.Tensor[source]

Computes the trevsky score, trevsky score = tp / (tp + fp * beta + fn * alpha)

Parameters
  • outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input

  • alpha – false negative coefficient; the bigger the alpha, the bigger the penalty for false negatives. Must be in (0, 1)

  • beta – false positive coefficient; the bigger the beta, the bigger the penalty for false positives. Must be in (0, 1); if None, beta = (1 - alpha)

  • class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)

  • threshold – threshold for outputs binarization

  • mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted', 'per-class']. If mode='micro', classes are ignored and the metric is calculated over all elements. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights. If mode='per-class', the metric is calculated separately for each class

  • weights – class weights (for mode="weighted")

  • eps – epsilon to avoid zero division

Returns

Trevsky score for each class (if mode='per-class') or the aggregated score

Example:

import torch
from catalyst import metrics

size = 4
half_size = size // 2
shape = (1, 1, size, size)
empty = torch.zeros(shape)
full = torch.ones(shape)
left = torch.ones(shape)
left[:, :, :, half_size:] = 0
right = torch.ones(shape)
right[:, :, :, :half_size] = 0
top_left = torch.zeros(shape)
top_left[:, :, :half_size, :half_size] = 1
pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
targets = torch.cat([full, right, empty, full, left, left], dim=1)

metrics.trevsky(
    outputs=pred,
    targets=targets,
    alpha=0.2,
    class_dim=1,
    threshold=0.5,
    mode="per-class"
)
# tensor([0.0000, 0.0000, 1.0000, 1.0000, 1.0000, 0.8333])

metrics.trevsky(
    outputs=pred,
    targets=targets,
    alpha=0.2,
    class_dim=1,
    threshold=0.5,
    mode="macro"
)
# tensor(0.6389)

metrics.trevsky(
    outputs=pred,
    targets=targets,
    alpha=0.2,
    class_dim=1,
    threshold=0.5,
    mode="micro"
)
# tensor(0.7000)

Misc

catalyst.metrics.functional._misc.check_consistent_length(*tensors)[source]

Check that all arrays have consistent first dimensions. Checks whether all objects in arrays have the same shape or length.

Parameters

tensors – list of tensors or input objects that will be checked for consistent length.

Raises

ValueError – “Inconsistent numbers of samples”
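
Example (a minimal sketch; the second call is expected to raise ValueError because the first dimensions differ):

import torch
from catalyst.metrics.functional._misc import check_consistent_length

check_consistent_length(torch.zeros(4, 2), torch.zeros(4))   # same first dimension: passes
check_consistent_length(torch.zeros(4, 2), torch.zeros(3))   # raises ValueError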

catalyst.metrics.functional._misc.get_binary_statistics(outputs: torch.Tensor, targets: torch.Tensor, label: int = 1)Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Computes the number of true negative, false positive, false negative, true positive and support for a binary classification problem for a given label.

Parameters
  • outputs – estimated targets as predicted by a model with shape [bs; …, 1]

  • targets – ground truth (correct) target values with shape [bs; …, 1]

  • label – integer, that specifies label of interest for statistics compute

Returns

stats

Return type

Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Example:

import torch
from catalyst import metrics
y_pred = torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]])
y_true = torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]])
tn, fp, fn, tp, support = metrics.get_binary_statistics(y_pred, y_true)
# tensor(2) tensor(2) tensor(2) tensor(2) tensor(4)
catalyst.metrics.functional._misc.get_default_topk_args(num_classes: int)Sequence[int][source]

Calculate list params for Accuracy@k and mAP@k.

Parameters

num_classes – number of classes

Returns

array of accuracy arguments

Return type

iterable

Examples

>>> get_default_topk_args(num_classes=4)
[1, 3]
>>> get_default_topk_args(num_classes=8)
[1, 3, 5]
catalyst.metrics.functional._misc.get_multiclass_statistics(outputs: torch.Tensor, targets: torch.Tensor, argmax_dim: int = - 1, num_classes: Optional[int] = None)Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Computes the number of true negative, false positive, false negative, true positive and support for a multiclass classification problem.

Parameters
  • outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]

  • targets – ground truth (correct) target values with shape [bs; …, 1]

  • argmax_dim – int, that specifies dimension for argmax transformation in case of scores/probabilities in outputs

  • num_classes – int that specifies the number of classes, if known

Returns

stats

Return type

Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Example:

import torch
from catalyst import metrics
y_pred = torch.tensor([1, 2, 3, 0])
y_true = torch.tensor([1, 3, 4, 0])
tn, fp, fn, tp, support = metrics.get_multiclass_statistics(y_pred, y_true)
# (
#     tensor([3., 3., 3., 2., 3.]),
#     tensor([0., 0., 1., 1., 0.]),
#     tensor([0., 0., 0., 1., 1.]),
#     tensor([1., 1., 0., 0., 0.]),
#     tensor([1., 1., 0., 1., 1.])
# )
catalyst.metrics.functional._misc.get_multilabel_statistics(outputs: torch.Tensor, targets: torch.Tensor)Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Computes the number of true negative, false positive, false negative, true positive and support for a multilabel classification problem.

Parameters
  • outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]

  • targets – ground truth (correct) target values with shape [bs; …, 1]

Returns

stats

Return type

Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]

Examples:

import torch
from catalyst import metrics
y_pred = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]])
y_true = torch.tensor([[0, 1, 0, 1], [0, 0, 1, 1]])
tn, fp, fn, tp, support = metrics.get_multilabel_statistics(y_pred, y_true)
# (
#     tensor([2., 0., 0., 0.]),
#     tensor([0., 1., 1., 0.]),
#     tensor([0., 1., 1., 0.]),
#     tensor([0., 0., 0., 2.]),
#     tensor([0., 1., 1., 2.]),
# )
import torch
from catalyst import metrics
y_pred = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_true = torch.tensor([0, 1, 2])
tn, fp, fn, tp, support = metrics.get_multilabel_statistics(y_pred, y_true)
# (
#     tensor([2., 2., 2.]),
#     tensor([0., 0., 0.]),
#     tensor([0., 0., 0.]),
#     tensor([1., 1., 1.]),
#     tensor([1., 1., 1.]),
# )
import torch
from catalyst import metrics
y_pred = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_true = torch.nn.functional.one_hot(torch.tensor([0, 1, 2]))
tn, fp, fn, tp, support = metrics.get_multilabel_statistics(y_pred, y_true)
# (
#     tensor([2., 2., 2.]),
#     tensor([0., 0., 0.]),
#     tensor([0., 0., 0.]),
#     tensor([1., 1., 1.]),
#     tensor([1., 1., 1.]),
# )
catalyst.metrics.functional._misc.process_multilabel_components(outputs: torch.Tensor, targets: torch.Tensor, weights: Optional[torch.Tensor] = None)Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]

General preprocessing for multilabel-based metrics.

Parameters
  • outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.

  • targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (eg: a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)

  • weights – importance for each sample

Returns

processed outputs and targets with [batch_size; num_classes] shape
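
Example (a hedged sketch; it is assumed, per the signature above, that the helper returns the processed outputs, targets, and the optional weights, and no exact dtype handling is asserted):

import torch
from catalyst.metrics.functional._misc import process_multilabel_components

outputs = torch.tensor([[0.9, 0.1], [0.3, 0.7]])
targets = torch.tensor([[1, 0], [0, 1]])

outputs, targets, weights = process_multilabel_components(outputs=outputs, targets=targets)
# outputs and targets keep the [batch_size; num_classes] shape (2 x 2);
# the third element corresponds to the (optional) per-sample weights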

catalyst.metrics.functional._misc.process_recsys_components(outputs: torch.Tensor, targets: torch.Tensor)torch.Tensor[source]

General pre-processing for calculating RecSys metrics.

Parameters
  • outputs (torch.Tensor) – Tensor with predicted scores size: [batch_size, slate_length] model outputs, logits

  • targets (torch.Tensor) – Binary tensor with ground truth. 1 means the item is relevant for the user and 0 means not relevant. size: [batch_size, slate_length] ground truth, labels

Returns

targets tensor sorted by outputs

Return type

targets_sorted_by_outputs (torch.Tensor)
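
Example (a hedged sketch; it assumes targets are re-ordered by descending output score, consistent with the ranking metrics above):

import torch
from catalyst.metrics.functional._misc import process_recsys_components

outputs = torch.tensor([[4.0, 2.0, 3.0, 1.0]])
targets = torch.tensor([[0.0, 0.0, 1.0, 1.0]])

process_recsys_components(outputs=outputs, targets=targets)
# expected: tensor([[0., 1., 0., 1.]])  -- targets sorted by descending output score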