Metrics¶
Metric API¶
IMetric¶
- class catalyst.metrics._metric.IMetric(compute_on_call: bool = True)[source]¶
Bases: abc.ABC
Interface for all Metrics.
- Parameters
compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True
- abstract compute() → Any[source]¶
Computes the metric based on its accumulated state.
By default, this is called at the end of each loader (on_loader_end event).
- Returns
the computed value; it is better to return a key-value dictionary
- Return type
Any
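To make the interface concrete, here is a minimal, hedged sketch of a custom metric built on IMetric. It assumes the usual Catalyst reset()/update()/compute() flow, although only compute() is documented as abstract above; the class name and the update signature are purely illustrative.
>>> from catalyst.metrics._metric import IMetric
>>>
>>> class MeanValueMetric(IMetric):
>>>     """Illustrative metric that tracks the running mean of scalar values."""
>>>     def __init__(self, compute_on_call: bool = True):
>>>         super().__init__(compute_on_call=compute_on_call)
>>>         self.total, self.count = 0.0, 0
>>>     def reset(self) -> None:
>>>         # clear the accumulated state (e.g. on loader start)
>>>         self.total, self.count = 0.0, 0
>>>     def update(self, value: float, num_samples: int = 1) -> None:
>>>         # accumulate per-batch statistics
>>>         self.total += value * num_samples
>>>         self.count += num_samples
>>>     def compute(self) -> float:
>>>         # called at the end of each loader (on_loader_end)
>>>         return self.total / max(self.count, 1)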
ICallbackBatchMetric¶
- class catalyst.metrics._metric.ICallbackBatchMetric(compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.IMetric
Interface for metrics that are computed and logged on every batch by metric callbacks.
ICallbackLoaderMetric¶
- class catalyst.metrics._metric.ICallbackLoaderMetric(compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.IMetric
Interface for metrics that are computed over a whole loader and logged by metric callbacks.
- Parameters
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
AccumulationMetric¶
- class catalyst.metrics._metric.AccumulationMetric(accumulative_fields: Iterable[str] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackLoaderMetric
This metric accumulates all the input data over the loader.
- Parameters
accumulative_fields – list of keys to accumulate data from batch
compute_on_call – if True, allows computing the metric's value on call
prefix – metric prefix
suffix – metric suffix
General Metrics¶
AdditiveValueMetric¶
- class catalyst.metrics._additive.AdditiveValueMetric(compute_on_call: bool = True)[source]¶
Bases: catalyst.metrics._metric.IMetric
This metric computes mean and std values of input data.
- Parameters
compute_on_call – if True, computes and returns metric value during metric call
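A minimal usage sketch, assuming the conventional Catalyst update(value, num_samples)/compute() methods for this class, with compute() returning the mean and its approximate std:
>>> from catalyst.metrics._additive import AdditiveValueMetric
>>>
>>> metric = AdditiveValueMetric()
>>> # feed per-batch mean values together with the batch sizes (signature assumed)
>>> metric.update(value=0.75, num_samples=32)
>>> metric.update(value=0.65, num_samples=16)
>>> mean, std = metric.compute()  # running mean and its approximate std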
ConfusionMatrixMetric¶
- class catalyst.metrics._confusion_matrix.ConfusionMatrixMetric(num_classes: int, normalized: bool = False, compute_on_call: bool = True)[source]¶
Bases: catalyst.metrics._metric.IMetric
Constructs a confusion matrix for multiclass classification problems.
- Parameters
num_classes – number of classes in the classification problem
normalized – whether the confusion matrix should be normalized
compute_on_call – if True, computes and returns the confusion matrix during __call__. default: True
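A hedged usage sketch; the update(predictions, targets) signature and the shape conventions below are assumptions, not guaranteed by this page:
>>> import torch
>>> from catalyst.metrics._confusion_matrix import ConfusionMatrixMetric
>>>
>>> metric = ConfusionMatrixMetric(num_classes=3)
>>> # per-batch update with model scores and ground-truth labels (signature assumed)
>>> metric.update(
>>>     predictions=torch.tensor([[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]),
>>>     targets=torch.tensor([0, 2]),
>>> )
>>> confusion_matrix = metric.compute()  # num_classes x num_classes count matrix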
BatchFunctionalMetric¶
- class catalyst.metrics._functional_metric.BatchFunctionalMetric(metric_fn: Callable, metric_name: str)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
Class for defining a custom metric in a functional way. Note: the loader metric is calculated as the average over all batch metrics.
- Parameters
metric_fn – metric function that takes outputs and targets and returns a score as torch.Tensor
metric_name – metric name
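A hedged sketch of wrapping an (outputs, targets) → Tensor function into a batch metric; the update call shown is an assumption about how the callback feeds batches into the metric:
>>> import torch
>>> from catalyst.metrics.functional._accuracy import accuracy
>>> from catalyst.metrics._functional_metric import BatchFunctionalMetric
>>>
>>> # wrap a functional metric; accuracy() returns a list, so take accuracy@1
>>> metric = BatchFunctionalMetric(
>>>     metric_fn=lambda outputs, targets: accuracy(outputs, targets)[0],
>>>     metric_name="accuracy01",
>>> )
>>> # per-batch update (signature assumed to mirror metric_fn(outputs, targets))
>>> metric.update(
>>>     outputs=torch.tensor([[0.9, 0.1], [0.2, 0.8]]),
>>>     targets=torch.tensor([0, 1]),
>>> )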
Runner Metrics¶
Accuracy - AccuracyMetric¶
- class catalyst.metrics._accuracy.AccuracyMetric(topk_args: List[int] = None, num_classes: int = None, compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
This metric computes accuracy for the multiclass classification case. It computes the mean value of accuracy and its approximate std value (note that it is not a real accuracy std but the std of accuracy over batch mean values).
- Parameters
topk_args – list of topk for accuracy@topk computing
num_classes – number of classes
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
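A hedged usage sketch; the update(logits, targets) signature and the compute_key_value() helper are assumed from the ICallbackBatchMetric pattern, and the key names are illustrative:
>>> import torch
>>> from catalyst.metrics._accuracy import AccuracyMetric
>>>
>>> metric = AccuracyMetric(topk_args=[1, 2], num_classes=3)
>>> # per-batch update with logits and labels (signature assumed)
>>> metric.update(
>>>     logits=torch.tensor([[2.0, 0.1, 0.2], [0.1, 0.2, 2.0]]),
>>>     targets=torch.tensor([0, 2]),
>>> )
>>> metric.compute_key_value()  # e.g. {"accuracy01": ..., "accuracy02": ...}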
Accuracy - MultilabelAccuracyMetric¶
- class catalyst.metrics._accuracy.MultilabelAccuracyMetric(threshold: Union[float, torch.Tensor] = 0.5, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._additive.AdditiveValueMetric, catalyst.metrics._metric.ICallbackBatchMetric
This metric computes accuracy for the multilabel classification case. It computes the mean value of accuracy and its approximate std value (note that it is not a real accuracy std but the std of accuracy over batch mean values).
- Parameters
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
threshold – thresholds for model scores
AUCMetric¶
- class catalyst.metrics._auc.AUCMetric(compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackLoaderMetric
AUC metric.
- Parameters
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
Classification – BinaryPrecisionRecallF1Metric¶
- class catalyst.metrics._classification.BinaryPrecisionRecallF1Metric(zero_division: int = 0, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._classification.StatisticsMetric
Precision, recall, f1_score and support metrics for binary classification.
- Parameters
zero_division – value to set in case of zero division during metrics (precision, recall) computation; should be one of 0 or 1
compute_on_call – if True, allows computing the metric's value on call
prefix – metric prefix
suffix – metric suffix
Classification – MulticlassPrecisionRecallF1SupportMetric¶
- class catalyst.metrics._classification.MulticlassPrecisionRecallF1SupportMetric(num_classes: int = None, zero_division: int = 0, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._classification.PrecisionRecallF1SupportMetric
Precision, recall, f1_score and support metrics for multiclass classification. Counts metrics with macro, micro and weighted average.
- Parameters
num_classes – number of classes in loader’s dataset
zero_division – value to set in case of zero division during metrics (precision, recall) computation; should be one of 0 or 1
compute_on_call – if True, allows computing the metric's value on call
prefix – metric prefix
suffix – metric suffix
Classification – MultilabelPrecisionRecallF1SupportMetric¶
- class catalyst.metrics._classification.MultilabelPrecisionRecallF1SupportMetric(num_classes: int = None, zero_division: int = 0, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._classification.PrecisionRecallF1SupportMetric
Precision, recall, f1_score and support metrics for multilabel classification. Counts metrics with macro, micro and weighted average.
- Parameters
num_classes – number of classes in loader’s dataset
zero_division – value to set in case of zero division during metrics (precision, recall) computation; should be one of 0 or 1
compute_on_call – if True, allows computing the metric's value on call
prefix – metric prefix
suffix – metric suffix
CMCMetric¶
- class catalyst.metrics._cmc_score.CMCMetric(embeddings_key: str, labels_key: str, is_query_key: str, topk_args: Iterable[int] = None, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._metric.AccumulationMetric
Cumulative Matching Characteristics
- Parameters
embeddings_key – key of embedding tensor in batch
labels_key – key of label tensor in batch
is_query_key – key of query flag tensor in batch
topk_args – list of k, specifies which cmc@k should be calculated
compute_on_call – if True, allows computing the metric's value on call
prefix – metric prefix
suffix – metric suffix
Examples
>>> from collections import OrderedDict
>>> from torch.optim import Adam
>>> from torch.utils.data import DataLoader
>>> from catalyst.contrib import nn
>>> from catalyst.contrib.datasets import MnistMLDataset, MnistQGDataset
>>> from catalyst.data import BalanceBatchSampler, HardTripletsSampler
>>> from catalyst.dl import ControlFlowCallback, LoaderMetricCallback, SupervisedRunner
>>> from catalyst.metrics import CMCMetric
>>>
>>> dataset_root = "."
>>>
>>> # download dataset for train and val, create loaders
>>> dataset_train = MnistMLDataset(root=dataset_root, download=True, transform=None)
>>> sampler = BalanceBatchSampler(labels=dataset_train.get_labels(), p=5, k=10)
>>> train_loader = DataLoader(
>>>     dataset=dataset_train, sampler=sampler, batch_size=sampler.batch_size
>>> )
>>> dataset_valid = MnistQGDataset(root=dataset_root, transform=None, gallery_fraq=0.2)
>>> valid_loader = DataLoader(dataset=dataset_valid, batch_size=1024)
>>>
>>> # model, optimizer, criterion
>>> model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 100))
>>> optimizer = Adam(model.parameters())
>>> sampler_inbatch = HardTripletsSampler(norm_required=False)
>>> criterion = nn.TripletMarginLossWithSampler(
>>>     margin=0.5, sampler_inbatch=sampler_inbatch
>>> )
>>>
>>> # batch data processing
>>> class CustomRunner(SupervisedRunner):
>>>     def handle_batch(self, batch):
>>>         if self.is_train_loader:
>>>             images, targets = batch["features"].float(), batch["targets"].long()
>>>             features = model(images)
>>>             self.batch = {
>>>                 "embeddings": features,
>>>                 "targets": targets,
>>>             }
>>>         else:
>>>             images, targets, is_query = (
>>>                 batch["features"].float(),
>>>                 batch["targets"].long(),
>>>                 batch["is_query"].bool(),
>>>             )
>>>             features = model(images)
>>>             self.batch = {
>>>                 "embeddings": features,
>>>                 "targets": targets,
>>>                 "is_query": is_query,
>>>             }
>>>
>>> # training
>>> runner = CustomRunner(input_key="features", output_key="embeddings")
>>> runner.train(
>>>     model=model,
>>>     criterion=criterion,
>>>     optimizer=optimizer,
>>>     callbacks=OrderedDict(
>>>         {
>>>             "cmc": ControlFlowCallback(
>>>                 LoaderMetricCallback(
>>>                     CMCMetric(
>>>                         embeddings_key="embeddings",
>>>                         labels_key="targets",
>>>                         is_query_key="is_query",
>>>                         topk_args=(1, 3),
>>>                     ),
>>>                     input_key=["embeddings", "is_query"],
>>>                     target_key=["targets"],
>>>                 ),
>>>                 loaders="valid",
>>>             ),
>>>         }
>>>     ),
>>>     loaders=OrderedDict({"train": train_loader, "valid": valid_loader}),
>>>     valid_loader="valid",
>>>     valid_metric="cmc01",
>>>     minimize_valid_metric=False,
>>>     logdir="./logs",
>>>     verbose=True,
>>>     num_epochs=3,
>>> )
RecSys – HitrateMetric¶
- class catalyst.metrics._hitrate.HitrateMetric(topk_args: List[int] = None, compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
Calculates the hitrate.
- Parameters
topk_args – list of topk for hitrate@topk computing
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
Computes the mean value of hitrate and its approximate std value.
RecSys – MAPMetric¶
- class catalyst.metrics._map.MAPMetric(topk_args: List[int] = None, compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
Calculates the Mean Average Precision (MAP) for RecSys. The precision metric summarizes the fraction of relevant items out of the whole recommendation list.
- Parameters
topk_args – list of topk for map@topk computing
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
It computes the mean value of MAP and its approximate std value.
RecSys – MRRMetric¶
- class catalyst.metrics._mrr.MRRMetric(topk_args: List[int] = None, compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
Calculates the Mean Reciprocal Rank (MRR) score given model outputs and targets.
- Parameters
topk_args – list of topk for mrr@topk computing
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
Computes the mean value of MRR and its approximate std value.
RecSys – NDCGMetric¶
- class catalyst.metrics._ndcg.NDCGMetric(topk_args: List[int] = None, compute_on_call: bool = True, prefix: str = None, suffix: str = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
Calculates the Normalized Discounted Cumulative Gain (NDCG) score given model outputs and targets.
- Parameters
topk_args – list of topk for ndcg@topk computing
compute_on_call – if True, computes and returns metric value during metric call
prefix – metric prefix
suffix – metric suffix
Computes the mean value of NDCG and its approximate std value.
Segmentation – RegionBasedMetric¶
- class catalyst.metrics._segmentation.RegionBasedMetric(metric_fn: Callable, metric_name: str, class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = 0.5, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._metric.ICallbackBatchMetric
Logic class for all region based metrics, like IoU, Dice, Trevsky.
- Parameters
metric_fn – metric function that takes statistics and returns a score
metric_name – name of the metric
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)
weights – class weights
class_names – class names
threshold – threshold for outputs binarization
compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True
prefix – metric prefix
suffix – metric suffix
Segmentation – DiceMetric¶
- class catalyst.metrics._segmentation.DiceMetric(class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = None, eps: float = 1e-07, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._segmentation.RegionBasedMetric
Dice metric, dice score = 2 * intersection / (intersection + union) = 2 * tp / (2 * tp + fp + fn)
- Parameters
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)
weights – class weights
class_names – class names
threshold – threshold for outputs binarization
eps – epsilon to avoid zero division
compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True
prefix – metric prefix
suffix – metric suffix
Segmentation – IOUMetric¶
- class catalyst.metrics._segmentation.IOUMetric(class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = None, eps: float = 1e-07, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._segmentation.RegionBasedMetric
IoU Metric, iou score = intersection / union = tp / (tp + fp + fn).
- Parameters
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)
weights – class weights
class_names – class names
threshold – threshold for outputs binarization
eps – epsilon to avoid zero division
compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True
prefix – metric prefix
suffix – metric suffix
Segmentation – TrevskyMetric¶
- class catalyst.metrics._segmentation.TrevskyMetric(alpha: float, beta: Optional[float] = None, class_dim: int = 1, weights: Optional[List[float]] = None, class_names: Optional[List[str]] = None, threshold: Optional[float] = None, eps: float = 1e-07, compute_on_call: bool = True, prefix: Optional[str] = None, suffix: Optional[str] = None)[source]¶
Bases: catalyst.metrics._segmentation.RegionBasedMetric
Trevsky Metric, trevsky score = tp / (tp + fp * beta + fn * alpha)
- Parameters
alpha – false negative coefficient; a bigger alpha gives a bigger penalty for false negatives. If beta is None, alpha must be in (0, 1)
beta – false positive coefficient; a bigger beta gives a bigger penalty for false positives. Must be in (0, 1); if None, beta = (1 - alpha)
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)
weights – class weights
class_names – class names
threshold – threshold for outputs binarization
eps – epsilon to avoid zero division
compute_on_call – Computes and returns metric value during metric call. Used for per-batch logging. default: True
prefix – metric prefix
suffix – metric suffix
Functional API¶
Accuracy¶
- catalyst.metrics.functional._accuracy.accuracy(outputs: torch.Tensor, targets: torch.Tensor, topk: Sequence[int] = (1, )) → Sequence[torch.Tensor][source]¶
Computes multiclass accuracy@topk for the specified values of topk.
- Parameters
outputs – model outputs, logits with shape [bs; num_classes]
targets – ground truth, labels with shape [bs; 1]
topk – topk for accuracy@topk computing
- Returns
list with computed accuracy@topk
Example
>>> accuracy(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>> )
[tensor([1.])]
>>> accuracy(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 1, 0]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>> )
[tensor([0.6667])]
>>> accuracy(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>>     topk=[1, 3],
>>> )
[tensor([1.]), tensor([1.])]
>>> accuracy(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 1, 0]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>>     topk=[1, 3],
>>> )
[tensor([0.6667]), tensor([1.])]
- catalyst.metrics.functional._accuracy.multilabel_accuracy(outputs: torch.Tensor, targets: torch.Tensor, threshold: Union[float, torch.Tensor]) → torch.Tensor[source]¶
Computes multilabel accuracy for the specified activation and threshold.
- Parameters
outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (e.g. a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)
threshold – threshold for model output
- Returns
computed multilabel accuracy
Example
>>> multilabel_accuracy(
>>>     outputs=torch.tensor([[1, 0], [0, 1]]),
>>>     targets=torch.tensor([[1, 0], [0, 1]]),
>>>     threshold=0.5,
>>> )
tensor([1.])
>>> multilabel_accuracy(
>>>     outputs=torch.tensor([[1.0, 0.0], [0.6, 1.0]]),
>>>     targets=torch.tensor([[1, 0], [0, 1]]),
>>>     threshold=0.5,
>>> )
tensor(0.7500)
>>> multilabel_accuracy(
>>>     outputs=torch.tensor([[1.0, 0.0], [0.4, 1.0]]),
>>>     targets=torch.tensor([[1, 0], [0, 1]]),
>>>     threshold=0.5,
>>> )
tensor(1.0)
AUC¶
- catalyst.metrics.functional._auc.auc(outputs: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]¶
Computes ROC-AUC.
- Parameters
outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (e.g. a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)
- Returns
Tensor with [num_classes] shape of per-class-aucs
- Return type
torch.Tensor
Example
>>> auc(
>>>     outputs=torch.tensor([[0.9, 0.1], [0.1, 0.9]]),
>>>     targets=torch.tensor([[1, 0], [0, 1]]),
>>> )
tensor([1., 1.])
>>> auc(
>>>     outputs=torch.tensor([[0.9], [0.8], [0.7], [0.6], [0.5], [0.4], [0.3], [0.2], [0.1], [0.0]]),
>>>     targets=torch.tensor([[0], [1], [1], [1], [1], [1], [1], [0], [0], [0]]),
>>> )
tensor([0.7500])
Average Precision¶
- catalyst.metrics.functional._average_precision.binary_average_precision(outputs: torch.Tensor, targets: torch.Tensor, weights: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Computes the average precision.
- Parameters
outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (e.g. a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)
weights – importance for each sample
- Returns
tensor of [K; ] shape, with average precision for K classes
- Return type
torch.Tensor
Examples
>>> binary_average_precision(
>>>     outputs=torch.Tensor([0.1, 0.4, 0.35, 0.8]),
>>>     targets=torch.Tensor([0, 0, 1, 1]),
>>> )
tensor([0.8333])
- catalyst.metrics.functional._average_precision.mean_average_precision(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int]) → List[torch.Tensor][source]¶
Calculate the mean average precision (MAP) for RecSys. The metric calculates the mean of the AP across all batches.
MAP amplifies the interest in finding many relevant items for each query
- Parameters
outputs (torch.Tensor) – Tensor with predicted score size: [batch_size, slate_length] model outputs, logits
targets (torch.Tensor) – Binary tensor with ground truth; 1 means the item is relevant and 0 means it is not. size: [batch_size, slate_length] ground truth, labels
topk (List[int]) – list of parameters for evaluation on top-k items
- Returns
The map score for every k. size: len(top_k)
- Return type
map_at_k (Tuple[float])
Examples
>>> mean_average_precision(
>>>     outputs=torch.tensor([
>>>         [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
>>>         [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
>>>     ]),
>>>     targets=torch.tensor([
>>>         [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
>>>         [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
>>>     ]),
>>>     topk=[10],
>>> )
[tensor(0.5325)]
- catalyst.metrics.functional._average_precision.average_precision(outputs: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]¶
Calculate the Average Precision for RecSys. The precision metric summarizes the fraction of relevant items out of the whole recommendation list.
To compute the precision at k set the threshold rank k, compute the percentage of relevant items in topK, ignoring the documents ranked lower than k.
The average precision at k (AP at k) summarizes the average precision for relevant items up to the k-th one. See the Wikipedia entry for average precision: https://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=793358396#Average_precision
If a relevant document never gets retrieved, we assume the precision corresponding to that relevant doc to be zero
- Parameters
outputs (torch.Tensor) – Tensor with predicted score size: [batch_size, slate_length] model outputs, logits
targets (torch.Tensor) – Binary tensor with ground truth; 1 means the item is relevant and 0 means it is not. size: [batch_size, slate_length] ground truth, labels
- Returns
The average precision score for each sample. size: [batch_size, 1]
- Return type
ap_score (torch.Tensor)
Examples
>>> average_precision(
>>>     outputs=torch.tensor([
>>>         [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
>>>         [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
>>>     ]),
>>>     targets=torch.tensor([
>>>         [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0],
>>>         [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
>>>     ]),
>>> )
tensor([0.6222, 0.4429])
Classification¶
- catalyst.metrics.functional._classification.f1score(precision_value, recall_value, eps=1e-05)[source]¶
Calculates the F1-score from precision and recall to reduce computation redundancy.
- Parameters
precision_value – precision (0-1)
recall_value – recall (0-1)
eps – epsilon to use
- Returns
F1 score (0-1)
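A quick numeric sanity check (values are approximate because of the eps term):
>>> from catalyst.metrics.functional._classification import f1score
>>>
>>> f1score(precision_value=0.5, recall_value=0.5)   # ~0.5
>>> f1score(precision_value=1.0, recall_value=0.5)   # ~0.6667 (harmonic mean)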
- catalyst.metrics.functional._classification.precision_recall_fbeta_support(outputs: torch.Tensor, targets: torch.Tensor, beta: float = 1, eps: float = 1e-06, argmax_dim: int = -1, num_classes: Optional[int] = None, zero_division: int = 0) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Counts precision, recall, and fbeta score.
- Parameters
outputs – A list of predicted elements
targets – A list of elements that are to be predicted
beta – beta param for f_score
eps – epsilon to avoid zero division
argmax_dim – int that specifies the dimension for argmax transformation in case of scores/probabilities in outputs
num_classes – int that specifies the number of classes if it is known
zero_division – int value, should be one of 0 or 1; used for precision_val and recall computation
- Returns
tuple of precision, recall, fbeta score, and support
Examples
>>> precision_recall_fbeta_support(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>>     beta=1,
>>> )
(
    tensor([1., 1., 1.]),  # precision per class
    tensor([1., 1., 1.]),  # recall per class
    tensor([1., 1., 1.]),  # fbeta per class
    tensor([1., 1., 1.]),  # support per class
)
>>> precision_recall_fbeta_support(
>>>     outputs=torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]]),
>>>     targets=torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]]),
>>>     beta=1,
>>> )
(
    tensor([0.5000, 0.5000]),  # precision per class
    tensor([0.5000, 0.5000]),  # recall per class
    tensor([0.5000, 0.5000]),  # fbeta per class
    tensor([4., 4.]),  # support per class
)
- catalyst.metrics.functional._classification.precision(tp: int, fp: int, zero_division: int = 0) → float[source]¶
Calculates precision (a.k.a. positive predictive value) for binary classification and segmentation.
- Parameters
tp – number of true positives
fp – number of false positives
zero_division – int value, should be one of 0 or 1; if both tp==0 and fp==0, return this value as a result
- Returns
precision value (0-1)
- catalyst.metrics.functional._classification.recall(tp: int, fn: int, zero_division: int = 0) → float[source]¶
Calculates recall (a.k.a. true positive rate) for binary classification and segmentation.
- Parameters
tp – number of true positives
fn – number of false negatives
zero_division – int value, should be one of 0 or 1; if both tp==0 and fn==0, return this value as a result
- Returns
recall value (0-1)
- catalyst.metrics.functional._classification.get_aggregated_metrics(tp: numpy.array, fp: numpy.array, fn: numpy.array, support: numpy.array, zero_division: int = 0) → Tuple[numpy.array, numpy.array, numpy.array, numpy.array][source]¶
Counts precision, recall, and f1 scores per class and with macro, weighted, and micro averaging, given the statistics.
- Parameters
tp – array of shape (num_classes, ) of true positive statistics per class
fp – array of shape (num_classes, ) of false positive statistics per class
fn – array of shape (num_classes, ) of false negative statistics per class
support – array of shape (num_classes, ) of samples count per class
zero_division – int value, should be one of 0 or 1; used for precision and recall computation
- Returns
per-class, micro, macro, weighted averaging
- Return type
arrays of metrics
- catalyst.metrics.functional._classification.get_binary_metrics(tp: int, fp: int, fn: int, zero_division: int) → Tuple[float, float, float][source]¶
Get precision, recall, and f1 score metrics from true positive, false positive, and false negative statistics for binary classification.
- Parameters
tp – true positive
fp – false positive
fn – false negative
zero_division – int value, should be 0 or 1
- Returns
precision, recall, f1 scores
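A hedged worked example from raw counts; it also exercises the standalone precision and recall helpers documented above (the expected values follow directly from tp/(tp+fp), tp/(tp+fn), and their harmonic mean):
>>> from catalyst.metrics.functional._classification import (
>>>     get_binary_metrics, precision, recall,
>>> )
>>>
>>> # 3 true positives, 1 false positive, 2 false negatives
>>> precision(tp=3, fp=1)   # 0.75
>>> recall(tp=3, fn=2)      # 0.6
>>> get_binary_metrics(tp=3, fp=1, fn=2, zero_division=0)  # ~(0.75, 0.6, 0.6667)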
CMC Score¶
- catalyst.metrics.functional._cmc_score.cmc_score_count(distances: torch.Tensor, conformity_matrix: torch.Tensor, topk: int = 1) → float[source]¶
Function to count CMC from distance matrix and conformity matrix.
- Parameters
distances – distance matrix shape of (n_embeddings_x, n_embeddings_y)
conformity_matrix – binary matrix with 1 on same label pos and 0 otherwise
topk – number of top examples for cumulative score counting
- Returns
cmc score
- catalyst.metrics.functional._cmc_score.cmc_score(query_embeddings: torch.Tensor, gallery_embeddings: torch.Tensor, conformity_matrix: torch.Tensor, topk: int = 1) → float[source]¶
Function to count CMC score from query and gallery embeddings.
- Parameters
query_embeddings – tensor shape of (n_embeddings, embedding_dim) embeddings of the objects in query
gallery_embeddings – tensor shape of (n_embeddings, embedding_dim) embeddings of the objects in gallery
conformity_matrix – binary matrix with 1 on same label pos and 0 otherwise
topk – number of top examples for cumulative score counting
- Returns
cmc score
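A hedged sketch with two queries and two gallery items whose nearest neighbours share their labels, so CMC@1 should be 1.0; the conformity_matrix dtype is an assumption:
>>> import torch
>>> from catalyst.metrics.functional._cmc_score import cmc_score
>>>
>>> query = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
>>> gallery = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
>>> # conformity[i, j] == 1 when query i and gallery j share the same label
>>> conformity = torch.tensor([[1, 0], [0, 1]], dtype=torch.bool)
>>> cmc_score(
>>>     query_embeddings=query,
>>>     gallery_embeddings=gallery,
>>>     conformity_matrix=conformity,
>>>     topk=1,
>>> )  # expected 1.0: the closest gallery item always has the query's label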
F1 score¶
- catalyst.metrics.functional._f1_score.f1_score(outputs: torch.Tensor, targets: torch.Tensor, eps: float = 1e-07, argmax_dim: int = -1, num_classes: Optional[int] = None) → Union[float, torch.Tensor][source]¶
Fbeta_score with beta=1.
- Parameters
outputs – A list of predicted elements
targets – A list of elements that are to be predicted
eps – epsilon to avoid zero division
argmax_dim – int that specifies the dimension for argmax transformation in case of scores/probabilities in outputs
num_classes – int that specifies the number of classes if it is known
- Returns
F_1 score
- Return type
float
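A hedged example mirroring the precision_recall_fbeta_support example above; with perfect one-hot predictions the per-class F1 should be close to 1.0:
>>> import torch
>>> from catalyst.metrics.functional._f1_score import f1_score
>>>
>>> f1_score(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>> )  # expected per-class scores close to tensor([1., 1., 1.])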
- catalyst.metrics.functional._f1_score.fbeta_score(outputs: torch.Tensor, targets: torch.Tensor, beta: float = 1.0, eps: float = 1e-07, argmax_dim: int = -1, num_classes: Optional[int] = None) → Union[float, torch.Tensor][source]¶
Counts fbeta score for given outputs and targets.
- Parameters
outputs – A list of predicted elements
targets – A list of elements that are to be predicted
beta – beta param for f_score
eps – epsilon to avoid zero division
argmax_dim – int that specifies the dimension for argmax transformation in case of scores/probabilities in outputs
num_classes – int that specifies the number of classes if it is known
- Raises
ValueError – if beta is a negative number
- Returns
F_beta score
- Return type
float
Focal¶
- catalyst.metrics.functional._focal.sigmoid_focal_loss(outputs: torch.Tensor, targets: torch.Tensor, gamma: float = 2.0, alpha: float = 0.25, reduction: str = 'mean')[source]¶
Compute binary focal loss between target and output logits.
- Parameters
outputs – tensor of arbitrary shape
targets – tensor of the same shape as input
gamma – gamma for focal loss
alpha – alpha for focal loss
reduction (string, optional) – specifies the reduction to apply to the output: "none" | "mean" | "sum" | "batchwise_mean". "none": no reduction will be applied; "mean": the sum of the output will be divided by the number of elements in the output; "sum": the output will be summed.
- Returns
computed loss
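A minimal call sketch (no exact values asserted; the tensors are illustrative):
>>> import torch
>>> from catalyst.metrics.functional._focal import sigmoid_focal_loss
>>>
>>> logits = torch.tensor([2.0, -1.0, 0.5])
>>> targets = torch.tensor([1.0, 0.0, 1.0])
>>> loss = sigmoid_focal_loss(outputs=logits, targets=targets, gamma=2.0, alpha=0.25)
>>> # scalar mean loss by default; reduction="none" keeps per-element values
>>> per_element = sigmoid_focal_loss(logits, targets, reduction="none")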
- catalyst.metrics.functional._focal.reduced_focal_loss(outputs: torch.Tensor, targets: torch.Tensor, threshold: float = 0.5, gamma: float = 2.0, reduction='mean') → torch.Tensor[source]¶
Compute reduced focal loss between target and output logits.
It has been proposed in Reduced Focal Loss: 1st Place Solution to xView object detection in Satellite Imagery paper.
Note
size_average and reduce params are in the process of being deprecated; in the meantime, specifying either of those two args will override reduction.
Source: https://github.com/BloodAxe/pytorch-toolbelt
- Parameters
outputs – tensor of arbitrary shape
targets – tensor of the same shape as input
threshold – threshold for focal reduction
gamma – gamma for focal reduction
reduction – specifies the reduction to apply to the output: "none" | "mean" | "sum" | "batchwise_mean". "none": no reduction will be applied; "mean": the sum of the output will be divided by the number of elements in the output; "sum": the output will be summed; "batchwise_mean": computes the mean loss per sample in the batch. Default: "mean"
- Returns
computed loss
- Return type
torch.Tensor
Hitrate¶
- catalyst.metrics.functional._hitrate.hitrate(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int]) → List[torch.Tensor][source]¶
Calculate the hit rate score given model outputs and targets. Hit rate is a metric for evaluating ranking systems: generate top-N recommendations, and if one of the recommendations is actually something the user has rated, count it as a hit. By "rate" we mean any explicit form of user interaction. Add up all of the hits for all users and then divide by the number of users.
Compute the top-N recommendations for each user in the training stage and intentionally remove one of these items from the training data.
- Parameters
outputs (torch.Tensor) – Tensor with predicted score size: [batch_size, slate_length] model outputs, logits
targets (torch.Tensor) – Binary tensor with ground truth; 1 means the item is relevant for the user and 0 means it is not. size: [batch_size, slate_length] ground truth, labels
topk (List[int]) – parameter for evaluation on top-k items
- Returns
the hit rate score
- Return type
hitrate_at_k (List[torch.Tensor])
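A hedged call sketch using the same shape conventions as the reciprocal_rank example below; one tensor per requested k is expected, and no exact values are asserted:
>>> import torch
>>> from catalyst.metrics.functional._hitrate import hitrate
>>>
>>> hitrate(
>>>     outputs=torch.tensor([[4.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0]]),
>>>     targets=torch.tensor([[0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]]),
>>>     topk=[1, 2],
>>> )  # list with one hit-rate tensor per requested k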
MRR¶
- catalyst.metrics.functional._mrr.reciprocal_rank(outputs: torch.Tensor, targets: torch.Tensor, k: int) → torch.Tensor[source]¶
Calculate the Reciprocal Rank score given model outputs and targets. Data is aggregated in batches.
- Parameters
outputs – Tensor with predicted scores. size: [batch_size, slate_length] model outputs, logits
targets – Binary tensor with ground truth; 1 means the item is relevant and 0 means it is not. size: [batch_size, slate_length] ground truth, labels
k – Parameter for evaluation on top-k items
- Returns
MRR score
Examples
>>> reciprocal_rank(
>>>     outputs=torch.Tensor([[4.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0]]),
>>>     targets=torch.Tensor([[0, 0, 1.0, 1.0], [0, 0, 1.0, 1.0]]),
>>>     k=1,
>>> )
tensor([[0.], [1.]])
>>> reciprocal_rank(
>>>     outputs=torch.Tensor([[4.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0]]),
>>>     targets=torch.Tensor([[0, 0, 1.0, 1.0], [0, 0, 1.0, 1.0]]),
>>>     k=3,
>>> )
tensor([[0.5000], [1.0000]])
- catalyst.metrics.functional._mrr.mrr(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int]) → List[torch.Tensor][source]¶
Calculate the Mean Reciprocal Rank (MRR) score given model outputs and targets. Data is aggregated in batches.
MRR@k is the mean, over the whole batch, of the reciprocal rank, that is, the inverse rank of the highest ranked relevant item among the top k (0 if there is none). https://en.wikipedia.org/wiki/Mean_reciprocal_rank
- Parameters
outputs – Tensor with predicted scores. size: [batch_size, slate_length] model outputs, logits
targets – Binary tensor with ground truth; 1 means the item is relevant and 0 means it is not. size: [batch_size, slate_length] ground truth, labels
topk – parameter for evaluation on top-k items
- Returns
MRR score
Examples
>>> mrr(
>>>     outputs=torch.Tensor([[4.0, 2.0, 3.0, 1.0], [1.0, 2.0, 3.0, 4.0]]),
>>>     targets=torch.Tensor([[0, 0, 1.0, 1.0], [0, 0, 1.0, 1.0]]),
>>>     topk=[1, 3],
>>> )
[tensor(0.5000), tensor(0.7500)]
NDCG¶
- catalyst.metrics.functional._ndcg.dcg(outputs: torch.Tensor, targets: torch.Tensor, gain_function='exp_rank') → torch.Tensor[source]¶
Computes Discounted Cumulative Gain (DCG), i.e. DCG@topk for the specified values of k. Graded relevance is used as a measure of usefulness, or gain, from examining a set of items; gain may be reduced at lower ranks. Reference: https://en.wikipedia.org/wiki/Discounted_cumulative_gain
- Parameters
outputs – model outputs, logits with shape [batch_size; slate_length]
targets – ground truth, labels with shape [batch_size; slate_length]
gain_function – string that indicates the gain function for the ground truth labels. Two options are available: exp_rank (torch.pow(2, x) - 1) and linear_rank (x). By default, exp_rank is used to emphasize retrieving the relevant documents.
- Returns
The discounted gains tensor
- Return type
dcg_score (torch.Tensor)
- Raises
ValueError – if gain_function is not one of exp_rank or linear_rank
Examples
>>> dcg(
>>>     outputs=torch.tensor([[3, 2, 1, 0]]),
>>>     targets=torch.Tensor([[2.0, 2.0, 1.0, 0.0]]),
>>>     gain_function="linear_rank",
>>> )
tensor([[2.0000, 2.0000, 0.6309, 0.0000]])
>>> dcg(
>>>     outputs=torch.tensor([[3, 2, 1, 0]]),
>>>     targets=torch.Tensor([[2.0, 2.0, 1.0, 0.0]]),
>>>     gain_function="linear_rank",
>>> ).sum()
tensor(4.6309)
>>> dcg(
>>>     outputs=torch.tensor([[3, 2, 1, 0]]),
>>>     targets=torch.Tensor([[2.0, 2.0, 1.0, 0.0]]),
>>>     gain_function="exp_rank",
>>> )
tensor([[3.0000, 1.8928, 0.5000, 0.0000]])
>>> dcg(
>>>     outputs=torch.tensor([[3, 2, 1, 0]]),
>>>     targets=torch.Tensor([[2.0, 2.0, 1.0, 0.0]]),
>>>     gain_function="exp_rank",
>>> ).sum()
tensor(5.3928)
- catalyst.metrics.functional._ndcg.ndcg(outputs: torch.Tensor, targets: torch.Tensor, topk: List[int], gain_function='exp_rank') → List[torch.Tensor][source]¶
Computes nDCG@topk for the specified values of topk.
- Parameters
outputs (torch.Tensor) – model outputs, logits with shape [batch_size; slate_size]
targets (torch.Tensor) – ground truth, labels with shape [batch_size; slate_size]
gain_function – string that indicates the gain function for the ground truth labels. Two options are available: exp_rank (torch.pow(2, x) - 1) and linear_rank (x). By default, exp_rank is used to emphasize retrieving the relevant documents.
topk (List[int]) – parameter for evaluation on top-k items
- Returns
tuple with computed ndcg@topk
- Return type
results (Tuple[float])
Examples
>>> ndcg(
>>>     outputs=torch.tensor([[0.5, 0.2, 0.1], [0.5, 0.2, 0.1]]),
>>>     targets=torch.Tensor([[1.0, 0.0, 1.0], [1.0, 0.0, 1.0]]),
>>>     topk=[2],
>>>     gain_function="exp_rank",
>>> )
[tensor(0.6131)]
Precision¶
- catalyst.metrics.functional._precision.precision(outputs: torch.Tensor, targets: torch.Tensor, argmax_dim: int = -1, eps: float = 1e-07, num_classes: Optional[int] = None) → Union[float, torch.Tensor][source]¶
Multiclass precision score.
- Parameters
outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]
targets – ground truth (correct) target values with shape [bs; …, 1]
argmax_dim – int that specifies the dimension for argmax transformation in case of scores/probabilities in outputs
eps – float, epsilon to avoid zero division
num_classes – int that specifies the number of classes if it is known
- Returns
precision for every class
- Return type
Tensor
Examples
>>> precision(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>> )
tensor([1., 1., 1.])
>>> precision(
>>>     outputs=torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]]),
>>>     targets=torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]]),
>>> )
tensor([0.5000, 0.5000])
Recall¶
- catalyst.metrics.functional._recall.recall(outputs: torch.Tensor, targets: torch.Tensor, argmax_dim: int = -1, eps: float = 1e-07, num_classes: Optional[int] = None) → Union[float, torch.Tensor][source]¶
Multiclass recall score.
- Parameters
outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]
targets – ground truth (correct) target values with shape [bs; …, 1]
argmax_dim – int that specifies the dimension for argmax transformation in case of scores/probabilities in outputs
eps – float, epsilon to avoid zero division
num_classes – int that specifies the number of classes if it is known
- Returns
recall for every class
- Return type
Tensor
Examples
>>> recall(
>>>     outputs=torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
>>>     targets=torch.tensor([0, 1, 2]),
>>> )
tensor([1., 1., 1.])
>>> recall(
>>>     outputs=torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]]),
>>>     targets=torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]]),
>>> )
tensor([0.5000, 0.5000])
Segmentation¶
- catalyst.metrics.functional._segmentation.iou(outputs: torch.Tensor, targets: torch.Tensor, class_dim: int = 1, threshold: float = None, mode: str = 'per-class', weights: Optional[List[float]] = None, eps: float = 1e-07) → torch.Tensor[source]¶
Computes the IoU/Jaccard score: iou score = intersection / union = tp / (tp + fp + fn)
- Parameters
outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1); if mode = "micro", this means nothing
threshold – threshold for outputs binarization
mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted', 'per-class']. If mode='micro', classes are ignored and the metric is calculated over all of them. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights. If mode='per-class', the metric is calculated separately for all classes.
weights – class weights (for mode="weighted")
eps – epsilon to avoid zero division
- Returns
IoU (Jaccard) score for each class(if mode=’weighted’) or aggregated IOU
Example
>>> size = 4
>>> half_size = size // 2
>>> shape = (1, 1, size, size)
>>> empty = torch.zeros(shape)
>>> full = torch.ones(shape)
>>> left = torch.ones(shape)
>>> left[:, :, :, half_size:] = 0
>>> right = torch.ones(shape)
>>> right[:, :, :, :half_size] = 0
>>> top_left = torch.zeros(shape)
>>> top_left[:, :, :half_size, :half_size] = 1
>>> pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
>>> targets = torch.cat([full, right, empty, full, left, left], dim=1)
>>> iou(
>>>     outputs=pred,
>>>     targets=targets,
>>>     class_dim=1,
>>>     threshold=0.5,
>>>     mode="per-class",
>>> )
tensor([0.0000, 0.0000, 1.0000, 1.0000, 1.0000, 0.5])
- catalyst.metrics.functional._segmentation.dice(outputs: torch.Tensor, targets: torch.Tensor, class_dim: int = 1, threshold: float = None, mode: str = 'per-class', weights: Optional[List[float]] = None, eps: float = 1e-07) → torch.Tensor[source]¶
Computes the dice score: dice score = 2 * intersection / (intersection + union) = 2 * tp / (2 * tp + fp + fn)
- Parameters
outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1); if mode = "micro", this means nothing
threshold – threshold for outputs binarization
mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted', 'per-class']. If mode='micro', classes are ignored and the metric is calculated over all of them. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights. If mode='per-class', the metric is calculated separately for all classes.
weights – class weights (for mode="weighted")
eps – epsilon to avoid zero division
- Returns
Dice score for each class(if mode=’weighted’) or aggregated Dice
Example
>>> size = 4
>>> half_size = size // 2
>>> shape = (1, 1, size, size)
>>> empty = torch.zeros(shape)
>>> full = torch.ones(shape)
>>> left = torch.ones(shape)
>>> left[:, :, :, half_size:] = 0
>>> right = torch.ones(shape)
>>> right[:, :, :, :half_size] = 0
>>> top_left = torch.zeros(shape)
>>> top_left[:, :, :half_size, :half_size] = 1
>>> pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
>>> targets = torch.cat([full, right, empty, full, left, left], dim=1)
>>> dice(
>>>     outputs=pred,
>>>     targets=targets,
>>>     class_dim=1,
>>>     threshold=0.5,
>>>     mode="per-class",
>>> )
tensor([0.0000, 0.0000, 1.0000, 1.0000, 1.0000, 0.6667])
- catalyst.metrics.functional._segmentation.trevsky(outputs: torch.Tensor, targets: torch.Tensor, alpha: float, beta: Optional[float] = None, class_dim: int = 1, threshold: float = None, mode: str = 'per-class', weights: Optional[List[float]] = None, eps: float = 1e-07) → torch.Tensor[source]¶
Computes the trevsky score: trevsky score = tp / (tp + fp * beta + fn * alpha)
- Parameters
outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input
alpha – false negative coefficient; a bigger alpha gives a bigger penalty for false negatives. Must be in (0, 1)
beta – false positive coefficient; a bigger beta gives a bigger penalty for false positives. Must be in (0, 1); if None, beta = (1 - alpha)
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)
threshold – threshold for outputs binarization
mode – class summation strategy. Must be one of ['micro', 'macro', 'weighted', 'per-class']. If mode='micro', classes are ignored and the metric is calculated over all of them. If mode='macro', the metric is calculated per class and then averaged over all classes. If mode='weighted', the metric is calculated per class and then summed over all classes with weights. If mode='per-class', the metric is calculated separately for all classes.
weights – class weights (for mode="weighted")
eps – epsilon to avoid zero division
- Returns
Trevsky score for each class(if mode=’weighted’) or aggregated score
Example
>>> size = 4
>>> half_size = size // 2
>>> shape = (1, 1, size, size)
>>> empty = torch.zeros(shape)
>>> full = torch.ones(shape)
>>> left = torch.ones(shape)
>>> left[:, :, :, half_size:] = 0
>>> right = torch.ones(shape)
>>> right[:, :, :, :half_size] = 0
>>> top_left = torch.zeros(shape)
>>> top_left[:, :, :half_size, :half_size] = 1
>>> pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
>>> targets = torch.cat([full, right, empty, full, left, left], dim=1)
>>> trevsky(
>>>     outputs=pred,
>>>     targets=targets,
>>>     alpha=0.2,
>>>     class_dim=1,
>>>     threshold=0.5,
>>>     mode="per-class",
>>> )
tensor([0.0000, 0.0000, 1.0000, 1.0000, 1.0000, 0.8333])
- catalyst.metrics.functional._segmentation.get_segmentation_statistics(outputs: torch.Tensor, targets: torch.Tensor, class_dim: int = 1, threshold: float = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Computes true positive, false positive, and false negative statistics for a multilabel segmentation problem.
- Parameters
outputs – [N; K; …] tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary [N; K; …] tensor that encodes which of the K classes are associated with the N-th input
class_dim – indicates class dimension (K) for outputs and targets tensors (default = 1)
threshold – threshold for outputs binarization
- Returns
Segmentation stats
Example
>>> size = 4
>>> half_size = size // 2
>>> shape = (1, 1, size, size)
>>> empty = torch.zeros(shape)
>>> full = torch.ones(shape)
>>> left = torch.ones(shape)
>>> left[:, :, :, half_size:] = 0
>>> right = torch.ones(shape)
>>> right[:, :, :, :half_size] = 0
>>> top_left = torch.zeros(shape)
>>> top_left[:, :, :half_size, :half_size] = 1
>>> pred = torch.cat([empty, left, empty, full, left, top_left], dim=1)
>>> targets = torch.cat([full, right, empty, full, left, left], dim=1)
>>> get_segmentation_statistics(
>>>     outputs=pred,
>>>     targets=targets,
>>>     class_dim=1,
>>>     threshold=0.5,
>>> )
(tensor([ 0.,  0.,  0., 16.,  8.,  4.]), tensor([0., 8., 0., 0., 0., 0.]), tensor([16.,  8.,  0.,  0.,  0.,  4.]))
Misc¶
- catalyst.metrics.functional._misc.check_consistent_length(*tensors)[source]¶
Check that all arrays have consistent first dimensions. Checks whether all objects in arrays have the same shape or length.
- Parameters
tensors – list of input objects (tensors) that will be checked for consistent length
- Raises
ValueError – “Inconsistent numbers of samples”
- catalyst.metrics.functional._misc.process_multilabel_components(outputs: torch.Tensor, targets: torch.Tensor, weights: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor][source]¶
General preprocessing for multilabel-based metrics.
- Parameters
outputs – NxK tensor that for each of the N examples indicates the probability of the example belonging to each of the K classes, according to the model.
targets – binary NxK tensor that encodes which of the K classes are associated with the N-th input (eg: a row [0, 1, 0, 1] indicates that the example is associated with classes 2 and 4)
weights – importance for each sample
- Returns
processed outputs and targets with [batch_size; num_classes] shape
- catalyst.metrics.functional._misc.process_recsys_components(outputs: torch.Tensor, targets: torch.Tensor) → torch.Tensor[source]¶
General pre-processing for calculating recsys metrics.
- Parameters
outputs (torch.Tensor) – Tensor with predicted scores size: [batch_size, slate_length] model outputs, logits
targets (torch.Tensor) – Binary tensor with ground truth; 1 means the item is relevant for the user and 0 means it is not. size: [batch_size, slate_length] ground truth, labels
- Returns
targets tensor sorted by outputs
- Return type
targets_sorted_by_outputs (torch.Tensor)
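A hedged example: targets are expected to come back reordered by descending model score:
>>> import torch
>>> from catalyst.metrics.functional._misc import process_recsys_components
>>>
>>> process_recsys_components(
>>>     outputs=torch.tensor([[0.1, 0.9, 0.5]]),
>>>     targets=torch.tensor([[0.0, 1.0, 0.0]]),
>>> )  # expected tensor([[1., 0., 0.]]): the relevant item has the highest score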
- catalyst.metrics.functional._misc.get_binary_statistics(outputs: torch.Tensor, targets: torch.Tensor, label: int = 1) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Computes the number of true negative, false positive, false negative, true positive and support for a binary classification problem for a given label.
- Parameters
outputs – estimated targets as predicted by a model with shape [bs; …, 1]
targets – ground truth (correct) target values with shape [bs; …, 1]
label – integer that specifies the label of interest for statistics computation
- Returns
stats
- Return type
Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
Example
>>> y_pred = torch.tensor([[0, 0, 1, 1, 0, 1, 0, 1]])
>>> y_true = torch.tensor([[0, 1, 0, 1, 0, 0, 1, 1]])
>>> tn, fp, fn, tp, support = get_binary_statistics(y_pred, y_true)
tensor(2) tensor(2) tensor(2) tensor(2) tensor(4)
- catalyst.metrics.functional._misc.get_multiclass_statistics(outputs: torch.Tensor, targets: torch.Tensor, argmax_dim: int = -1, num_classes: Optional[int] = None) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Computes the number of true negative, false positive, false negative, true positive and support for a multiclass classification problem.
- Parameters
outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]
targets – ground truth (correct) target values with shape [bs; …, 1]
argmax_dim – int that specifies the dimension for argmax transformation in case of scores/probabilities in outputs
num_classes – int that specifies the number of classes if it is known
- Returns
stats
- Return type
Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
Example
>>> y_pred = torch.tensor([1, 2, 3, 0])
>>> y_true = torch.tensor([1, 3, 4, 0])
>>> tn, fp, fn, tp, support = get_multiclass_statistics(y_pred, y_true)
tensor([3., 3., 3., 2., 3.]), tensor([0., 0., 1., 1., 0.]), tensor([0., 0., 0., 1., 1.]),
tensor([1., 1., 0., 0., 0.]), tensor([1., 1., 0., 1., 1.])
- catalyst.metrics.functional._misc.get_multilabel_statistics(outputs: torch.Tensor, targets: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶
Computes the number of true negative, false positive, false negative, true positive and support for a multilabel classification problem.
- Parameters
outputs – estimated targets as predicted by a model with shape [bs; …, (num_classes or 1)]
targets – ground truth (correct) target values with shape [bs; …, 1]
- Returns
stats
- Return type
Tuple[Tensor, Tensor, Tensor, Tensor, Tensor]
Example
>>> y_pred = torch.tensor([[0, 0, 1, 1], [0, 1, 0, 1]])
>>> y_true = torch.tensor([[0, 1, 0, 1], [0, 0, 1, 1]])
>>> tn, fp, fn, tp, support = get_multilabel_statistics(y_pred, y_true)
tensor([2., 0., 0., 0.]) tensor([0., 1., 1., 0.]) tensor([0., 1., 1., 0.])
tensor([0., 0., 0., 2.]) tensor([0., 1., 1., 2.])
>>> y_pred = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> y_true = torch.tensor([0, 1, 2])
>>> tn, fp, fn, tp, support = get_multilabel_statistics(y_pred, y_true)
tensor([2., 2., 2.]) tensor([0., 0., 0.]) tensor([0., 0., 0.])
tensor([1., 1., 1.]) tensor([1., 1., 1.])
>>> y_pred = torch.tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
>>> y_true = torch.nn.functional.one_hot(torch.tensor([0, 1, 2]))
>>> tn, fp, fn, tp, support = get_multilabel_statistics(y_pred, y_true)
tensor([2., 2., 2.]) tensor([0., 0., 0.]) tensor([0., 0., 0.])
tensor([1., 1., 1.]) tensor([1., 1., 1.])
- catalyst.metrics.functional._misc.get_default_topk_args(num_classes: int) → Sequence[int][source]¶
Calculate list params for Accuracy@k and mAP@k.
- Parameters
num_classes – number of classes
- Returns
array of accuracy arguments
- Return type
iterable
Examples
>>> get_default_topk_args(num_classes=4)
[1, 3]
>>> get_default_topk_args(num_classes=8)
[1, 3, 5]