Utils

Config
Distributed
Misc
Onnx
Pruning
Quantization
Torch
Tracing
Image (contrib)
Report (contrib)
Thresholds (contrib)
Visualization (contrib)

Config

catalyst.utils.config.load_config(path: Union[str, pathlib.Path], ordered: bool = False, data_format: Optional[str] = None, encoding: str = 'utf-8') → Union[Dict, List]

Loads a config from the given path. Supports YAML and JSON files.

Examples

>>> load_config(path="./config.yml", ordered=True)

Parameters

path – path to config file (YAML or JSON)
ordered – if True, the config is loaded as an OrderedDict
data_format – file format: “yaml”, “yml” or “json”.
encoding – encoding used to read the config

Returns

config

Return type

Union[Dict, List]

Raises

ValueError – if path doesn’t exist or the file format is not YAML or JSON

catalyst.utils.config.save_config(config: Union[Dict, List], path: Union[str, pathlib.Path], data_format: Optional[str] = None, encoding: str = 'utf-8', ensure_ascii: bool = False, indent: int = 2) → None

Saves a config to a file. The file must be either YAML or JSON.

Parameters

config (Union[Dict, List]) – config to save
path (Union[str, Path]) – path to save
data_format – yaml, yml or json.
encoding – encoding used to write the file. Defaults to utf-8.
ensure_ascii – used for JSON; if True, non-ASCII characters in strings are escaped.
indent – used for JSON; number of spaces used for indentation.
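Assuming save_config passes ensure_ascii and indent through to Python's json module for JSON files (an assumption; the text above only says both are used for JSON), their effect can be seen with the standard library alone:

```python
import json

# illustrative config; not taken from catalyst
config = {"name": "Zürich", "lr": 0.001}

# ensure_ascii=True escapes non-ASCII characters as \uXXXX sequences
escaped = json.dumps(config, ensure_ascii=True)

# ensure_ascii=False keeps them as-is; indent=2 pretty-prints with 2 spaces
readable = json.dumps(config, ensure_ascii=False, indent=2)

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary; only the on-disk representation differs.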

Distributed

catalyst.utils.distributed.all_gather(data: Any) → List[Any]

Run all_gather on arbitrary picklable data (not necessarily tensors).

Note

If data is on different devices, the items of the resulting list will be on those same devices. Source: http://github.com/facebookresearch/detr/blob/master/util/misc.py#L88-L128

Parameters: data – any picklable object
Returns: list of data gathered from each process.

catalyst.utils.distributed.ddp_reduce(tensor: torch.Tensor, mode: str, world_size: int)

Syncs a tensor across world_size processes in distributed mode.

Parameters

tensor – tensor to sync across the processes.
mode – tensor synchronization type, should be one of ‘sum’, ‘mean’ or ‘all’.
world_size – world size

Returns

torch.Tensor with synchronized values.

Raises

ValueError – if mode is not one of sum, mean, all.

catalyst.utils.distributed.get_backend() → Optional[str]

Returns the backend for distributed training.

catalyst.utils.distributed.get_nn_from_ddp_module(model: torch.nn.modules.module.Module) → torch.nn.modules.module.Module

Returns the underlying model from a torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel, or apex.parallel.DistributedDataParallel wrapper.

Parameters: model – A model, or DataParallel wrapper.
Returns: A model

catalyst.utils.distributed.get_rank() → int

Returns the rank of the current worker.

Returns: rank if torch.distributed is initialized, otherwise -1
Return type: int

catalyst.utils.distributed.get_world_size() → int

Returns the world size for distributed training.

catalyst.utils.distributed.mean_reduce(tensor: torch.Tensor, world_size: int) → torch.Tensor

Reduces a tensor across all processes and computes the mean value.

Parameters

tensor – tensor to reduce.
world_size – number of processes in DDP setup.

Returns

reduced tensor

catalyst.utils.distributed.sum_reduce(tensor: torch.Tensor) → torch.Tensor

Reduces a tensor across all processes and computes the total (sum) value.

Parameters: tensor – tensor to reduce.
Returns: reduced tensor

Misc

catalyst.utils.misc.boolean_flag(parser: argparse.ArgumentParser, name: str, default: Optional[bool] = False, help: Optional[str] = None, shorthand: Optional[str] = None) → None

Adds a boolean flag to a parser in place.

Examples

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> boolean_flag(
...     parser, "flag", default=False, help="some flag", shorthand="f"
... )

Parameters

parser – parser to add the flag to
name – argument name; --<name> will enable the flag, while --no-<name> will disable it
default (bool, optional) – default value of the flag
help – help string for the flag
shorthand – shorthand string for the argument
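The --<name> / --no-<name> behaviour described above can be sketched with plain argparse. This is a hypothetical sketch (sketch_boolean_flag is not catalyst's code): two options share one destination attribute, one storing True and one storing False.

```python
import argparse
from typing import Optional


def sketch_boolean_flag(
    parser: argparse.ArgumentParser,
    name: str,
    default: Optional[bool] = False,
    help: Optional[str] = None,
    shorthand: Optional[str] = None,
) -> None:
    """Hypothetical sketch of boolean_flag, not catalyst's implementation."""
    # both options write to the same attribute, e.g. "use-amp" -> "use_amp"
    dest = name.replace("-", "_")
    names = [f"--{name}"]
    if shorthand is not None:
        names.append(f"-{shorthand}")
    # --<name> (or -<shorthand>) sets the flag to True
    parser.add_argument(*names, action="store_true", default=default, dest=dest, help=help)
    # --no-<name> sets the same attribute to False
    parser.add_argument(f"--no-{name}", action="store_false", default=default, dest=dest)


parser = argparse.ArgumentParser()
sketch_boolean_flag(parser, "flag", default=False, help="some flag", shorthand="f")
print(parser.parse_args(["--flag"]).flag)  # True
```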

catalyst.utils.misc.flatten_dict(dictionary: Dict[str, Any], parent_key: str = '', separator: str = '/') → collections.OrderedDict

Flattens the given nested dictionary.

Parameters

dictionary – dictionary to flatten
parent_key (str, optional) – prefix nested keys with string parent_key
separator (str, optional) – delimiter to use between parent_key and key

Returns

ordered dictionary with flattened keys

Return type

collections.OrderedDict
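A minimal recursive sketch of the flattening described above, assuming nested keys are joined as parent_key + separator + key (sketch_flatten_dict is hypothetical, not catalyst's code):

```python
from collections import OrderedDict
from typing import Any, Dict


def sketch_flatten_dict(
    dictionary: Dict[str, Any], parent_key: str = "", separator: str = "/"
) -> "OrderedDict[str, Any]":
    """Hypothetical sketch of flatten_dict."""
    items = []
    for key, value in dictionary.items():
        # prefix nested keys with their parent key, e.g. "b" + "/" + "c"
        new_key = f"{parent_key}{separator}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries and keep their flattened items
            items.extend(sketch_flatten_dict(value, new_key, separator).items())
        else:
            items.append((new_key, value))
    return OrderedDict(items)


print(sketch_flatten_dict({"a": 1, "b": {"c": 2, "d": {"e": 3}}}))
```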

catalyst.utils.misc.get_attr(obj: Any, key: str, inner_key: Optional[str] = None) → Any

Alias for Python's getattr. Useful for preparing Callbacks and for multi-criterion or multi-optimizer setups, for example, when you would like to train multi-task classification.

Used to get a named attribute from an IRunner by key; for example:

get_attr(runner, "criterion")
# is equivalent to
runner.criterion

get_attr(runner, "optimizer")
# is equivalent to
runner.optimizer

get_attr(runner, "scheduler")
# is equivalent to
runner.scheduler

With inner_key, it looks up a dictionary under key and gets inner_key from it; for example:

get_attr(runner, "criterion", "bce")
# is equivalent to
runner.criterion["bce"]

get_attr(runner, "optimizer", "adam")
# is equivalent to
runner.optimizer["adam"]

get_attr(runner, "scheduler", "adam")
# is equivalent to
runner.scheduler["adam"]

Parameters

obj – object of interest
key – name for attribute of interest, like criterion, optimizer, scheduler
inner_key – name of inner dictionary key

Returns

inner attribute

catalyst.utils.misc.get_by_keys(dict_: dict, *keys: Any, default: Optional[catalyst.utils.misc.T] = None) → catalyst.utils.misc.T

Gets a value from nested dictionaries by following the given keys; returns default if a key is missing.

catalyst.utils.misc.get_hash(obj: Any) → str

Creates a unique hash from an object as follows:

- represent obj as a string recursively
- hash this string with the sha256 hash function
- encode the hash with url-safe base64 encoding

Parameters: obj – object to hash
Returns: base64-encoded string
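The three steps above can be sketched with hashlib and base64. The text does not say how obj is turned into a string, so the hypothetical helper _to_string below is an assumption: it sorts dictionary keys so that equal dictionaries hash identically regardless of insertion order.

```python
import base64
import hashlib
from typing import Any


def _to_string(obj: Any) -> str:
    """Hypothetical recursive string representation (an assumption of this sketch)."""
    if isinstance(obj, dict):
        # sort keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} give the same string
        items = sorted((str(k), _to_string(v)) for k, v in obj.items())
        return "{" + ",".join(f"{k}:{v}" for k, v in items) + "}"
    if isinstance(obj, (list, tuple)):
        return "[" + ",".join(_to_string(x) for x in obj) + "]"
    return repr(obj)


def sketch_get_hash(obj: Any) -> str:
    """Hypothetical sketch of get_hash: string -> sha256 -> url-safe base64."""
    digest = hashlib.sha256(_to_string(obj).encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


print(sketch_get_hash({"lr": 0.1, "layers": [64, 32]}))
```

A 32-byte sha256 digest always encodes to 44 base64 characters, and the url-safe alphabet uses "-" and "_" instead of "+" and "/".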

catalyst.utils.misc.get_short_hash(obj) → str

Creates a unique short hash from an object.

Parameters: obj – object to hash
Returns: short base64-encoded string (6 chars)

catalyst.utils.misc.get_utcnow_time(format: Optional[str] = None) → str

Returns a string with the current UTC time in the chosen format.

Parameters: format – format string. If None, “%y%m%d.%H%M%S” is used.
Returns: formatted utc time string
Return type: str
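A minimal sketch of the behaviour described above using the standard datetime module (sketch_get_utcnow_time is hypothetical, not catalyst's code); the default format yields strings such as YYMMDD.HHMMSS:

```python
from datetime import datetime, timezone
from typing import Optional


def sketch_get_utcnow_time(format: Optional[str] = None) -> str:
    """Hypothetical sketch of get_utcnow_time."""
    if format is None:
        # documented default: two-digit year, month, day, then hours, minutes, seconds
        format = "%y%m%d.%H%M%S"
    return datetime.now(timezone.utc).strftime(format)


print(sketch_get_utcnow_time())
print(sketch_get_utcnow_time("%Y-%m-%d"))
```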

catalyst.utils.misc.make_tuple(tuple_like)

Creates a tuple if the given tuple_like value is not a list or tuple.

Parameters: tuple_like – tuple-like object (list or tuple) or any other value
Returns: tuple or list

catalyst.utils.misc.maybe_recursive_call(object_or_dict, method: Union[str, Callable], recursive_args=None, recursive_kwargs=None, **kwargs)

Calls the method recursively for object_or_dict: for a dictionary, the method is called on each of its values.

Parameters

object_or_dict – some object or a dictionary of objects
method – method name (or callable) to call
recursive_args – arguments to pass to the method
recursive_kwargs – keyword arguments to pass to the method
**kwargs – Arbitrary keyword arguments

Returns

result of method call
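The recursion can be sketched as follows. This is a hypothetical sketch, not catalyst's code; in particular, it assumes that when object_or_dict is a dictionary, recursive_args and recursive_kwargs are dictionaries with the same keys, and that a single non-list argument is wrapped into a list.

```python
from typing import Any, Callable, Union


def sketch_maybe_recursive_call(
    object_or_dict: Any,
    method: Union[str, Callable],
    recursive_args=None,
    recursive_kwargs=None,
    **kwargs,
):
    """Hypothetical sketch of maybe_recursive_call."""
    if isinstance(object_or_dict, dict):
        # recurse into every value; per-key arguments are looked up by the same key
        result = type(object_or_dict)()
        for key, value in object_or_dict.items():
            r_args = None if recursive_args is None else recursive_args[key]
            r_kwargs = None if recursive_kwargs is None else recursive_kwargs[key]
            result[key] = sketch_maybe_recursive_call(
                value, method, recursive_args=r_args, recursive_kwargs=r_kwargs, **kwargs
            )
        return result

    args = recursive_args or []
    if not isinstance(args, (list, tuple)):
        args = [args]  # a single argument is wrapped into a list
    r_kwargs = recursive_kwargs or {}
    if isinstance(method, str):
        # method given by name: call obj.<method>(...)
        return getattr(object_or_dict, method)(*args, **r_kwargs, **kwargs)
    # method given as a callable: call method(obj, ...)
    return method(object_or_dict, *args, **r_kwargs, **kwargs)


words = {"a": "hello", "b": "world"}
print(sketch_maybe_recursive_call(words, "upper"))  # {'a': 'HELLO', 'b': 'WORLD'}
```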

catalyst.utils.misc.merge_dicts(*dicts: dict) → dict

Recursive dict merge. Instead of updating only top-level keys, merge_dicts recurses down into dicts nested to an arbitrary depth, updating keys.

Parameters: *dicts – several dictionaries to merge
Returns: deep-merged dictionary
Return type: dict
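A minimal sketch of such a deep merge, assuming later dictionaries win for non-dictionary values (sketch_merge_dicts is hypothetical, not catalyst's code). It builds new dictionaries instead of mutating the inputs.

```python
from typing import Any, Dict


def sketch_merge_dicts(*dicts: dict) -> dict:
    """Hypothetical sketch of a recursive (deep) dictionary merge."""
    result: Dict[Any, Any] = {}
    for d in dicts:
        for key, value in d.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                # both sides are dictionaries: merge them recursively
                result[key] = sketch_merge_dicts(result[key], value)
            else:
                # otherwise the value from the later dictionary wins
                result[key] = value
    return result


merged = sketch_merge_dicts({"a": 1, "b": {"c": 2, "d": 3}}, {"b": {"d": 4, "e": 5}, "f": 6})
print(merged)  # {'a': 1, 'b': {'c': 2, 'd': 4, 'e': 5}, 'f': 6}
```

A plain dict.update would instead replace the whole "b" entry and lose "c".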

catalyst.utils.misc.pairwise(iterable: Iterable[catalyst.utils.misc.T]) → Iterable[Tuple[catalyst.utils.misc.T, catalyst.utils.misc.T]]

Iterates over a sequence in overlapping pairs.

Examples

>>> for i in pairwise([1, 2, 5, -3]):
...     print(i)
(1, 2)
(2, 5)
(5, -3)

Parameters: iterable – Any iterable sequence
Returns: pairwise iterator
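The same pairing can be sketched with the classic itertools recipe (Python 3.10+ also ships itertools.pairwise). sketch_pairwise is a hypothetical name; catalyst's own implementation may differ.

```python
import itertools
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def sketch_pairwise(iterable: Iterable[T]) -> Iterator[Tuple[T, T]]:
    """Hypothetical sketch: yields (s0, s1), (s1, s2), ..."""
    # two independent iterators over the same data; advance the second by one
    first, second = itertools.tee(iterable)
    next(second, None)
    return zip(first, second)


print(list(sketch_pairwise([1, 2, 5, -3])))  # [(1, 2), (2, 5), (5, -3)]
```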

catalyst.utils.misc.set_global_seed(seed: int) → None

Sets the random seed for Python's random module, NumPy, PyTorch and TensorFlow.

Parameters: seed – random seed
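A minimal sketch of seeding every library listed above, skipping the ones that are not installed. sketch_set_global_seed is hypothetical; the per-library calls (random.seed, np.random.seed, torch.manual_seed, torch.cuda.manual_seed_all, tf.random.set_seed) are the standard ones, but whether catalyst uses exactly these is an assumption.

```python
import random


def sketch_set_global_seed(seed: int) -> None:
    """Hypothetical sketch of set_global_seed."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    except ImportError:
        pass  # PyTorch not installed
    try:
        import tensorflow as tf

        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed


sketch_set_global_seed(42)
```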

Onnx

catalyst.utils.onnx.onnx_export(model: torch.nn.modules.module.Module, batch: torch.Tensor, file: str, method_name: str = 'forward', input_names: Iterable = None, output_names: List[str] = None, dynamic_axes: Union[Dict[str, int], Dict[str, Dict[str, int]]] = None, opset_version: int = 9, do_constant_folding: bool = False, return_model: bool = False, verbose: bool = False) → Union[None, onnx]

Converts a model to the ONNX format.

Parameters

model – model
batch – inputs
file – file to save. Defaults to “model.onnx”.
method_name – Forward pass method to be converted. Defaults to “forward”.
input_names – name of inputs in graph. Defaults to None.
output_names – name of outputs in graph. Defaults to None.
dynamic_axes – axes with dynamic shapes. Defaults to None.
opset_version – ONNX opset version to export with. Defaults to 9.
do_constant_folding – If True, the constant-folding optimization is applied to the model during export. Defaults to False.
return_model – If True, returns the exported ONNX model (onnx required). Defaults to False.
verbose – if True, prints a debug description of the trace being exported.

Example

import torch

from catalyst.utils.onnx import onnx_export

class LinModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = torch.nn.Linear(10, 10)
        self.lin2 = torch.nn.Linear(2, 10)

    def forward(self, inp_1, inp_2):
        return self.lin1(inp_1), self.lin2(inp_2)

    def first_only(self, inp_1):
        return self.lin1(inp_1)

lin_model = LinModel()
# export only `first_only`, which takes a single (1, 10) input
onnx_export(
    lin_model,
    batch=torch.randn((1, 10)),
    file="model.onnx",
    method_name="first_only",
)

Raises: ImportError – when return_model is True, but onnx is not installed.
Returns: onnx model if return_model set to True.
Return type: Union[None, “onnx”]

catalyst.utils.onnx.quantize_onnx_model(onnx_model_path: Union[pathlib.Path, str], quantized_model_path: Union[pathlib.Path, str], qtype: str = 'qint8', verbose: bool = False) → None

Takes a model converted to ONNX and applies quantization.

Parameters

onnx_model_path – path to onnx model.
quantized_model_path – path to quantized model.
qtype – Type of weights in quantized model. Can be quint8 or qint8. Defaults to “qint8”.
verbose – If set to True prints model size before and after quantization. Defaults to False.

Raises

ValueError – If qtype is not understood.

Pruning

catalyst.utils.pruning.get_pruning_fn(pruning_fn: Union[str, Callable], dim: Optional[int] = None, l_norm: Optional[int] = None) → Callable

Returns a pruning function, given its name or a callable.

Parameters

pruning_fn (Union[str, Callable]) – a function from the torch.nn.utils.prune module or your own, based on BasePruningMethod. Can be a string, e.g. “l1_unstructured”. See the PyTorch docs for more details.
dim (int, optional) – if you are using a structured pruning method, you need to specify the dimension. Defaults to None.
l_norm (int, optional) – if you are using ln_structured, you need to specify l_norm. Defaults to None.

Raises

ValueError – If dim or l_norm is not defined when it’s required.

Returns

pruning_fn

Return type

Callable

catalyst.utils.pruning.prune_model(model: torch.nn.modules.module.Module, pruning_fn: Union[Callable, str], amount: Union[float, int], keys_to_prune: Optional[List[str]] = None, layers_to_prune: Optional[List[str]] = None, dim: Optional[int] = None, l_norm: Optional[int] = None) → None

Prunes certain tensors in model layers in place.

Parameters

model – Model to be pruned.
pruning_fn – pruning function with the same API as the functions in torch.nn.utils.prune: pruning_fn(module, name, amount). Can also be a string, e.g. “l1_unstructured”.
keys_to_prune – list of strings. Determines which tensors in modules will be pruned.
amount – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
layers_to_prune – list of names of modules to be pruned. If None, tries to prune every module in the model.
dim (int, optional) – if you are using a structured pruning method, you need to specify the dimension. Defaults to None.
l_norm (int, optional) – if you are using ln_structured, you need to specify l_norm. Defaults to None.

Example

# pruning is applied in place; prune_model returns None
prune_model(model, pruning_fn="l1_unstructured", amount=0.5, keys_to_prune=["weight"])

Raises

AttributeError – if layers_to_prune is not None, but there are no layers with the specified names.
ValueError – if no layers have the specified keys.

catalyst.utils.pruning.remove_reparametrization(model: torch.nn.modules.module.Module, keys_to_prune: List[str], layers_to_prune: Optional[List[str]] = None) → None

Removes pre-hooks and pruning masks from the model.

Parameters

model – model to remove reparametrization.
keys_to_prune – list of strings. Determines which tensors in modules have already been pruned.
layers_to_prune – list of names of modules that have already been pruned. If None, tries to remove reparametrization from every module in the model.
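prune_model and remove_reparametrization are described in terms of torch.nn.utils.prune. Assuming they delegate to it, the underlying PyTorch calls for a single layer look like this; prune.l1_unstructured and prune.remove are real PyTorch functions, while the layer size and amount are illustrative.

```python
import torch
from torch.nn.utils import prune

layer = torch.nn.Linear(4, 4)  # 16 weights

# zero out the 50% of weights with the smallest absolute value;
# this adds a `weight_orig` parameter, a `weight_mask` buffer and a forward pre-hook
prune.l1_unstructured(layer, name="weight", amount=0.5)
print(int((layer.weight == 0).sum()))  # 8

# make the pruning permanent: drop the mask and the pre-hook,
# keeping the zeroed weights as a plain `weight` parameter
prune.remove(layer, "weight")
print(sorted(name for name, _ in layer.named_parameters()))  # ['bias', 'weight']
```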

Quantization

catalyst.utils.quantization.quantize_model(model: torch.nn.modules.module.Module, qconfig_spec: Optional[Dict] = None, dtype: Optional[Union[str, torch.dtype]] = 'qint8') → torch.nn.modules.module.Module

Quantizes model weights.

Parameters

model – model to be quantized
qconfig_spec (Dict, optional) – quantization config in PyTorch format. Defaults to None.
dtype – Type of weights after quantization. Defaults to “qint8”.

Returns

quantized model

Return type

Model

Torch

catalyst.utils.torch.any2device(value: Union[Dict, List, Tuple, numpy.ndarray, torch.Tensor, torch.nn.modules.module.Module], device: Union[str, torch.device]) → Union[Dict, List, Tuple, torch.Tensor, torch.nn.modules.module.Module]

Moves a tensor, list of tensors, list of lists of tensors, dict of tensors, or tuple of tensors to the target device.

Parameters

value – Object to be moved
device – target device

Returns

Same structure as value, but all tensors and np.arrays moved to device
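A minimal recursive sketch of this behaviour (sketch_any2device is hypothetical, not catalyst's code). It assumes NumPy arrays are converted with torch.from_numpy before moving and that unsupported values are returned unchanged; namedtuples are not handled.

```python
from typing import Any, Union

import numpy as np
import torch


def sketch_any2device(value: Any, device: Union[str, torch.device]) -> Any:
    """Hypothetical sketch of any2device."""
    if isinstance(value, dict):
        return {k: sketch_any2device(v, device) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # keep the container type (list stays list, tuple stays tuple)
        return type(value)(sketch_any2device(v, device) for v in value)
    if isinstance(value, np.ndarray):
        # NumPy arrays are converted to tensors before moving
        return torch.from_numpy(value).to(device)
    if isinstance(value, (torch.Tensor, torch.nn.Module)):
        return value.to(device)
    return value  # anything else is left as-is


batch = {"x": torch.ones(2), "y": [np.zeros(3), (torch.zeros(1),)]}
moved = sketch_any2device(batch, "cpu")
```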

catalyst.utils.torch.get_available_engine(cpu: bool = False, fp16: bool = False, ddp: bool = False) → Engine

Returns an available engine based on the given arguments.

Parameters

cpu (bool) – option to use CPU for training. Default is False.
fp16 (bool) – option to use APEX for training. Default is False.
ddp (bool) – option to use DDP for training. Default is False.

Returns

Engine that matches the requirements.

catalyst.utils.torch.get_available_gpus()

Returns a list of available GPU ids.

Examples

>>> import os
>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
>>> get_available_gpus()
[0, 2]

>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,-1,1"
>>> get_available_gpus()
[0]

>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
>>> get_available_gpus()
[]

>>> os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
>>> get_available_gpus()
[]

Returns: available GPU ids
Return type: iterable
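The examples above follow CUDA's rule that an invalid (here, negative) index hides itself and every device listed after it. A hypothetical parser for just the environment-variable part could look like this; how get_available_gpus behaves when the variable is unset is not covered by this sketch.

```python
from typing import List


def parse_visible_devices(value: str) -> List[int]:
    """Hypothetical helper: parse a CUDA_VISIBLE_DEVICES string."""
    gpus: List[int] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # "" means no visible devices
        index = int(item)
        if index < 0:
            # a negative id hides itself and every device listed after it
            break
        gpus.append(index)
    return gpus


print(parse_visible_devices("0,-1,1"))  # [0]
```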

catalyst.utils.torch.get_device() → torch.device

Returns the best available device (TPU > GPU > CPU).

catalyst.utils.torch.get_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer) → float

Gets the momentum of the optimizer.

Parameters: optimizer – PyTorch optimizer
Returns: momentum of the first param group
Return type: float

catalyst.utils.torch.get_optimizer_momentum_list(optimizer: torch.optim.optimizer.Optimizer) → List[Optional[float]]

Gets the list of optimizer momentums (one per param group).

Parameters: optimizer – PyTorch optimizer
Returns: momentum for each param group
Return type: momentum_list (List[Union[float, None]])
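Optimizers store momentum differently: SGD-style optimizers keep a "momentum" entry in each param group, while Adam-style optimizers keep "betas", whose first element plays the role of momentum. A hypothetical sketch of reading it per param group, assuming catalyst treats beta1 as momentum and returns None when neither key exists:

```python
from typing import List, Optional

import torch


def sketch_get_momentum_list(optimizer: torch.optim.Optimizer) -> List[Optional[float]]:
    """Hypothetical sketch of get_optimizer_momentum_list."""
    momentums: List[Optional[float]] = []
    for group in optimizer.param_groups:
        if "betas" in group:
            # Adam-style optimizers: beta1 is treated as momentum
            momentums.append(group["betas"][0])
        else:
            # SGD-style optimizers store it directly; None if absent
            momentums.append(group.get("momentum"))
    return momentums


params = [torch.nn.Parameter(torch.zeros(1))]
print(sketch_get_momentum_list(torch.optim.SGD(params, lr=0.1, momentum=0.9)))  # [0.9]
```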

catalyst.utils.torch.get_requires_grad(model: torch.nn.modules.module.Module)

Gets the requires_grad value for all model parameters.

Example:

>>> model = SimpleModel()
>>> requires_grad = get_requires_grad(model)

Parameters: model – model
Returns: requires_grad values of the model parameters
Return type: Dict[str, bool]

catalyst.utils.torch.mixup_batch(batch: List[torch.Tensor], alpha: float = 0.2, mode: str = 'replace') → List[torch.Tensor]

Applies mixup augmentation to a batch.

Parameters

batch – batch to which you want to apply augmentation
alpha – parameter of the beta distribution (a = b = alpha). Must be >= 0. The closer alpha is to zero, the weaker the mixup effect.
mode – algorithm used for mixup: “replace” | “add”. If “replace”, the batch is replaced with a mixed one and the batch size does not change. If “add”, mixed examples are concatenated to the current ones and the batch size doubles.

Returns

augmented batch
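A minimal NumPy sketch of the two modes, assuming one mixing coefficient lam ~ Beta(alpha, alpha) per batch and a random pairing of examples within the batch; catalyst's exact sampling scheme is not stated here, so sketch_mixup_batch and its rng argument are hypothetical.

```python
from typing import List, Optional

import numpy as np


def sketch_mixup_batch(
    batch: List[np.ndarray],
    alpha: float = 0.2,
    mode: str = "replace",
    rng: Optional[np.random.Generator] = None,
) -> List[np.ndarray]:
    """Hypothetical sketch of mixup_batch on NumPy arrays."""
    rng = np.random.default_rng() if rng is None else rng
    batch_size = len(batch[0])
    # one coefficient for the whole batch; alpha == 0 means no mixing
    lam = rng.beta(alpha, alpha) if alpha > 0 else 1.0
    # pair every example with a random partner from the same batch
    perm = rng.permutation(batch_size)
    mixed = [lam * x + (1.0 - lam) * x[perm] for x in batch]
    if mode == "replace":
        return mixed  # same batch size
    if mode == "add":
        # original and mixed examples together: batch size doubles
        return [np.concatenate([x, m], axis=0) for x, m in zip(batch, mixed)]
    raise ValueError(f"unknown mode: {mode!r}")


images, targets = np.ones((4, 3)), np.eye(4)  # one-hot targets
out = sketch_mixup_batch([images, targets], mode="add", rng=np.random.default_rng(0))
```

The same permutation is applied to inputs and one-hot targets, so each mixed target row still sums to 1.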

catalyst.utils.torch.prepare_cudnn(deterministic: Optional[bool] = None, benchmark: Optional[bool] = None) → None

Configures CuDNN benchmark mode and sets CuDNN to deterministic or non-deterministic mode.

Parameters

deterministic – if True, use deterministic mode when running on the CuDNN backend.
benchmark – If True use CuDNN heuristics to figure out which algorithm will be most performant for your model architecture and input. Setting it to False may slow down your training.

catalyst.utils.torch.set_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer, value: float, index: int = 0)

Sets the momentum of the index-th param group of the optimizer to value.

Parameters

optimizer – PyTorch optimizer
value – new value of momentum
index (int, optional) – integer index of optimizer’s param groups, default is 0

catalyst.utils.torch.set_requires_grad(model: torch.nn.modules.module.Module, requires_grad: Union[bool, Dict[str, bool]])

Sets the requires_grad value for all model parameters.

Example:

>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad=True)
>>> # or
>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad={"fc.weight": False, "fc.bias": True})  # hypothetical parameter names

Parameters

model – model
requires_grad – a single bool applied to all parameters, or a dict mapping parameter names to bools

catalyst.utils.torch.soft_update(target: torch.nn.modules.module.Module, source: torch.nn.modules.module.Module, tau: float) → None

Updates the target parameters with the source ones, smoothed by tau (in-place operation).

Parameters

target – nn.Module to update
source – nn.Module for updating
tau – smoothing parameter
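The usual soft (Polyak) update sets each target parameter to tau * source + (1 - tau) * target. A hypothetical in-place sketch, assuming target and source have identically ordered parameters:

```python
import torch


def sketch_soft_update(target: torch.nn.Module, source: torch.nn.Module, tau: float) -> None:
    """Hypothetical sketch of soft_update."""
    with torch.no_grad():
        for target_param, source_param in zip(target.parameters(), source.parameters()):
            # target <- (1 - tau) * target + tau * source, in place
            target_param.mul_(1.0 - tau).add_(source_param, alpha=tau)


target, source = torch.nn.Linear(2, 2), torch.nn.Linear(2, 2)
for p in target.parameters():
    torch.nn.init.zeros_(p)
for p in source.parameters():
    torch.nn.init.ones_(p)
sketch_soft_update(target, source, tau=0.1)  # every target value becomes 0.1
```

Small tau values make the target track the source slowly, which is why this update is common for target networks in reinforcement learning.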

Tracing

catalyst.utils.tracing.trace_model(model: torch.nn.modules.module.Module, batch: Union[Tuple[torch.Tensor], torch.Tensor], method_name: str = 'forward') → torch.jit._script.ScriptModule

Traces the model using the given batch.

Parameters

model – Model to trace
batch – Batch to trace the model
method_name – Model’s method name that will be used as entrypoint during tracing

Example

import torch

from catalyst.utils import trace_model

class LinModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = torch.nn.Linear(10, 10)
        self.lin2 = torch.nn.Linear(2, 10)

    def forward(self, inp_1, inp_2):
        return self.lin1(inp_1), self.lin2(inp_2)

    def first_only(self, inp_1):
        return self.lin1(inp_1)

lin_model = LinModel()
traced_model = trace_model(
    lin_model, batch=torch.randn(1, 10), method_name="first_only"
)

Returns: Traced model
Return type: jit.ScriptModule

Image (contrib)

catalyst.contrib.utils.image.has_image_extension(uri) → bool

Checks whether the file has an image extension.

Parameters: uri (Union[str, pathlib.Path]) – the resource to load the file from
Returns: True if file has image extension, False otherwise
Return type: bool

catalyst.contrib.utils.image.imread(uri, grayscale: bool = False, expand_dims: bool = True, rootpath: Optional[Union[str, pathlib.Path]] = None, **kwargs) → numpy.ndarray

Reads an image from the specified file.

Parameters

uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.imread docs for more info
grayscale – if True, make all images grayscale
expand_dims – if True, append a channel axis to grayscale images
rootpath (Union[str, pathlib.Path]) – path to the resource with the image (allows using a relative path)
**kwargs – extra params for image read

Returns

image

Return type

np.ndarray

catalyst.contrib.utils.image.imsave(**kwargs)

imwrite(uri, im, format=None, **kwargs)

Write an image to the specified file. Alias for imageio.imsave.

Parameters: **kwargs – parameters for imageio.imsave
Returns: image save result

catalyst.contrib.utils.image.imwrite(**kwargs)

imwrite(uri, im, format=None, **kwargs)

Write an image to the specified file. Alias for imageio.imwrite.

Parameters: **kwargs – parameters for imageio.imwrite
Returns: image save result

catalyst.contrib.utils.image.mimread(uri, clip_range: Optional[Tuple[int, int]] = None, expand_dims: bool = True, rootpath: Optional[Union[str, pathlib.Path]] = None, **kwargs) → numpy.ndarray

Reads multiple images from the specified file.

Parameters

uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.mimread docs for more info
clip_range (Tuple[int, int]) – lower and upper interval edges, image values outside the interval are clipped to the interval edges
expand_dims – if True, append a channel axis to grayscale images
rootpath (Union[str, pathlib.Path]) – path to the resource with the image (allows using a relative path)
**kwargs – extra params for image read

Returns

image

Return type

np.ndarray

Report (contrib)

catalyst.contrib.utils.report.get_classification_report(y_true: numpy.ndarray, y_pred: numpy.ndarray, y_scores: Optional[numpy.ndarray] = None, beta: Optional[float] = None) → pandas.core.frame.DataFrame

Generates pandas-based per-class and aggregated classification metrics.

Parameters

y_true (np.ndarray) – ground truth labels
y_pred (np.ndarray) – predicted model labels
y_scores (np.ndarray) – predicted model scores. Defaults to None.
beta (float, optional) – Beta parameter for custom Fbeta score computation. Defaults to None.

Returns

pandas dataframe with main classification metrics.

Return type

pd.DataFrame

Examples:

from sklearn import datasets, linear_model, metrics
from sklearn.model_selection import train_test_split
from catalyst import utils

digits = datasets.load_digits()

# flatten the images
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

# Create a classifier
clf = linear_model.LogisticRegression(multi_class="ovr")

# Split data into 50% train and 50% test subsets
X_train, X_test, y_train, y_test = train_test_split(
    data, digits.target, test_size=0.5, shuffle=False)

# Learn the digits on the train subset
clf.fit(X_train, y_train)

# Predict the value of the digit on the test subset
y_scores = clf.predict_proba(X_test)
y_pred = clf.predict(X_test)

utils.get_classification_report(
    y_true=y_test,
    y_pred=y_pred,
    y_scores=y_scores,
    beta=0.5
)

Thresholds (contrib)

catalyst.contrib.utils.thresholds.get_baseline_thresholds(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float]) → Tuple[float, List[float]]

Returns baseline thresholds for multiclass/multilabel classification.

Parameters

scores – estimated per-class scores/probabilities predicted by the model, numpy array with shape [num_examples, num_classes]
labels – ground truth labels, numpy array with shape [num_examples, num_classes]
objective – callable function, metric which we want to maximize

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_best_multiclass_thresholds(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float]) → Tuple[float, List[float]]

Finds best thresholds for multiclass classification task.

Parameters

scores – estimated per-class scores/probabilities predicted by the model
labels – ground truth labels
objective – callable function, metric which we want to maximize

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_best_multilabel_thresholds(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float]) → Tuple[float, List[float]]

Finds best thresholds for multilabel classification task.

Parameters

scores – estimated per-class scores/probabilities predicted by the model
labels – ground truth labels
objective – callable function, metric which we want to maximize

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_binary_threshold(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float], num_thresholds: int = 100) → Tuple[float, float]

Finds the best threshold for a binary classification task.

Parameters

scores – estimated per-class scores/probabilities predicted by the model, numpy array with shape [num_examples, ]
labels – ground truth labels, numpy array with shape [num_examples, ]
objective – callable function, metric which we want to maximize
num_thresholds – number of thresholds to try

Returns

tuple with best found objective score and threshold
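A minimal grid-search sketch of this idea. It assumes, hypothetically, that candidates are evenly spaced in [0, 1], that an example is predicted positive when its score is strictly above the threshold, and that objective is called as objective(labels, predictions); none of these details are stated above.

```python
from typing import Callable, Tuple

import numpy as np


def sketch_get_binary_threshold(
    scores: np.ndarray,
    labels: np.ndarray,
    objective: Callable[[np.ndarray, np.ndarray], float],
    num_thresholds: int = 100,
) -> Tuple[float, float]:
    """Hypothetical sketch of get_binary_threshold."""
    # evenly spaced candidate thresholds in [0, 1]
    candidates = np.linspace(0.0, 1.0, num_thresholds)
    values = [objective(labels, (scores > t).astype(int)) for t in candidates]
    best = int(np.argmax(values))  # first candidate with the highest objective
    return float(values[best]), float(candidates[best])


def accuracy(labels: np.ndarray, predictions: np.ndarray) -> float:
    return float((labels == predictions).mean())


scores = np.array([0.1, 0.2, 0.7, 0.9])
labels = np.array([0, 0, 1, 1])
best_score, best_threshold = sketch_get_binary_threshold(scores, labels, accuracy)
```

Here the best score is 1.0, reached at the first candidate threshold between 0.2 and 0.7.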

catalyst.contrib.utils.thresholds.get_binary_threshold_cv(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float], num_splits: int = 5, num_repeats: int = 1, random_state: int = 42)

Finds best threshold for binary classification task based on cross-validation estimates.

Parameters

scores – estimated per-class scores/probabilities predicted by the model, numpy array with shape [num_examples, ]
labels – ground truth labels, numpy array with shape [num_examples, ]
objective – callable function, metric which we want to maximize
num_splits – number of splits to use for cross-validation
num_repeats – number of repeats to use for cross-validation
random_state – random state to use for cross-validation

Returns

tuple with best found objective score and threshold

catalyst.contrib.utils.thresholds.get_multiclass_thresholds(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float]) → Tuple[List[float], List[float]]

Finds best thresholds for multiclass classification task.

Parameters

scores – estimated per-class scores/probabilities predicted by the model, numpy array with shape [num_examples, num_classes]
labels – ground truth labels, numpy array with shape [num_examples, num_classes]
objective – callable function, metric which we want to maximize

Returns

tuple with the best found per-class objective scores and per-class thresholds

catalyst.contrib.utils.thresholds.get_multiclass_thresholds_greedy(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float], num_iterations: int = 100, num_thresholds: int = 100, thresholds: Optional[numpy.ndarray] = None, patience: int = 3, atol: float = 0.01) → Tuple[float, List[float]]

Finds best thresholds for multiclass classification task with brute-force algorithm.

Parameters

scores – estimated per-class scores/probabilities predicted by the model
labels – ground truth labels
objective – callable function, metric which we want to maximize
num_iterations – number of iterations for the brute-force algorithm
num_thresholds – number of thresholds to try for each class
thresholds – baseline thresholds to optimize
patience – maximum number of iterations without sufficient improvement before early stopping
atol – minimum required improvement per iteration; smaller improvements count toward early stopping

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_multilabel_thresholds(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float])

Finds best thresholds for multilabel classification task.

Parameters

scores – estimated per-class scores/probabilities predicted by the model, numpy array with shape [num_examples, num_classes]
labels – ground truth labels, numpy array with shape [num_examples, num_classes]
objective – callable function, metric which we want to maximize

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_multilabel_thresholds_cv(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float], num_splits: int = 5, num_repeats: int = 1, random_state: int = 42)

Finds best thresholds for multilabel classification task based on cross-validation estimates.

Parameters

scores – estimated per-class scores/probabilities predicted by the model, numpy array with shape [num_examples, num_classes]
labels – ground truth labels, numpy array with shape [num_examples, num_classes]
objective – callable function, metric which we want to maximize
num_splits – number of splits to use for cross-validation
num_repeats – number of repeats to use for cross-validation
random_state – random state to use for cross-validation

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_multilabel_thresholds_greedy(scores: numpy.ndarray, labels: numpy.ndarray, objective: Callable[[numpy.ndarray, numpy.ndarray], float], num_iterations: int = 100, num_thresholds: int = 100, thresholds: Optional[numpy.ndarray] = None, patience: int = 3, atol: float = 0.01) → Tuple[float, List[float]]

Finds best thresholds for multilabel classification task with brute-force algorithm.

Parameters

scores – estimated per-class scores/probabilities predicted by the model
labels – ground truth labels
objective – callable function, metric which we want to maximize
num_iterations – number of iterations for the brute-force algorithm
num_thresholds – number of thresholds to try for each class
thresholds – baseline thresholds to optimize
patience – maximum number of iterations without sufficient improvement before early stopping
atol – minimum required improvement per iteration; smaller improvements count toward early stopping

Returns

tuple with best found objective score and per-class thresholds

catalyst.contrib.utils.thresholds.get_thresholds_greedy(scores: numpy.ndarray, labels: numpy.ndarray, score_fn: Callable, num_iterations: int = 100, num_thresholds: int = 100, thresholds: Optional[numpy.ndarray] = None, patience: int = 3, atol: float = 0.01) → Tuple[float, List[float]]

Finds best thresholds for classification task with brute-force algorithm.

Parameters

scores – estimated per-class scores/probabilities predicted by the model
labels – ground truth labels
score_fn – callable that computes the objective from (scores, labels, thresholds)
num_iterations – number of iterations for the brute-force algorithm
num_thresholds – number of thresholds to try for each class
thresholds – baseline thresholds to optimize
patience – maximum number of iterations without sufficient improvement before early stopping
atol – minimum required improvement per iteration; smaller improvements count toward early stopping

Returns

tuple with best found objective score and per-class thresholds
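A hypothetical sketch of such a greedy (coordinate-wise) brute-force search. Its assumptions: thresholds start at 0.5 when none are given, candidates are evenly spaced in [0, 1], each pass tries every candidate for one class at a time while keeping the others fixed, and the search stops after patience passes whose total gain is below atol. catalyst's actual loop may differ.

```python
from typing import Callable, List, Optional, Tuple

import numpy as np


def sketch_thresholds_greedy(
    scores: np.ndarray,
    labels: np.ndarray,
    score_fn: Callable[[np.ndarray, np.ndarray, np.ndarray], float],
    num_iterations: int = 100,
    num_thresholds: int = 100,
    thresholds: Optional[np.ndarray] = None,
    patience: int = 3,
    atol: float = 0.01,
) -> Tuple[float, List[float]]:
    """Hypothetical sketch of get_thresholds_greedy."""
    num_classes = scores.shape[1]
    if thresholds is None:
        thresholds = np.full(num_classes, 0.5)  # assumed starting point
    else:
        thresholds = np.asarray(thresholds, dtype=float).copy()
    candidates = np.linspace(0.0, 1.0, num_thresholds)
    best_score = score_fn(scores, labels, thresholds)
    stale = 0
    for _ in range(num_iterations):
        previous_best = best_score
        for c in range(num_classes):
            # optimize one class threshold while the others stay fixed
            for t in candidates:
                trial = thresholds.copy()
                trial[c] = t
                value = score_fn(scores, labels, trial)
                if value > best_score:
                    best_score, thresholds = value, trial
        # early stopping: count passes that improved less than atol
        stale = stale + 1 if best_score - previous_best < atol else 0
        if stale >= patience:
            break
    return float(best_score), thresholds.tolist()


def multilabel_accuracy(scores: np.ndarray, labels: np.ndarray, thresholds: np.ndarray) -> float:
    return float(((scores >= thresholds).astype(int) == labels).mean())


scores = np.array([[0.9, 0.2], [0.8, 0.6], [0.3, 0.7], [0.1, 0.55]])
labels = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
best, thr = sketch_thresholds_greedy(scores, labels, multilabel_accuracy)
```

With the default 0.5 thresholds the second class misclassifies the 0.55 score; the search raises that threshold just above 0.55 and reaches an accuracy of 1.0.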

Visualization (contrib)

catalyst.contrib.utils.visualization.plot_confusion_matrix(cm: numpy.ndarray, class_names=None, normalize=False, title='confusion matrix', fname=None, show=True, figsize=12, fontsize=32, colormap='Blues')

Renders the confusion matrix and returns a matplotlib figure with it. Normalization can be applied by setting normalize=True.

Parameters

cm – numpy confusion matrix
class_names – class names
normalize – boolean flag to normalize confusion matrix
title – title
fname – filename to save confusion matrix
show – boolean flag for preview
figsize – matplotlib figure size
fontsize – matplotlib font size
colormap – matplotlib color map

Returns

matplotlib figure
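The text does not say how normalize=True scales the matrix. A common convention, assumed here, divides each row (true class) by its sum so that entries become per-class rates. normalize_confusion_matrix is a hypothetical helper, not part of catalyst.

```python
import numpy as np


def normalize_confusion_matrix(cm: np.ndarray) -> np.ndarray:
    """Hypothetical helper: scale each row (true class) to sum to 1."""
    row_sums = cm.sum(axis=1, keepdims=True).astype(float)
    # all-zero rows (classes absent from the ground truth) stay zero instead of dividing by 0
    return np.divide(cm, row_sums, out=np.zeros(cm.shape, dtype=float), where=row_sums > 0)


print(normalize_confusion_matrix(np.array([[3, 1], [1, 3]])))
```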

catalyst.contrib.utils.visualization.render_figure_to_array(figure)

Renders a matplotlib figure to an array.