Utilities

Main

All utils are gathered in catalyst.utils for easier access.

Note

Everything from catalyst.contrib.utils is included in catalyst.utils

Checkpoint

catalyst.utils.checkpoint.pack_checkpoint(model: torch.nn.modules.module.Module = None, criterion: torch.nn.modules.module.Module = None, optimizer=None, scheduler=None, **kwargs)[source]

Packs model, criterion, optimizer, scheduler and some extra info **kwargs to torch-based checkpoint.

Parameters
  • model – torch model

  • criterion – torch criterion

  • optimizer – torch optimizer

  • scheduler – torch scheduler

  • **kwargs – some extra info to pack

Returns

torch-based checkpoint with model_state_dict, criterion_state_dict, optimizer_state_dict, scheduler_state_dict keys.

catalyst.utils.checkpoint.unpack_checkpoint(checkpoint, model=None, criterion=None, optimizer=None, scheduler=None) → None[source]

Load checkpoint from file and unpack the content to a model (if not None), criterion (if not None), optimizer (if not None), scheduler (if not None).

Parameters
  • checkpoint – checkpoint to load

  • model – model whose state should be updated

  • criterion – criterion whose state should be updated

  • optimizer – optimizer whose state should be updated

  • scheduler – scheduler whose state should be updated

catalyst.utils.checkpoint.save_checkpoint(checkpoint: Dict, logdir: Union[pathlib.Path, str], suffix: str, is_best: bool = False, is_last: bool = False, special_suffix: str = '', saver_fn: Callable = <function save>) → Union[pathlib.Path, str][source]

Saving checkpoint to a file.

Parameters
  • checkpoint – data to save.

  • logdir – directory where checkpoint should be stored.

  • suffix – checkpoint file name.

  • is_best – if True, a best checkpoint file is also generated.

  • is_last – if True, a last checkpoint file is also generated.

  • special_suffix – suffix to use for saving best/last checkpoints.

  • saver_fn – function to use for saving data to file, default is torch.save

Returns

path to saved checkpoint

catalyst.utils.checkpoint.load_checkpoint(filepath: str)[source]

Load checkpoint from path.

Parameters

filepath – checkpoint file to load

Returns

checkpoint content

Components

catalyst.utils.components.process_components(model: Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]], criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, distributed_params: Dict = None, device: Union[str, torch.device] = None) → Tuple[Union[torch.nn.modules.module.Module, Dict[str, torch.nn.modules.module.Module]], torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler, Union[str, torch.device]][source]

Returns the processed model, criterion, optimizer, scheduler and device.

Parameters
  • model – torch model

  • criterion – criterion function

  • optimizer – optimizer

  • scheduler – scheduler

  • distributed_params (dict, optional) – dict with the parameters for distributed and FP16 method

  • device (Device, optional) – device

Returns

tuple with processed model, criterion, optimizer, scheduler and device.

Raises
  • ValueError – if device is None and a TPU is available; to use a TPU, manually move the model/optimizer/scheduler to the TPU device and pass that device to the function.

  • NotImplementedError – if model is neither an nn.Module nor a dict for multi-GPU; nn.ModuleDict for DataParallel is not implemented yet

Config

catalyst.utils.config.load_config(path: Union[str, pathlib.Path], ordered: bool = False, data_format: str = None, encoding: str = 'utf-8') → Union[Dict, List][source]

Loads config by giving path. Supports YAML and JSON files.

Examples

>>> load_config(path="./config.yml", ordered=True)
Parameters
  • path – path to config file (YAML or JSON)

  • ordered – if true the config will be loaded as OrderedDict

  • data_format – yaml, yml or json.

  • encoding – encoding to read the config

Returns

config

Return type

Union[Dict, List]

Raises

Exception – if path does not exist or the file format is not YAML or JSON

Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L63 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L10

catalyst.utils.config.save_config(config: Union[Dict, List], path: Union[str, pathlib.Path], data_format: str = None, encoding: str = 'utf-8', ensure_ascii: bool = False, indent: int = 2) → None[source]

Saves config to file. Path must be either YAML or JSON.

Parameters
  • config (Union[Dict, List]) – config to save

  • path (Union[str, Path]) – path to save

  • data_format – yaml, yml or json.

  • encoding – Encoding to write file. Default is utf-8

  • ensure_ascii – Used for JSON; if True, non-ASCII characters are escaped in JSON strings.

  • indent – Used for JSON

Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L110 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L38
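As a rough illustration of the JSON branch only, the round trip described by save_config and load_config can be sketched with the standard library. The helper names save_json_config and load_json_config are hypothetical and not part of catalyst; the real functions also handle YAML and may differ in detail.

```python
import json
import tempfile
from collections import OrderedDict
from pathlib import Path


def save_json_config(config, path, encoding="utf-8", ensure_ascii=False, indent=2):
    """Hypothetical helper: write a config (dict or list) to a JSON file."""
    with open(path, "w", encoding=encoding) as stream:
        json.dump(config, stream, ensure_ascii=ensure_ascii, indent=indent)


def load_json_config(path, ordered=False, encoding="utf-8"):
    """Hypothetical helper: read a JSON config, optionally as an OrderedDict."""
    path = Path(path)
    if not path.exists():
        raise Exception(f"Path '{path}' does not exist")
    # object_pairs_hook=None keeps the default behaviour (plain dicts).
    hook = OrderedDict if ordered else None
    with open(path, encoding=encoding) as stream:
        return json.load(stream, object_pairs_hook=hook)


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "config.json"
    save_json_config({"model": {"name": "resnet"}, "epochs": 3}, config_path)
    loaded = load_json_config(config_path, ordered=True)
```

With ordered=True, nested mappings are also returned as OrderedDict instances, because the hook is applied to every JSON object.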

Distributed

catalyst.utils.distributed.check_ddp_wrapped(model: torch.nn.modules.module.Module) → bool[source]

Checks whether model is wrapped with DataParallel/DistributedDataParallel.

catalyst.utils.distributed.check_apex_available() → bool[source]

Checks if apex is available.

catalyst.utils.distributed.check_amp_available() → bool[source]

Checks if torch.amp is available.

catalyst.utils.distributed.check_torch_distributed_initialized() → bool[source]

Checks if torch.distributed is available and initialized.

catalyst.utils.distributed.check_slurm_available()[source]

Checks if slurm is available.

catalyst.utils.distributed.assert_fp16_available() → None[source]

Asserts for installed and available Apex FP16.

catalyst.utils.distributed.initialize_apex(model, optimizer=None, **distributed_params)[source]

Prepares model and optimizer for work with Nvidia Apex.

Parameters
  • model – torch model

  • optimizer – torch optimizer

  • **distributed_params – extra params for apex.amp.initialize

Returns

model and optimizer, wrapped with Nvidia Apex initialization

catalyst.utils.distributed.get_nn_from_ddp_module(model: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]

Return a real model from a torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel, or apex.parallel.DistributedDataParallel.

Parameters

model – A model, or DataParallel wrapper.

Returns

A model

catalyst.utils.distributed.get_rank() → int[source]

Returns the rank of the current worker.

Returns

rank if torch.distributed is initialized, otherwise -1

Return type

int

catalyst.utils.distributed.get_distributed_mean(value: Union[float, torch.Tensor])[source]

Computes distributed mean among all nodes.

catalyst.utils.distributed.get_distributed_env(local_rank: int, rank: int, world_size: int, use_cuda_visible_devices: bool = True)[source]

Returns environment copy with extra distributed settings.

Parameters
  • local_rank – worker local rank

  • rank – worker global rank

  • world_size – worker world size

  • use_cuda_visible_devices – boolean flag to use available GPU devices

Returns

updated environment copy

catalyst.utils.distributed.get_distributed_params()[source]

Returns distributed params for experiment run.

Returns

dictionary with distributed params

catalyst.utils.distributed.get_slurm_params()[source]

Return slurm params for experiment run.

Returns

tuple with current node index, number of nodes, master node and master port

Loaders

catalyst.utils.loaders.get_native_batch_from_loader(loader: torch.utils.data.dataloader.DataLoader, batch_index: int = 0)[source]

Returns a batch from experiment loader

Parameters
  • loader – Loader to get batch from

  • batch_index – Index of batch to take from dataset of the loader

Returns

batch from loader

catalyst.utils.loaders.get_native_batch_from_loaders(loaders: Dict[str, torch.utils.data.dataloader.DataLoader], loader: Union[str, int] = 0, batch_index: int = 0)[source]

Returns a batch from experiment loaders by its index or name.

Parameters
  • loaders (Dict[str, DataLoader]) – Loaders list to get loader from

  • loader (Union[str, int]) – Loader name or its index, default is zero

  • batch_index – Index of batch to take from dataset of the loader

Returns

batch from loader

Raises

TypeError – if loader parameter is not a string or an integer

catalyst.utils.loaders.get_loader(data_source: Iterable[dict], open_fn: Callable, dict_transform: Callable = None, sampler=None, collate_fn: Callable = <function default_collate>, batch_size: int = 32, num_workers: int = 4, shuffle: bool = False, drop_last: bool = False)[source]

Creates a DataLoader from given source and its open/transform params.

Parameters
  • data_source – an iterable containing your data annotations (for example paths to images, labels, bboxes, etc.)

  • open_fn – function, that can open your annotations dict and transfer it to data, needed by your network (for example open image by path, or tokenize read string)

  • dict_transform – transforms to use on dict (for example normalize image, add blur, crop/resize/etc)

  • sampler (Sampler, optional) – defines the strategy to draw samples from the dataset

  • collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset

  • batch_size (int, optional) – how many samples per batch to load

  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process

  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).

  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)

Returns

DataLoader with catalyst.data.ListDataset

catalyst.utils.loaders.validate_loaders(loaders: Dict[str, torch.utils.data.dataloader.DataLoader]) → Dict[str, torch.utils.data.dataloader.DataLoader][source]

Checks PyTorch dataloaders for a distributed setup and transfers them to distributed mode if necessary. (Experimental feature)

Parameters

loaders (Dict[str, DataLoader]) – dictionary with PyTorch dataloaders

Returns

dictionary with PyTorch dataloaders (with distributed samplers if necessary)

Return type

Dict[str, DataLoader]

catalyst.utils.loaders.get_loaders_from_params(batch_size: int = 1, num_workers: int = 0, drop_last: bool = False, per_gpu_scaling: bool = False, loaders_params: Dict[str, Any] = None, samplers_params: Dict[str, Any] = None, initial_seed: int = 42, get_datasets_fn: Callable = None, **data_params) → OrderedDict[str, DataLoader][source]

Creates pytorch dataloaders from datasets and additional parameters.

Parameters
  • batch_size – batch_size parameter from torch.utils.data.DataLoader

  • num_workers – num_workers parameter from torch.utils.data.DataLoader

  • drop_last – drop_last parameter from torch.utils.data.DataLoader

  • per_gpu_scaling – boolean flag, if True, scales batch_size in proportion to the number of GPUs

  • loaders_params (Dict[str, Any]) – additional loaders parameters

  • samplers_params (Dict[str, Any]) – additional sampler parameters

  • initial_seed – initial seed for torch.utils.data.DataLoader workers

  • get_datasets_fn (Callable) – callable function to get dictionary with torch.utils.data.Datasets

  • **data_params – additional data parameters or dictionary with torch.utils.data.Datasets to use for pytorch dataloaders creation

Returns

dictionary with torch.utils.data.DataLoader

Return type

OrderedDict[str, DataLoader]

Raises
  • NotImplementedError – if a data source is neither a Dataset nor a dict

  • ValueError – if the batch_sampler option is used in distributed mode (the two are mutually exclusive)

Misc

catalyst.utils.misc.boolean_flag(parser: argparse.ArgumentParser, name: str, default: Optional[bool] = False, help: str = None, shorthand: str = None) → None[source]

Adds a boolean flag to a parser in place.

Examples

>>> parser = argparse.ArgumentParser()
>>> boolean_flag(
...     parser, "flag", default=False, help="some flag", shorthand="f"
... )
Parameters
  • parser – parser to add the flag to

  • name – argument name; --<name> will enable the flag, while --no-<name> will disable it

  • default (bool, optional) – default value of the flag

  • help – help string for the flag

  • shorthand – shorthand string for the argument
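The paired --<name> / --no-<name> behaviour described above can be approximated with plain argparse. This is a minimal sketch under that assumption, not catalyst's implementation; boolean_flag_sketch is a hypothetical name.

```python
import argparse


def boolean_flag_sketch(parser, name, default=False, help=None, shorthand=None):
    """Hypothetical stand-in for boolean_flag built on plain argparse."""
    dest = name.replace("-", "_")
    names = ["--" + name]
    if shorthand is not None:
        names.append("-" + shorthand)
    # --<name> (or -<shorthand>) switches the flag on ...
    parser.add_argument(*names, action="store_true", dest=dest, help=help)
    # ... and --no-<name> switches it off.
    parser.add_argument("--no-" + name, action="store_false", dest=dest)
    # Both options share one destination, so set its default explicitly.
    parser.set_defaults(**{dest: default})


parser = argparse.ArgumentParser()
boolean_flag_sketch(parser, "flag", default=False, help="some flag", shorthand="f")

enabled = parser.parse_args(["--flag"]).flag
disabled = parser.parse_args(["--no-flag"]).flag
untouched = parser.parse_args([]).flag
```

Sharing one destination between the two options means whichever appears last on the command line wins.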

catalyst.utils.misc.copy_directory(input_dir: pathlib.Path, output_dir: pathlib.Path) → None[source]

Recursively copies the input directory.

Parameters
  • input_dir – input directory

  • output_dir – output directory

catalyst.utils.misc.format_metric(name: str, value: float) → str[source]

Format metric.

Metric will be returned in the scientific format if 4 decimal chars are not enough (metric value lower than 1e-4).

Parameters
  • name – metric name

  • value – value of metric

Returns

formatted metric

Return type

str
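The switch to scientific notation below 1e-4 could look roughly like the following. The exact output layout ("name=value") and the number of digits are assumptions made for illustration; format_metric_sketch is a hypothetical name.

```python
def format_metric_sketch(name, value):
    """Hypothetical formatter: 4 decimals, or scientific notation for tiny values."""
    if value < 1e-4:
        # Four decimal places would print as 0.0000, so use scientific format.
        return f"{name}={value:1.3e}"
    return f"{name}={value:.4f}"


regular = format_metric_sketch("loss", 0.25)
tiny = format_metric_sketch("lr", 0.00001)
```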

catalyst.utils.misc.get_fn_default_params(fn: Callable[[...], Any], exclude: List[str] = None)[source]

Return default parameters of Callable.

Parameters
  • fn (Callable[.., Any]) – target Callable

  • exclude – exclude list of parameters

Returns

contains default parameters of fn

Return type

dict

catalyst.utils.misc.get_fn_argsnames(fn: Callable[[...], Any], exclude: List[str] = None)[source]

Return parameter names of Callable.

Parameters
  • fn (Callable[.., Any]) – target Callable

  • exclude – exclude list of parameters

Returns

contains parameter names of fn

Return type

list
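Both helpers above can be approximated with inspect.signature. The sketch below is an assumption about how such introspection might work, not the library's code; default_params_sketch and argnames_sketch are hypothetical names.

```python
import inspect


def default_params_sketch(fn, exclude=None):
    """Return {name: default} for parameters of fn that have a default value."""
    excluded = set(exclude or [])
    parameters = inspect.signature(fn).parameters
    return {
        name: parameter.default
        for name, parameter in parameters.items()
        if parameter.default is not inspect.Parameter.empty
        and name not in excluded
    }


def argnames_sketch(fn, exclude=None):
    """Return the parameter names of fn in declaration order."""
    excluded = set(exclude or [])
    return [
        name for name in inspect.signature(fn).parameters if name not in excluded
    ]


def example(a, b=2, c="x", *, d=None):
    """Toy function used only to demonstrate the helpers."""
    return a, b, c, d


defaults = default_params_sketch(example, exclude=["c"])
names = argnames_sketch(example, exclude=["a"])
```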

catalyst.utils.misc.get_utcnow_time(format: str = None) → str[source]

Return string with current utc time in chosen format.

Parameters

format – format string. if None “%y%m%d.%H%M%S” will be used.

Returns

formatted utc time string

Return type

str
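A minimal sketch of the same idea with the standard library, using the default format string quoted above. utcnow_string is a hypothetical name and the real function may obtain the time differently.

```python
from datetime import datetime, timezone


def utcnow_string(fmt=None):
    """Return the current UTC time formatted with fmt."""
    if fmt is None:
        # Default from the description above: two-digit year, month, day,
        # a dot, then hours, minutes and seconds.
        fmt = "%y%m%d.%H%M%S"
    return datetime.now(timezone.utc).strftime(fmt)


stamp = utcnow_string()
year = utcnow_string("%Y")
```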

catalyst.utils.misc.is_exception(ex: Any) → bool[source]

Check if the argument is of Exception type.

catalyst.utils.misc.maybe_recursive_call(object_or_dict, method: Union[str, Callable], recursive_args=None, recursive_kwargs=None, **kwargs)[source]

Calls the method recursively for the object_or_dict.

Parameters
  • object_or_dict – some object or a dictionary of objects

  • method – method name to call

  • recursive_args – list of arguments to pass to the method

  • recursive_kwargs – keyword arguments to pass to the method

  • **kwargs – Arbitrary keyword arguments

Returns

result of method call

catalyst.utils.misc.get_attr(obj: Any, key: str, inner_key: str = None) → Any[source]

Alias for Python's built-in getattr. Useful for preparing Callbacks and for multi-criterion or multi-optimizer setups, for example when training a multi-task classifier.

Used to get a named attribute from an IRunner by the key keyword; for example

get_attr(runner, "criterion")
# is equivalent to
runner.criterion

get_attr(runner, "optimizer")
# is equivalent to
runner.optimizer

get_attr(runner, "scheduler")
# is equivalent to
runner.scheduler

With inner_key, the function expects to find a dictionary under key and returns its inner_key entry; for example,

get_attr(runner, "criterion", "bce")
# is equivalent to
runner.criterion["bce"]

get_attr(runner, "optimizer", "adam")
# is equivalent to
runner.optimizer["adam"]

get_attr(runner, "scheduler", "adam")
# is equivalent to
runner.scheduler["adam"]
Parameters
  • obj – object of interest

  • key – name for attribute of interest, like criterion, optimizer, scheduler

  • inner_key – name of inner dictionary key

Returns

inner attribute

catalyst.utils.misc.set_global_seed(seed: int) → None[source]

Sets the random seed for NumPy, Python's random module, PyTorch and TensorFlow.

Parameters

seed – random seed

catalyst.utils.misc.get_dictkey_auto_fn(key: Union[str, List[str], None]) → Callable[source]

Function generator for sub-dict preparation from dict based on predefined keys.

Parameters

key – keys

Returns

function

Raises

NotImplementedError – if key is not one of str, tuple, list, dict or None

catalyst.utils.misc.merge_dicts(*dicts: dict) → dict[source]

Recursive dict merge. Instead of updating only top-level keys, merge_dicts recurses down into dicts nested to an arbitrary depth, updating keys.

Parameters

*dicts – several dictionaries to merge

Returns

deep-merged dictionary

Return type

dict
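The difference from a plain dict.update is easiest to see on nested keys. Below is a minimal sketch of a recursive merge, assuming later dictionaries take precedence; merge_dicts_sketch is a hypothetical name and the library version may treat edge cases differently.

```python
import copy


def merge_dicts_sketch(*dicts):
    """Deep-merge dictionaries; values from later dictionaries win."""
    merged = {}
    for dictionary in dicts:
        for key, value in dictionary.items():
            if isinstance(merged.get(key), dict) and isinstance(value, dict):
                # Both sides are dicts: recurse instead of overwriting.
                merged[key] = merge_dicts_sketch(merged[key], value)
            else:
                # Copy so the result never aliases the inputs.
                merged[key] = copy.deepcopy(value)
    return merged


base = {"model": {"name": "resnet", "depth": 18}, "epochs": 10}
override = {"model": {"depth": 50}, "seed": 42}
merged = merge_dicts_sketch(base, override)
```

Here only model.depth is replaced; model.name survives, which a top-level update would have discarded.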

catalyst.utils.misc.flatten_dict(dictionary: Dict[str, Any], parent_key: str = '', separator: str = '/') → collections.OrderedDict[source]

Flattens the given dictionary.

Parameters
  • dictionary – dictionary to flatten

  • parent_key (str, optional) – prefix nested keys with string parent_key

  • separator (str, optional) – delimiter between parent_key and key to use

Returns

ordered dictionary with flattened keys

Return type

collections.OrderedDict
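A small sketch of the flattening, assuming nested keys are joined with the separator and a non-empty parent_key is prepended. flatten_dict_sketch is a hypothetical name, not the library function.

```python
from collections import OrderedDict


def flatten_dict_sketch(dictionary, parent_key="", separator="/"):
    """Flatten nested dictionaries into a single level with joined keys."""
    items = []
    for key, value in dictionary.items():
        new_key = parent_key + separator + key if parent_key else key
        if isinstance(value, dict):
            # Recurse, carrying the accumulated key as the new prefix.
            items.extend(flatten_dict_sketch(value, new_key, separator).items())
        else:
            items.append((new_key, value))
    return OrderedDict(items)


flat = flatten_dict_sketch({"a": 1, "b": {"c": 2, "d": {"e": 3}}})
```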

catalyst.utils.misc.split_dict_to_subdicts(dct: Dict, prefixes: List, extra_key: str) → Dict[source]

Splits dict into subdicts with the specified prefixes. Keys that do not start with any of the prefixes go to extra_key.

Examples

>>> dct = {"train_v1": 1, "train_v2": 2, "not_train": 3}
>>> split_dict_to_subdicts(dct, prefixes=["train"], extra_key="_extra")
{"train": {"v1": 1, "v2": 2}, "_extra": {"not_train": 3}}
Parameters
  • dct – dictionary with keys with prefixes

  • prefixes – prefixes of interest, which we would like to reveal

  • extra_key – extra key to store everything else

Returns

dictionary with subdictionaries with prefixes and extra_key keys
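One way to reproduce the example above is sketched below. It assumes that a key matches a prefix only when followed by an underscore, which is inferred from the example output; split_sketch is a hypothetical name.

```python
def split_sketch(dct, prefixes, extra_key):
    """Group keys by prefix; unmatched keys are stored under extra_key."""
    subdicts = {}
    for key, value in dct.items():
        for prefix in prefixes:
            if key.startswith(prefix + "_"):
                # Strip "<prefix>_" so only the remainder is kept as the key.
                subdicts.setdefault(prefix, {})[key[len(prefix) + 1:]] = value
                break
        else:
            # No prefix matched: keep the key unchanged under extra_key.
            subdicts.setdefault(extra_key, {})[key] = value
    return subdicts


dct = {"train_v1": 1, "train_v2": 2, "not_train": 3}
result = split_sketch(dct, prefixes=["train"], extra_key="_extra")
```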

catalyst.utils.misc.get_hash(obj: Any) → str[source]

Creates a unique hash from an object in the following way:
  • represent obj as a string recursively

  • hash this string with the sha256 hash function

  • encode the hash with url-safe base64 encoding

Parameters

obj – object to hash

Returns

base64-encoded string
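The hashing and encoding steps map directly onto hashlib and base64. In the sketch below the string representation is simply repr(obj), which is an assumption; the library's recursive representation may differ, so the resulting hashes are not expected to match. get_hash_sketch is a hypothetical name.

```python
import base64
import hashlib


def get_hash_sketch(obj):
    """Hash an object: string form -> sha256 digest -> url-safe base64."""
    # Assumption: repr() stands in for the recursive string representation.
    as_string = repr(obj)
    digest = hashlib.sha256(as_string.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


first = get_hash_sketch({"lr": 0.001, "layers": [64, 64]})
second = get_hash_sketch({"lr": 0.001, "layers": [64, 64]})
other = get_hash_sketch({"lr": 0.01, "layers": [64, 64]})
```

A sha256 digest is 32 bytes, so the base64 form is always 44 characters long, padding included.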

catalyst.utils.misc.get_short_hash(obj) → str[source]

Creates unique short hash from object.

Parameters

obj – object to hash

Returns

short base64-encoded string (6 chars)

catalyst.utils.misc.args_are_not_none(*args: Optional[Any]) → bool[source]

Check that all arguments are not None.

Parameters

*args – values

Returns

True if all values are not None, False otherwise

Return type

bool

catalyst.utils.misc.make_tuple(tuple_like)[source]

Creates a tuple if given tuple_like value isn’t list or tuple.

Parameters

tuple_like – tuple like object - list or tuple

Returns

tuple or list

catalyst.utils.misc.pairwise(iterable: Iterable[Any]) → Iterable[Any][source]

Iterate sequences by pairs.

Examples

>>> for i in pairwise([1, 2, 5, -3]):
...     print(i)
(1, 2)
(2, 5)
(5, -3)
Parameters

iterable – Any iterable sequence

Returns

pairwise iterator
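The same overlapping pairs can be produced with itertools.tee. This is the well-known recipe from the itertools documentation, shown as a sketch rather than as the library's own code; pairwise_sketch is a hypothetical name.

```python
from itertools import tee


def pairwise_sketch(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    first, second = tee(iterable)
    # Advance the second iterator by one element so the two are offset.
    next(second, None)
    return zip(first, second)


pairs = list(pairwise_sketch([1, 2, 5, -3]))
```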

catalyst.utils.misc.find_value_ids(it: Iterable[Any], value: Any) → List[int][source]
Parameters
  • it – list of any

  • value – query element

Returns

indices of all elements equal to value
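Since the entry carries no description, a short sketch of the behaviour implied by the signature and return value may help. It assumes equality is tested with ==; find_value_ids_sketch is a hypothetical name.

```python
def find_value_ids_sketch(it, value):
    """Return the positions of all elements of `it` that equal `value`."""
    return [index for index, element in enumerate(it) if element == value]


ids = find_value_ids_sketch(["cat", "dog", "cat", "bird"], "cat")
```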

Numpy

catalyst.utils.numpy.get_one_hot(label: int, num_classes: int, smoothing: float = None) → numpy.ndarray[source]

Applies one-hot vectorization to a given scalar, optionally with label smoothing as described in Bag of Tricks for Image Classification with Convolutional Neural Networks.

Parameters
  • label – scalar value to be vectorized

  • num_classes – total number of classes

  • smoothing (float, optional) – if specified applies label smoothing from Bag of Tricks for Image Classification with Convolutional Neural Networks paper

Returns

a one-hot vector with shape (num_classes,)

Return type

np.ndarray
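The exact smoothing formula is not stated above. The sketch below assumes the common variant in which the true class receives 1 - smoothing and the remaining mass is spread evenly over the other classes. It returns a plain list rather than an np.ndarray to stay dependency-free, and one_hot_sketch is a hypothetical name.

```python
def one_hot_sketch(label, num_classes, smoothing=None):
    """One-hot encode `label`, optionally with label smoothing."""
    if smoothing is None:
        vector = [0.0] * num_classes
        vector[label] = 1.0
        return vector
    # Assumed formula: spread `smoothing` evenly over the other classes.
    off_value = smoothing / (num_classes - 1)
    vector = [off_value] * num_classes
    vector[label] = 1.0 - smoothing
    return vector


hard = one_hot_sketch(1, 3)
smoothed = one_hot_sketch(0, 5, smoothing=0.2)
```

Under this assumption the smoothed vector still sums to one, so it remains a valid target distribution.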

Parser

catalyst.utils.parser.parse_config_args(*, config, args, unknown_args)[source]

Parse config and cli args.

Parameters
  • config – dict-based experiment config

  • args – cli args

  • unknown_args – cli unknown args

Returns

final experiment config and cli args

Return type

config, args

catalyst.utils.parser.parse_args_uargs(args, unknown_args)[source]

Function for parsing configuration files.

Parameters
  • args – recognized arguments

  • unknown_args – unrecognized arguments

Returns

updated arguments, dict with config

Return type

tuple

Pruning

catalyst.utils.pruning.prune_model(model: torch.nn.modules.module.Module, pruning_fn: Callable, keys_to_prune: List[str], amount: Union[float, int], layers_to_prune: Optional[List[str]] = None, reinitialize_after_pruning: Optional[bool] = False) → None[source]

Prune model function can be used for pruning certain tensors in model layers.

Raises
  • AttributeError – If layers_to_prune is not None, but there are no layers with the specified names.

  • Exception – If no layers have specified keys.

Parameters
  • model – Model to be pruned.

  • pruning_fn – Pruning function with API same as in torch.nn.utils.pruning. pruning_fn(module, name, amount).

  • keys_to_prune – list of strings. Determines which tensor in modules will be pruned.

  • amount – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.

  • layers_to_prune – list of strings with the names of modules to prune. If None, every module in the model is considered for pruning.

  • reinitialize_after_pruning – if True, reinitializes the model after pruning (e.g. for a Lottery Ticket Hypothesis check).

catalyst.utils.pruning.remove_reparametrization(model: torch.nn.modules.module.Module, keys_to_prune: List[str], layers_to_prune: Optional[List[str]] = None) → None[source]

Removes pre-hooks and pruning masks from the model.

Parameters
  • model – model to remove reparametrization.

  • keys_to_prune – list of strings. Determines which tensor in modules have already been pruned.

  • layers_to_prune – list of strings with the names of modules that have already been pruned. If None, every module in the model is processed.

Quantization

catalyst.utils.quantization.quantize_model_from_checkpoint(logdir: pathlib.Path, checkpoint_name: str, stage: str = None, qconfig_spec: Union[Set, Dict, None] = None, dtype: Optional[torch.dtype] = torch.qint8, backend: str = None) → torch.nn.modules.module.Module[source]

Quantize model using created experiment and runner.

Parameters
  • logdir (Union[str, Path]) – Path to Catalyst logdir with model

  • checkpoint_name – Name of model checkpoint to use

  • stage – experiment’s stage name

  • qconfig_spec – torch.quantization.quantize_dynamic parameter; use it to define the layers to be quantized

  • dtype – type of the model parameters, default int8

  • backend – defines backend for quantization

Returns

Quantized model

catalyst.utils.quantization.save_quantized_model(model: torch.nn.modules.module.Module, logdir: Union[str, pathlib.Path] = None, checkpoint_name: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None) → None[source]

Saves quantized model.

Parameters
  • model – Traced model

  • logdir (Union[str, Path]) – Path to experiment

  • checkpoint_name – name for the checkpoint

  • out_dir (Union[str, Path]) – Directory to save model to (overrides logdir)

  • out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir)

Raises

ValueError – if nothing out of logdir, out_dir or out_model is specified.

Scripts

catalyst.utils.scripts.import_module(expdir: Union[str, pathlib.Path])[source]

Imports python module by path.

Parameters

expdir – path to python module.

Returns

Imported module.

catalyst.utils.scripts.dump_code(expdir: Union[str, pathlib.Path], logdir: Union[str, pathlib.Path]) → None[source]

Dumps Catalyst code for reproducibility.

Parameters
  • expdir (Union[str, pathlib.Path]) – experiment dir path

  • logdir (Union[str, pathlib.Path]) – logging dir path

catalyst.utils.scripts.dump_python_files(src: pathlib.Path, dst: pathlib.Path) → None[source]

Dumps python code (*.py and *.ipynb) files.

Parameters
  • src – source code path

  • dst – destination code path

catalyst.utils.scripts.prepare_config_api_components(expdir: pathlib.Path, config: Dict)[source]

Imports and creates the core Config API components (Experiment, Runner and Config) from expdir, the experiment directory, and config, the experiment config.

Parameters
  • expdir – experiment directory path

  • config – dictionary with experiment Config

Returns

Experiment, Runner, Config for Config API usage.

catalyst.utils.scripts.dump_experiment_code(src: pathlib.Path, dst: pathlib.Path) → None[source]

Dumps your experiment code for Config API use cases.

Parameters
  • src – source code path

  • dst – destination code path

catalyst.utils.scripts.distributed_cmd_run(worker_fn: Callable, distributed: bool = True, *args, **kwargs) → None[source]

Distributed run

Parameters
  • worker_fn – worker fn to run in distributed mode

  • distributed – distributed flag

  • args – additional parameters for worker_fn

  • kwargs – additional key-value parameters for worker_fn

Stochastic Weights Averaging (SWA)

catalyst.utils.swa.average_weights(state_dicts: List[dict]) → collections.OrderedDict[source]

Averaging of input weights.

Parameters

state_dicts – Weights to average

Raises

KeyError – If states do not match

Returns

Averaged weights
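The averaging itself is a key-wise mean over the state dicts. The sketch below uses plain floats so it runs without PyTorch; with real checkpoints the values would be tensors, and how the library handles them is not shown here. average_weights_sketch is a hypothetical name.

```python
from collections import OrderedDict


def average_weights_sketch(state_dicts):
    """Key-wise mean of several state dicts that share the same keys."""
    keys = list(state_dicts[0].keys())
    for state_dict in state_dicts[1:]:
        if set(state_dict.keys()) != set(keys):
            raise KeyError("State dicts do not share the same keys")
    count = len(state_dicts)
    return OrderedDict(
        (key, sum(state_dict[key] for state_dict in state_dicts) / count)
        for key in keys
    )


averaged = average_weights_sketch([
    {"weight": 1.0, "bias": 0.0},
    {"weight": 3.0, "bias": 2.0},
])
```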

catalyst.utils.swa.get_averaged_weights_by_path_mask(path_mask: str, logdir: Union[str, pathlib.Path] = None) → collections.OrderedDict[source]

Averaging of input weights and saving them.

Parameters
  • path_mask – glob-like pattern for models to average

  • logdir – Path to logs directory

Returns

Averaged weights

Sys

catalyst.utils.sys.get_environment_vars() → Dict[str, Any][source]

Creates a dictionary with environment variables.

Returns

environment variables

Return type

Dict

catalyst.utils.sys.list_conda_packages() → str[source]

Lists conda installed packages.

Returns

list with conda installed packages

Return type

str

catalyst.utils.sys.list_pip_packages() → str[source]

Lists pip installed packages.

Returns

string with pip installed packages

Return type

str

catalyst.utils.sys.dump_environment(experiment_config: Any, logdir: str, configs_path: List[str] = None) → None[source]

Saves config, environment variables and package list in JSON into logdir.

Parameters
  • experiment_config – experiment config

  • logdir – path to logdir

  • configs_path – path(s) to config

Torch

catalyst.utils.torch.get_optimizable_params(model_or_params)[source]

Returns all the parameters that require gradients.

catalyst.utils.torch.get_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer) → float[source]

Get momentum of current optimizer.

Parameters

optimizer – PyTorch optimizer

Returns

momentum at first param group

Return type

float

catalyst.utils.torch.get_optimizer_momentum_list(optimizer: torch.optim.optimizer.Optimizer) → List[Optional[float]][source]

Get list of optimizer momentums (for each param group)

Parameters

optimizer – PyTorch optimizer

Returns

momentum for each param group

Return type

momentum_list (List[Union[float, None]])

catalyst.utils.torch.set_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer, value: float, index: int = 0)[source]

Sets the momentum of the index-th param group of optimizer to value.

Parameters
  • optimizer – PyTorch optimizer

  • value – new value of momentum

  • index (int, optional) – integer index of optimizer’s param groups, default is 0

catalyst.utils.torch.get_device() → torch.device[source]

Returns the best available device (TPU > GPU > CPU).

catalyst.utils.torch.get_available_gpus()[source]

Array of available GPU ids.

Examples

>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
>>> get_available_gpus()
[0, 2]
>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,-1,1"
>>> get_available_gpus()
[0]
>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
>>> get_available_gpus()
[]
>>> os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
>>> get_available_gpus()
[]
Returns

available GPU ids

Return type

iterable
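The examples above suggest that parsing stops at the first empty or negative entry. A sketch of just that parsing step follows; it takes the variable's value as an argument instead of reading os.environ, and it does not cover the case where the variable is unset. parse_visible_devices is a hypothetical name.

```python
def parse_visible_devices(value):
    """Parse a CUDA_VISIBLE_DEVICES-style string into a list of GPU ids."""
    gpu_ids = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            # An empty entry ends the list of usable devices.
            break
        gpu_id = int(item)
        if gpu_id < 0:
            # A negative id hides this device and every device after it.
            break
        gpu_ids.append(gpu_id)
    return gpu_ids


visible = parse_visible_devices("0,-1,1")
```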

catalyst.utils.torch.get_activation_fn(activation: str = None)[source]

Returns the activation function from torch.nn by its name.

catalyst.utils.torch.any2device(value, device: Union[str, torch.device])[source]

Move tensor, list of tensors, list of list of tensors, dict of tensors, tuple of tensors to target device.

Parameters
  • value – Object to be moved

  • device – target device ids

Returns

Same structure as value, but all tensors and np.arrays moved to device

catalyst.utils.torch.prepare_cudnn(deterministic: bool = None, benchmark: bool = None) → None[source]

Prepares CuDNN benchmark and sets CuDNN to deterministic or non-deterministic mode.

Parameters
  • deterministic – deterministic mode if running in CuDNN backend.

  • benchmark – If True use CuDNN heuristics to figure out which algorithm will be most performant for your model architecture and input. Setting it to False may slow down your training.

catalyst.utils.torch.process_model_params(model: torch.nn.modules.module.Module, layerwise_params: Dict[str, dict] = None, no_bias_weight_decay: bool = True, lr_scaling: float = 1.0) → List[Union[torch.nn.parameter.Parameter, dict]][source]

Gathers model parameters for torch.optim.Optimizer.

Parameters
  • model – Model to process

  • layerwise_params – Order-sensitive dict where each key is regex pattern and values are layer-wise options for layers matching with a pattern

  • no_bias_weight_decay – If true, removes weight_decay for all bias parameters in the model

  • lr_scaling – layer-wise learning rate scaling, if 1.0, learning rates will not be scaled

Returns

parameters for an optimizer

Return type

iterable

Example:

>>> model = catalyst.contrib.models.segmentation.ResnetUnet()
>>> layerwise_params = collections.OrderedDict([
...     ("conv1.*", dict(lr=0.001, weight_decay=0.0003)),
...     ("conv.*", dict(lr=0.002))
... ])
>>> params = process_model_params(model, layerwise_params)
>>> optimizer = torch.optim.Adam(params, lr=0.0003)
catalyst.utils.torch.get_requires_grad(model: torch.nn.modules.module.Module)[source]

Gets the requires_grad value for all model parameters.

Example:

>>> model = SimpleModel()
>>> requires_grad = get_requires_grad(model)
Parameters

model – model

Returns

value

Return type

requires_grad (Dict[str, bool])

catalyst.utils.torch.set_requires_grad(model: torch.nn.modules.module.Module, requires_grad: Union[bool, Dict[str, bool]])[source]

Sets the requires_grad value for all model parameters.

Example:

>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad=True)
>>> # or
>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad={""})
Parameters
  • model – model

  • requires_grad (Union[bool, Dict[str, bool]]) – value

catalyst.utils.torch.get_network_output(net: torch.nn.modules.module.Module, *input_shapes_args, **input_shapes_kwargs)[source]

For each input shape, returns an output tensor.

Examples

>>> net = nn.Linear(10, 5)
>>> utils.get_network_output(net, (1, 10))
tensor([[[-0.2665,  0.5792,  0.9757, -0.5782,  0.1530]]])
Parameters
  • net – the model

  • *input_shapes_args – variable length argument list of shapes

  • **input_shapes_kwargs – key-value arguments of shapes

Returns

tensor with network output

catalyst.utils.torch.detach(tensor: torch.Tensor) → numpy.ndarray[source]

Detach a pytorch tensor from graph and convert it to numpy array

Parameters

tensor – PyTorch tensor

Returns

numpy ndarray

catalyst.utils.torch.trim_tensors(tensors)[source]

Trim padding off of a batch of tensors to the smallest possible length. Should be used with catalyst.data.DynamicLenBatchSampler.

Adapted from Dynamic minibatch trimming to improve BERT training speed.

Parameters

tensors – list of tensors to trim.

Returns

list of trimmed tensors.

Return type

List[torch.tensor]
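
The trimming idea can be sketched on plain Python lists instead of tensors. This is an illustration only, not the catalyst implementation: `trim_batch` is a hypothetical helper, and it assumes that 0 is the padding value, that padding sits at the end of each row, and that the first entry of the list (for example token ids) determines the kept length.

```python
from typing import List

Batch = List[List[int]]  # rows of right-padded token ids or mask values


def trim_batch(tensors: List[Batch]) -> List[Batch]:
    """Trim every batch to the longest non-padded row of the first batch."""
    # Assumption: 0 is the padding value, so counting non-zero entries
    # gives the real length of each row.
    max_len = max(sum(1 for value in row if value != 0) for row in tensors[0])
    return [[row[:max_len] for row in tensor] for tensor in tensors]


token_ids = [[5, 7, 0, 0], [3, 9, 4, 0]]
attention_mask = [[1, 1, 0, 0], [1, 1, 1, 0]]
trimmed_ids, trimmed_mask = trim_batch([token_ids, attention_mask])
```

Here the longest real sequence has three tokens, so one padding column is dropped from both batches.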

catalyst.utils.torch.normalize(samples: torch.Tensor) → torch.Tensor[source]
Parameters

samples – tensor with shape of [n_samples, features_dim]

Returns

normalized tensor with the same shape

catalyst.utils.torch.convert_labels2list(labels: Union[torch.Tensor, List[int]]) → List[int][source]

Converts labels to a list so that two types of indexing can be handled uniformly: an integer tensor or a list of indices.

Parameters

labels – labels of batch samples

Returns

labels of batch samples in the aligned format

Raises

TypeError – if the input labels are neither a tensor nor a list

catalyst.utils.torch.get_optimal_inner_init(nonlinearity: torch.nn.modules.module.Module, **kwargs) → Callable[[torch.nn.modules.module.Module], None][source]

Create initializer for inner layers based on their activation function (nonlinearity).

Parameters
  • nonlinearity – non-linear activation

  • **kwargs – extra kwargs

Returns

optimal initialization function

Raises

NotImplementedError – if nonlinearity is not one of sigmoid, tanh, relu, leaky_relu

catalyst.utils.torch.outer_init(layer: torch.nn.modules.module.Module) → None[source]

Initialization for output layers of policy and value networks typically used in deep reinforcement learning literature.

Parameters

layer – torch nn.Module instance

catalyst.utils.torch.reset_weights_if_possible(module: torch.nn.modules.module.Module)[source]

Resets module parameters if possible.

Parameters

module – Module to reset.

Tracing

catalyst.utils.tracing.trace_model(model: torch.nn.modules.module.Module, predict_fn: Callable, batch=None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu', predict_params: dict = None) → torch.jit._script.ScriptModule[source]

Traces model using runner and batch.

Parameters
  • model – Model to trace

  • predict_fn – Function to run prediction with the model provided, takes model, inputs parameters

  • batch – Batch to trace the model

  • method_name – Model’s method name that will be used as entrypoint during tracing

  • mode – Mode for model to trace (train or eval)

  • requires_grad – Flag to use grads

  • opt_level – Apex FP16 init level, optional

  • device – Torch device

  • predict_params – additional parameters for model forward

Returns

Traced model

Return type

jit.ScriptModule

Raises

ValueError – if batch and predict_fn are not both specified, or if mode is not ‘eval’ or ‘train’.

catalyst.utils.tracing.trace_model_from_checkpoint(logdir: pathlib.Path, method_name: str, checkpoint_name: str, stage: str = None, loader: Union[str, int] = None, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu')[source]

Traces model using created experiment and runner.

Parameters
  • logdir (Union[str, Path]) – Path to Catalyst logdir with model

  • checkpoint_name – Name of model checkpoint to use

  • stage – experiment’s stage name

  • loader (Union[str, int]) – experiment’s loader name or its index

  • method_name – Model’s method name that will be used as entrypoint during tracing

  • mode – Mode for model to trace (train or eval)

  • requires_grad – Flag to use grads

  • opt_level – AMP FP16 init level

  • device – Torch device

Returns

the traced model

catalyst.utils.tracing.trace_model_from_runner(runner: IRunner, checkpoint_name: str = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu') → torch.jit._script.ScriptModule[source]

Traces model using created experiment and runner.

Parameters
  • runner – current runner.

  • checkpoint_name – Name of model checkpoint to use, if None traces current model from runner

  • method_name – Model’s method name that will be used as entrypoint during tracing

  • mode – Mode for model to trace (train or eval)

  • requires_grad – Flag to use grads

  • opt_level – AMP FP16 init level

  • device – Torch device

Returns

Traced model

Return type

ScriptModule

catalyst.utils.tracing.get_trace_name(method_name: str, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, additional_string: str = None) → str[source]

Creates a file name for the traced model.

Parameters
  • method_name – model’s method name

  • mode – train or eval

  • requires_grad – flag if model was traced with gradients

  • opt_level – opt_level if model was traced in FP16

  • additional_string – any additional information

Returns

Filename for traced model to be saved.

Return type

str

catalyst.utils.tracing.save_traced_model(model: torch.jit._script.ScriptModule, logdir: Union[str, pathlib.Path] = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None, checkpoint_name: str = None) → None[source]

Saves traced model.

Parameters
  • model – Traced model

  • logdir (Union[str, Path]) – Path to experiment

  • method_name – Name of the method that was traced

  • mode – Model’s mode - train or eval

  • requires_grad – Whether the model was traced with requires_grad or not

  • opt_level – Apex FP16 init level used during tracing

  • out_dir (Union[str, Path]) – Directory to save model to (overrides logdir)

  • out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir)

  • checkpoint_name – Checkpoint name used to restore the model

Raises

ValueError – if nothing out of logdir, out_dir or out_model is specified.

catalyst.utils.tracing.load_traced_model(model_path: Union[str, pathlib.Path], device: Union[str, torch.device] = 'cpu', opt_level: str = None) → torch.jit._script.ScriptModule[source]

Loads a traced model.

Parameters
  • model_path – Path to traced model

  • device – Torch device

  • opt_level – Apex FP16 init level, optional

Returns

Traced model

Return type

ScriptModule

Contrib

Compression

catalyst.contrib.utils.compression.pack(data)

Serialize the data into bytes using pickle.

Parameters

data – a value

Returns

A bytes object containing the pickle-serialized data.

catalyst.contrib.utils.compression.pack_if_needed(data)

Serialize the data into bytes using pickle.

Parameters

data – a value

Returns

A bytes object containing the pickle-serialized data.

catalyst.contrib.utils.compression.unpack(bytes)

Deserialize bytes into an object using pickle.

Parameters

bytes – a bytes object containing pickle-serialized data.

Returns

Returns a value deserialized from the bytes-like object.
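
The serialize/deserialize round trip described for pack and unpack can be sketched with the standard pickle module directly. This is not the catalyst code, only the underlying pattern; `payload` is a made-up value for illustration.

```python
import pickle

# Hypothetical value to serialize; any picklable object works.
payload = {"epoch": 3, "metrics": [0.91, 0.87]}

packed = pickle.dumps(payload)   # serialize the value into a bytes object
restored = pickle.loads(packed)  # deserialize the bytes back into a value
```

As with any pickle-based format, bytes from untrusted sources should not be deserialized.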

catalyst.contrib.utils.compression.unpack_if_needed(bytes)

Deserialize bytes into an object using pickle.

Parameters

bytes – a bytes object containing pickle-serialized data.

Returns

Returns a value deserialized from the bytes-like object.

Pandas

catalyst.contrib.utils.pandas.dataframe_to_list(dataframe: pandas.core.frame.DataFrame) → List[dict][source]

Converts dataframe to a list of rows (without indexes).

Parameters

dataframe – input dataframe

Returns

list of rows

Return type

List[dict]

catalyst.contrib.utils.pandas.folds_to_list(folds: Union[list, str, pandas.core.series.Series]) → List[int][source]

Formats a string, a list of numbers or a pandas series into a list of unique ints.

Examples

>>> folds_to_list("1,2,1,3,4,2,4,6")
[1, 2, 3, 4, 6]
>>> folds_to_list([1, 2, 3.0, 5])
[1, 2, 3, 5]
Parameters

folds (Union[list, str, pd.Series]) – Either list of numbers or one string with numbers separated by commas or pandas series

Returns

list of unique ints

Return type

List[int]

Raises

ValueError – if a value in the string or array cannot be cast to int
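
A minimal pure-Python sketch of the parsing behaviour shown in the examples above. `folds_to_list_sketch` is a hypothetical name, pandas series are left out, and returning the folds sorted is an assumption that merely agrees with the documented outputs.

```python
from typing import List, Union


def folds_to_list_sketch(folds: Union[str, list]) -> List[int]:
    """Parse folds into a sorted list of unique ints."""
    if isinstance(folds, str):
        folds = folds.split(",")
    # int() raises ValueError for values such as "a" that are not integers.
    return sorted({int(fold) for fold in folds})


from_string = folds_to_list_sketch("1,2,1,3,4,2,4,6")
from_list = folds_to_list_sketch([1, 2, 3.0, 5])
```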

catalyst.contrib.utils.pandas.split_dataframe(dataframe: pandas.core.frame.DataFrame, train_folds: List[int], valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, tag2class: Optional[Dict[str, int]] = None, tag_column: str = None, class_column: str = None, seed: int = 42, n_folds: int = 5) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Split a Pandas DataFrame into folds.

Parameters
  • dataframe – input dataframe

  • train_folds – train folds

  • valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds

  • infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds

  • tag2class (Dict[str, int], optional) – mapping from label names into int

  • tag_column (str, optional) – column with label names

  • class_column (str, optional) – column to use for split

  • seed – seed for split

  • n_folds – number of folds

Returns

tuple with 4 dataframes

whole dataframe, train part, valid part and infer part

Return type

tuple
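
The defaulting rules for valid_folds and infer_folds described above can be sketched as follows. `resolve_folds` and its `all_folds` argument are hypothetical; the sketch follows the parameter descriptions rather than the library source.

```python
from typing import List, Optional, Tuple


def resolve_folds(
    all_folds: List[int],
    train_folds: List[int],
    valid_folds: Optional[List[int]] = None,
    infer_folds: Optional[List[int]] = None,
) -> Tuple[List[int], List[int], List[int]]:
    """Fill in missing valid/infer folds from the folds that are still unused."""
    if valid_folds is None:
        valid_folds = [f for f in all_folds if f not in train_folds]
    if infer_folds is None:
        infer_folds = [
            f for f in all_folds if f not in train_folds and f not in valid_folds
        ]
    return train_folds, valid_folds, infer_folds


default_split = resolve_folds([0, 1, 2, 3, 4], train_folds=[0, 1, 2])
explicit_valid = resolve_folds([0, 1, 2, 3, 4], train_folds=[0, 1, 2], valid_folds=[3])
```

With only train_folds given, every remaining fold goes to validation and nothing is left for inference.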

catalyst.contrib.utils.pandas.split_dataframe_on_column_folds(dataframe: pandas.core.frame.DataFrame, column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]

Splits DataFrame into N folds.

Parameters
  • dataframe – a dataset

  • column – which column to use

  • random_state – seed for random shuffle

  • n_folds – number of result folds

Returns

new dataframe with fold column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.split_dataframe_on_folds(dataframe: pandas.core.frame.DataFrame, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]

Splits DataFrame into N folds.

Parameters
  • dataframe – a dataset

  • random_state – seed for random shuffle

  • n_folds – number of result folds

Returns

new dataframe with fold column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.split_dataframe_on_stratified_folds(dataframe: pandas.core.frame.DataFrame, class_column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]

Splits DataFrame into N stratified folds.

Also see catalyst.data.sampler.BalanceClassSampler

Parameters
  • dataframe – a dataset

  • class_column – which column to use for split

  • random_state – seed for random shuffle

  • n_folds – number of result folds

Returns

new dataframe with fold column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.split_dataframe_train_test(dataframe: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Split dataframe in train and test part.

Parameters
  • dataframe – pd.DataFrame to split

  • **train_test_split_args

    test_size – float, int, or None (default is None)

    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.

    train_size – float, int, or None (default is None)

    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

    random_state – int or RandomState

    Pseudo-random number generator state used for random sampling.

    stratify – array-like or None (default is None)

    If not None, data is split in a stratified fashion, using this as the class labels.

Returns

train and test DataFrames

Note

It exists because the sklearn split is overcomplicated.
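
The test_size and train_size rules above can be sketched as a small helper that resolves the number of test samples. `resolve_test_count` is a hypothetical name, and the rounding (ceil for a float test_size, floor for a float train_size) is an assumption about how the underlying split rounds.

```python
import math
from typing import Optional, Union

Size = Optional[Union[float, int]]


def resolve_test_count(n_samples: int, test_size: Size = None, train_size: Size = None) -> int:
    """Return how many samples end up in the test part."""
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when neither size is given
    if test_size is None:
        # Test part is the complement of the train part.
        if isinstance(train_size, float):
            return n_samples - math.floor(train_size * n_samples)
        return n_samples - train_size
    if isinstance(test_size, float):
        return math.ceil(test_size * n_samples)  # proportion of the dataset
    return test_size  # absolute number of samples


default_count = resolve_test_count(100)
proportion_count = resolve_test_count(10, test_size=0.5)
absolute_count = resolve_test_count(100, test_size=30)
complement_count = resolve_test_count(100, train_size=0.75)
```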

catalyst.contrib.utils.pandas.separate_tags(dataframe: pandas.core.frame.DataFrame, tag_column: str = 'tag', tag_delim: str = ', ') → pandas.core.frame.DataFrame[source]

Separates values in the tag_column column.

Parameters
  • dataframe – a dataset

  • tag_column – column name to separate values

  • tag_delim – delimiter to separate values

Returns

new dataframe

Return type

pd.DataFrame
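
A sketch of the separation on a list of dicts instead of a DataFrame. It assumes that each row is expanded into one row per tag, with the other columns copied; `separate_tags_sketch` and the `filepath` column are hypothetical.

```python
from typing import Dict, List


def separate_tags_sketch(
    rows: List[Dict[str, str]], tag_column: str = "tag", tag_delim: str = ", "
) -> List[Dict[str, str]]:
    """Expand every row into one row per tag found in tag_column."""
    separated = []
    for row in rows:
        for tag in row[tag_column].split(tag_delim):
            # Copy the row and overwrite the tag column with a single tag.
            separated.append({**row, tag_column: tag})
    return separated


rows = [
    {"filepath": "a.jpg", "tag": "cat, dog"},
    {"filepath": "b.jpg", "tag": "bird"},
]
result = separate_tags_sketch(rows)
```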

catalyst.contrib.utils.pandas.read_multiple_dataframes(in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]

Reads train/valid/infer dataframes from the given paths.

Parameters
  • in_csv_train – paths to train csv separated by commas

  • in_csv_valid – paths to valid csv separated by commas

  • in_csv_infer – paths to infer csv separated by commas

  • tag2class (Dict[str, int], optional) – mapping from label names into int

  • tag_column (str, optional) – column with label names

  • class_column (str, optional) – column to use for split

Returns

tuple with 4 dataframes

whole dataframe, train part, valid part and infer part

Return type

tuple

catalyst.contrib.utils.pandas.map_dataframe(dataframe: pandas.core.frame.DataFrame, tag_column: str, class_column: str, tag2class: Dict[str, int], verbose: bool = False) → pandas.core.frame.DataFrame[source]

This function maps tags from tag_column to ints into class_column using tag2class dictionary.

Parameters
  • dataframe – input dataframe

  • tag_column – column with tags

  • class_column (str) – column to write the mapped int classes into

  • tag2class (Dict[str, int]) – mapping from tags to class labels

  • verbose – flag if true, uses tqdm

Returns

updated dataframe with class_column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.get_dataset_labeling(dataframe: pandas.core.frame.DataFrame, tag_column: str) → Dict[str, int][source]

Prepares a mapping using unique values from tag_column.

{
    "class_name_0": 0,
    "class_name_1": 1,
    ...
    "class_name_N": N
}
Parameters
  • dataframe – a dataset

  • tag_column – which column to use

Returns

mapping from tag to labels

Return type

Dict[str, int]
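
The mapping shown above, and its use for turning tags into int classes as map_dataframe describes, can be sketched on a plain list of tags. Numbering the tags in sorted order is an assumption; the `tags` values are made up.

```python
# Hypothetical tag column values.
tags = ["dog", "cat", "dog", "bird"]

# Build the tag -> class mapping from the unique tags (assumed sorted order).
tag2class = {tag: index for index, tag in enumerate(sorted(set(tags)))}

# Apply the mapping to get the int class for every row.
classes = [tag2class[tag] for tag in tags]
```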

catalyst.contrib.utils.pandas.merge_multiple_fold_csv(fold_name: str, paths: Optional[str]) → pandas.core.frame.DataFrame[source]

Reads csv into one DataFrame with column fold.

Parameters
  • fold_name – current fold name

  • paths – paths to csv separated by commas

Returns

merged dataframes with column fold == fold_name

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.read_csv_data(in_csv: str = None, train_folds: Optional[List[int]] = None, valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, seed: int = 42, n_folds: int = 5, in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, List[dict], List[dict], List[dict]][source]

Reads a dataframe from the given path in_csv and splits it into train/valid/infer folds, or reads independent folds from several paths in_csv_train, in_csv_valid, in_csv_infer.

Note

This function can be used with different combinations of params.
First block is used to get dataset from one csv:

in_csv, train_folds, valid_folds, infer_folds, seed, n_folds

Second includes paths to different csv for train/valid and infer parts:

in_csv_train, in_csv_valid, in_csv_infer

The other params (tag2class, tag_column, class_column) are optional for either block.

Parameters
  • in_csv – paths to whole dataset

  • train_folds – train folds

  • valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds

  • infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds

  • seed – seed for split

  • n_folds – number of folds

  • in_csv_train – paths to train csv separated by commas

  • in_csv_valid – paths to valid csv separated by commas

  • in_csv_infer – paths to infer csv separated by commas

  • tag2class (Dict[str, int]) – mapping from label names into ints

  • tag_column – column with label names

  • class_column – column to use for split

Returns

tuple with 4 elements (whole dataframe, list with train data, list with valid data and list with infer data)

Return type

Tuple[pd.DataFrame, List[dict], List[dict], List[dict]]

catalyst.contrib.utils.pandas.balance_classes(dataframe: pandas.core.frame.DataFrame, class_column: str = 'label', random_state: int = 42, how: str = 'downsampling') → pandas.core.frame.DataFrame[source]

Balance classes in dataframe by class_column.

See also catalyst.data.sampler.BalanceClassSampler.

Parameters
  • dataframe – a dataset

  • class_column – which column to use for split

  • random_state – seed for random shuffle

  • how – strategy to sample, must be one of [“downsampling”, “upsampling”]

Returns

new dataframe with balanced class_column

Return type

pd.DataFrame

Raises

NotImplementedError – if how is not in [“upsampling”, “downsampling”, int]
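
The two sampling strategies can be sketched on a list of dicts. This is an illustration under stated assumptions, not the library code: `balance_rows` is hypothetical, downsampling cuts every class to the size of the smallest one, and upsampling repeats randomly chosen rows until every class reaches the size of the largest one.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def balance_rows(
    rows: List[Dict[str, str]],
    class_column: str = "label",
    random_state: int = 42,
    how: str = "downsampling",
) -> List[Dict[str, str]]:
    """Return rows with an equal number of samples per class."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[class_column]].append(row)
    sizes = [len(group) for group in groups.values()]
    if how == "downsampling":
        target = min(sizes)
    elif how == "upsampling":
        target = max(sizes)
    else:
        raise NotImplementedError(f"unknown strategy: {how}")
    rng = random.Random(random_state)
    balanced = []
    for group in groups.values():
        if len(group) >= target:
            balanced.extend(rng.sample(group, target))  # without replacement
        else:
            extra = rng.choices(group, k=target - len(group))  # with replacement
            balanced.extend(group + extra)
    return balanced


rows = [{"label": "cat"}] * 4 + [{"label": "dog"}] * 2
down = dict(Counter(row["label"] for row in balance_rows(rows, how="downsampling")))
up = dict(Counter(row["label"] for row in balance_rows(rows, how="upsampling")))
```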

catalyst.contrib.utils.pandas.create_dataset(dirs: str, extension: str = None, process_fn: Callable[[str], object] = None, recursive: bool = False) → Dict[str, object][source]

Create dataset (dict like {key: [values]}) from vctk-like dataset:

dataset/
    cat/
        *.ext
    dog/
        *.ext
Parameters
  • dirs – path to dirs, for example /home/user/data/**

  • extension – data extension you are looking for

  • process_fn (Callable[[str], object]) – function(path_to_file) -> object process function for found files, by default

  • recursive – enables recursive globbing

Returns

dataset

Return type

dict
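
A sketch of how such a directory layout can be turned into a {key: [values]} dict with the standard library. `create_dataset_sketch` is a hypothetical helper that uses the directory name as the key and omits process_fn and recursive; the temporary directory only exists to make the example self-contained.

```python
import glob
import os
import tempfile
from collections import defaultdict
from typing import Dict, List


def create_dataset_sketch(dirs: str, extension: str = "*") -> Dict[str, List[str]]:
    """Collect files matching `extension` under every directory matched by `dirs`."""
    dataset = defaultdict(list)
    for directory in sorted(glob.glob(dirs)):
        if not os.path.isdir(directory):
            continue
        key = os.path.basename(os.path.normpath(directory))  # e.g. "cat"
        dataset[key].extend(sorted(glob.glob(os.path.join(directory, extension))))
    return dict(dataset)


with tempfile.TemporaryDirectory() as root:
    # Build a tiny dataset/cat, dataset/dog layout with empty files.
    for label, name in [("cat", "1.jpg"), ("cat", "2.jpg"), ("dog", "3.jpg")]:
        os.makedirs(os.path.join(root, label), exist_ok=True)
        with open(os.path.join(root, label, name), "w"):
            pass
    dataset = create_dataset_sketch(os.path.join(root, "*"), extension="*.jpg")
    file_counts = {key: len(files) for key, files in dataset.items()}
```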

catalyst.contrib.utils.pandas.create_dataframe(dataset: Dict[str, object], **dataframe_args) → pandas.core.frame.DataFrame[source]

Create pd.DataFrame from dict like {key: [values]}.

Parameters
  • dataset – dict like {key: [values]}

  • **dataframe_args

    index – Index or array-like

    Index to use for resulting frame. Will default to np.arange(n) if no indexing information part of input data and no index provided

    columns – Index or array-like

    Column labels to use for resulting frame. Will default to np.arange(n) if no column labels are provided

    dtype – dtype, default None

    Data type to force, otherwise infer

Returns

dataframe built from the given dataset

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.split_dataset_train_test(dataset: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[Dict[str, object], Dict[str, object]][source]

Split dataset in train and test parts.

Parameters
  • dataset – dict like dataset

  • **train_test_split_args

    test_size – float, int, or None (default is None)

    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.

    train_size – float, int, or None (default is None)

    If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

    random_state – int or RandomState

    Pseudo-random number generator state used for random sampling.

    stratify – array-like or None (default is None)

    If not None, data is split in a stratified fashion, using this as the class labels.

Returns

train and test dicts

Parallel

catalyst.contrib.utils.parallel.parallel_imap(func, args, pool: Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool]) → List[T][source]

@TODO: Docs. Contribution is welcome.

catalyst.contrib.utils.parallel.tqdm_parallel_imap(func, args, pool: Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool], total: int = None, pbar=<class 'tqdm.std.tqdm'>) → List[T][source]

@TODO: Docs. Contribution is welcome.

catalyst.contrib.utils.parallel.get_pool(workers: int) → Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool][source]

@TODO: Docs. Contribution is welcome.

Plotly

catalyst.contrib.utils.plotly.plot_tensorboard_log(logdir: Union[str, pathlib.Path], step: Optional[str] = 'batch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]

@TODO: Docs. Contribution is welcome.

Adapted from https://github.com/belskikh/kekas/blob/v0.1.23/kekas/utils.py#L193

catalyst.contrib.utils.plotly.plot_metrics(logdir: Union[str, pathlib.Path], step: Optional[str] = 'epoch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]

Plots your learning results.

Parameters
  • logdir – the logdir that was specified during training.

  • step – ‘batch’ or ‘epoch’ - what logs to show: for batches or for epochs

  • metrics – list of metrics to plot. The loss should be specified as ‘loss’, learning rate = ‘_base/lr’ and other metrics should be specified as names in metrics dict that was specified during training

  • height – the height of the whole resulting plot

  • width – the width of the whole resulting plot

Serialization

catalyst.contrib.utils.serialization.serialize(data)

Serialize the data into bytes using pickle.

Parameters

data – a value

Returns

A bytes object containing the pickle-serialized data.

catalyst.contrib.utils.serialization.deserialize(bytes)

Deserialize bytes into an object using pickle.

Parameters

bytes – a bytes object containing pickle-serialized data.

Returns

Returns a value deserialized from the bytes-like object.

Torch

catalyst.contrib.utils.torch_extra.calculate_tp_fp_fn(confusion_matrix: numpy.ndarray) → numpy.ndarray[source]

Calculates TP, FP, FN statistics from a confusion matrix.

Parameters

confusion_matrix – confusion matrix, np.ndarray

Returns

dictionary with TP, FP, FN statistics
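
How per-class TP, FP and FN follow from a confusion matrix can be sketched in pure Python. The sketch assumes rows index the true class and columns the predicted class; `tp_fp_fn` and the dictionary keys are hypothetical names.

```python
from typing import Dict, List


def tp_fp_fn(confusion_matrix: List[List[int]]) -> Dict[str, List[int]]:
    """Per-class statistics from a square confusion matrix (rows = true class)."""
    n = len(confusion_matrix)
    tp = [confusion_matrix[i][i] for i in range(n)]
    # Column sum minus the diagonal: predicted as class i but actually another class.
    fp = [sum(confusion_matrix[r][i] for r in range(n)) - tp[i] for i in range(n)]
    # Row sum minus the diagonal: actually class i but predicted as another class.
    fn = [sum(confusion_matrix[i]) - tp[i] for i in range(n)]
    return {"true_positives": tp, "false_positives": fp, "false_negatives": fn}


stats = tp_fp_fn([[5, 1, 0], [2, 3, 1], [0, 0, 4]])
```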

catalyst.contrib.utils.torch_extra.calculate_confusion_matrix_from_arrays(predictions: numpy.ndarray, labels: numpy.ndarray, num_classes: int) → numpy.ndarray[source]

Calculates the confusion matrix for a given set of classes. Samples whose label falls outside [0, num_classes) are excluded.

Parameters
  • predictions – model predictions

  • labels – ground truth labels

  • num_classes – number of classes

Returns

confusion matrix

Return type

np.ndarray
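
The counting rule, including the exclusion of out-of-range labels, can be sketched with nested lists. `confusion_matrix_from_lists` is a hypothetical helper; putting true labels on rows and predictions on columns is an assumption, and 255 stands in for an arbitrary ignored label.

```python
from typing import List


def confusion_matrix_from_lists(
    predictions: List[int], labels: List[int], num_classes: int
) -> List[List[int]]:
    """Count (label, prediction) pairs, skipping labels outside [0, num_classes)."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for predicted, label in zip(predictions, labels):
        if 0 <= label < num_classes:
            matrix[label][predicted] += 1
    return matrix


cm = confusion_matrix_from_lists(
    predictions=[0, 1, 1, 2, 0],
    labels=[0, 1, 2, 2, 255],  # the last sample is ignored
    num_classes=3,
)
```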

catalyst.contrib.utils.torch_extra.calculate_confusion_matrix_from_tensors(y_pred_logits: torch.Tensor, y_true: torch.Tensor) → numpy.ndarray[source]

Calculate confusion matrix from tensors.

Parameters
  • y_pred_logits – model logits

  • y_true – true labels

Returns

confusion matrix

Return type

np.ndarray

Visualization

catalyst.contrib.utils.visualization.plot_confusion_matrix(cm, class_names=None, normalize=False, title='confusion matrix', fname=None, show=True, figsize=12, fontsize=32, colormap='Blues')[source]

Renders the confusion matrix and returns the matplotlib figure containing it. Normalization can be applied by setting normalize=True.

catalyst.contrib.utils.visualization.render_figure_to_tensor(figure)[source]

@TODO: Docs. Contribution is welcome.

Wizard

catalyst.contrib.utils.wizard.run_wizard()[source]

Method to initialize and run wizard.

class catalyst.contrib.utils.wizard.Wizard[source]

Bases: object

Class for Catalyst Config API Wizard.

The instance of this class will be created and called from cli command: catalyst-dl init --interactive.

With help of this Wizard user will be able to setup pipeline from available templates and make choices of what predefined classes to use in different parts of pipeline.

__init__()[source]

Initialization of instance of this class will print welcome message and logo of Catalyst in ASCII format. Also here we’ll save all classes of Catalyst own pipeline parts to be able to put user’s modules on top of lists to ease the choice.

run()[source]

Walks user through predefined wizard steps.

catalyst.contrib.utils.wizard.clone_pipeline(template: str, out_dir: pathlib.Path) → None[source]

Clones pipeline from empty pipeline template or from demo pipelines available in Git repos of Catalyst Team.

Parameters
  • template – type of pipeline you want to clone. empty/classification/segmentation

  • out_dir – path where pipeline directory should be cloned

Computer Vision

Image

catalyst.contrib.utils.cv.image.has_image_extension(uri) → bool[source]

Check that file has image extension.

Parameters

uri (Union[str, pathlib.Path]) – the resource to load the file from

Returns

True if file has image extension, False otherwise

Return type

bool
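
A minimal sketch of such a check using pathlib. The set of accepted extensions below is an assumption chosen for illustration, and `has_image_extension_sketch` is a hypothetical name.

```python
from pathlib import Path
from typing import Union

# Assumed set of image extensions; the library's actual list may differ.
IMAGE_EXTENSIONS = {".bmp", ".png", ".jpeg", ".jpg", ".tif", ".tiff"}


def has_image_extension_sketch(uri: Union[str, Path]) -> bool:
    """Return True if the file suffix (case-insensitive) is a known image extension."""
    return Path(uri).suffix.lower() in IMAGE_EXTENSIONS


is_image = has_image_extension_sketch("photos/cat.JPG")
is_text = has_image_extension_sketch("notes.txt")
```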

catalyst.contrib.utils.cv.image.imread(uri, grayscale: bool = False, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]

Reads an image from the specified file.

Parameters
  • uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.imread docs for more info

  • grayscale – if True, make all images grayscale

  • expand_dims – if True, append channel axis to grayscale images

  • rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows to use relative path)

  • **kwargs – extra params for image read

Returns

image

Return type

np.ndarray

catalyst.contrib.utils.cv.image.imwrite(**kwargs)[source]

imwrite(uri, im, format=None, **kwargs)

Write an image to the specified file. Alias for imageio.imwrite.

Parameters

**kwargs – parameters for imageio.imwrite

Returns

image save result

catalyst.contrib.utils.cv.image.imsave(**kwargs)[source]

imwrite(uri, im, format=None, **kwargs)

Write an image to the specified file. Alias for imageio.imsave.

Parameters

**kwargs – parameters for imageio.imsave

Returns

image save result

catalyst.contrib.utils.cv.image.mask_to_overlay_image(image: numpy.ndarray, masks: List[numpy.ndarray], threshold: float = 0, mask_strength: float = 0.5) → numpy.ndarray[source]

Draws every mask with some color over the image.

Parameters
  • image – RGB image used as underlay for masks

  • masks – list of masks

  • threshold – threshold for masks binarization

  • mask_strength – opacity of colorized masks

Returns

HxWx3 image with overlay

Return type

np.ndarray

catalyst.contrib.utils.cv.image.mimread(uri, clip_range: Tuple[int, int] = None, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]

Reads multiple images from the specified file.

Parameters
  • uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.mimread docs for more info

  • clip_range (Tuple[int, int]) – lower and upper interval edges, image values outside the interval are clipped to the interval edges

  • expand_dims – if True, append channel axis to grayscale images

  • rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows to use relative path)

  • **kwargs – extra params for image read

Returns

image

Return type

np.ndarray

catalyst.contrib.utils.cv.image.mimwrite_with_meta(uri, ims, meta, **kwargs)[source]

@TODO: Docs. Contribution is welcome.

Tensor

catalyst.contrib.utils.cv.tensor.tensor_from_rgb_image(image: numpy.ndarray) → torch.Tensor[source]

Creates tensor from RGB image.

Parameters

image – RGB image stored as np.ndarray

Returns

tensor

catalyst.contrib.utils.cv.tensor.tensor_to_ndimage(images: torch.Tensor, denormalize: bool = True, mean: Tuple[float, float, float] = (0.485, 0.456, 0.406), std: Tuple[float, float, float] = (0.229, 0.224, 0.225), move_channels_dim: bool = True, dtype=<class 'numpy.float32'>) → numpy.ndarray[source]

Convert float image(s) with standard normalization to np.ndarray with [0..1] when dtype is np.float32 and [0..255] when dtype is np.uint8.

Parameters
  • images – [B]xCxHxW float tensor

  • denormalize – if True, multiply image(s) by std and add mean

  • mean (Tuple[float, float, float]) – per channel mean to add

  • std (Tuple[float, float, float]) – per channel std to multiply

  • move_channels_dim – if True, convert tensor to [B]xHxWxC format

  • dtype – result ndarray dtype. Only float32 and uint8 are supported

Returns

[B]xHxWxC np.ndarray of dtype
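
The denormalization arithmetic (multiply by std, add mean, then scale for uint8) can be sketched for a single RGB pixel in pure Python. `denormalize_pixel` is a hypothetical helper; clipping to [0, 1] before scaling to [0..255] is an assumption.

```python
from typing import List, Sequence

MEAN = (0.485, 0.456, 0.406)  # default per-channel mean from the signature
STD = (0.229, 0.224, 0.225)   # default per-channel std from the signature


def denormalize_pixel(
    pixel: Sequence[float],
    mean: Sequence[float] = MEAN,
    std: Sequence[float] = STD,
    as_uint8: bool = False,
) -> List[float]:
    """Undo standard normalization for one pixel: value * std + mean."""
    values = [value * s + m for value, m, s in zip(pixel, mean, std)]
    values = [min(max(v, 0.0), 1.0) for v in values]  # assumed clip to [0, 1]
    if as_uint8:
        return [int(round(v * 255)) for v in values]
    return values


original = (1.0, 0.0, 0.2)
normalized = [(v - m) / s for v, m, s in zip(original, MEAN, STD)]
restored = denormalize_pixel(normalized)
restored_uint8 = denormalize_pixel(normalized, as_uint8=True)
```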

Natural language processing

Text

catalyst.contrib.utils.nlp.text.tokenize_text(text: str, tokenizer, max_length: int, strip: bool = True, lowercase: bool = True, remove_punctuation: bool = True) → Dict[str, numpy.array][source]

Tokenizes the given text.

Parameters
  • text – text to tokenize

  • tokenizer – Tokenizer instance from HuggingFace

  • max_length – maximum length of tokens

  • strip – if true strips text before tokenizing

  • lowercase – if true makes text lowercase before tokenizing

  • remove_punctuation – if true removes string.punctuation from text before tokenizing

Returns

batch with tokenized text
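
The three preprocessing flags can be sketched with the standard library alone. This covers only the cleanup before tokenization; the tokenizer call itself is omitted, and `preprocess_text` is a hypothetical helper rather than the library code.

```python
import string


def preprocess_text(
    text: str,
    strip: bool = True,
    lowercase: bool = True,
    remove_punctuation: bool = True,
) -> str:
    """Apply the optional cleanup steps in order: strip, lowercase, drop punctuation."""
    if strip:
        text = text.strip()
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        # Delete every character listed in string.punctuation.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


cleaned = preprocess_text("  Hello, World!  ")
untouched = preprocess_text("Hi!", strip=False, lowercase=False, remove_punctuation=False)
```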

catalyst.contrib.utils.nlp.text.process_bert_output(bert_output, hidden_size: int, output_hidden_states: bool = False, pooling_groups: List[str] = None, mask: torch.Tensor = None, level: Union[int, str] = None)[source]

Processes BERT output.

Parameters
  • bert_output – BERT output

  • hidden_size – hidden size of BERT layers

  • output_hidden_states – boolean flag if we need BERT hidden states

  • pooling_groups – list with pooling to use for sequence embedding

  • mask – boolean flag if we need mask [PAD] tokens

  • level – integer with specified level to use

Returns

processed output