Utilities¶

Main ¶

All utils are gathered in catalyst.utils for easier access.

Note

Everything from catalyst.contrib.utils is included in catalyst.utils

Checkpoint ¶

catalyst.utils.checkpoint.pack_checkpoint(model=None, criterion=None, optimizer=None, scheduler=None, **kwargs)[source]¶: @TODO: Docs. Contribution is welcome.

catalyst.utils.checkpoint.unpack_checkpoint(checkpoint, model=None, criterion=None, optimizer=None, scheduler=None) → None[source]¶

Load checkpoint from file and unpack the content to a model (if not None), criterion (if not None), optimizer (if not None), scheduler (if not None).

Parameters

checkpoint – checkpoint to load
model – model whose state should be updated
criterion – criterion whose state should be updated
optimizer – optimizer whose state should be updated
scheduler – scheduler whose state should be updated

catalyst.utils.checkpoint.save_checkpoint(checkpoint: Dict, logdir: Union[pathlib.Path, str], suffix: str, is_best: bool = False, is_last: bool = False, special_suffix: str = '', saver_fn: Callable = <function save>) → Union[pathlib.Path, str][source]¶

Saving checkpoint to a file.

Parameters

checkpoint – data to save.
logdir – directory where checkpoint should be stored.
suffix – checkpoint file name.
is_best – if True, a best checkpoint file is also generated.
is_last – if True, a last checkpoint file is also generated.
special_suffix – suffix to use for saving best/last checkpoints.
saver_fn – function to use for saving data to file, default is torch.save

Returns

path to saved checkpoint
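
The best/last behaviour described by these parameters can be sketched with the standard library alone. This is a minimal sketch, not Catalyst's implementation: `save_checkpoint_sketch`, `pickle_saver`, the `.pth` extension and the `best`/`last` file names are assumptions made for illustration, and `pickle` stands in for `torch.save`.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def pickle_saver(obj, path):
    # Stand-in for torch.save: writes any picklable object to `path`.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def save_checkpoint_sketch(checkpoint, logdir, suffix, is_best=False,
                           is_last=False, special_suffix="",
                           saver_fn=pickle_saver):
    logdir = Path(logdir)
    logdir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: "<suffix>.pth" for the main file.
    path = logdir / f"{suffix}.pth"
    saver_fn(checkpoint, path)
    # Copy the saved file under the assumed "best"/"last" names.
    if is_best:
        shutil.copyfile(path, logdir / f"best{special_suffix}.pth")
    if is_last:
        shutil.copyfile(path, logdir / f"last{special_suffix}.pth")
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_checkpoint_sketch({"epoch": 3}, tmp, "stage1.3", is_best=True)
    saved_names = sorted(p.name for p in Path(tmp).iterdir())
```

Passing a different `saver_fn` swaps the serialization format without touching the best/last bookkeeping.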

catalyst.utils.checkpoint.load_checkpoint(filepath: str)[source]¶

Load checkpoint from path.

Parameters: filepath – checkpoint file to load
Returns: checkpoint content

Components ¶

catalyst.utils.components.process_components(model: torch.nn.modules.module.Module, criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, distributed_params: Dict = None, device: Union[str, torch.device] = None) → Tuple[torch.nn.modules.module.Module, torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler, Union[str, torch.device]][source]¶

Returns the processed model, criterion, optimizer, scheduler and device.

Parameters

model – torch model
criterion – criterion function
optimizer – optimizer
scheduler – scheduler
distributed_params (dict, optional) – dict with the parameters for distributed and FP16 method
device (Device, optional) – device

Returns

tuple with processed model, criterion, optimizer, scheduler and device.

Raises

ValueError – if device is None and a TPU is available; to use a TPU, manually move the model/optimizer/scheduler to the TPU device and pass device to the function.
NotImplementedError – if model is not an nn.Module or a dict for multi-GPU; nn.ModuleDict for DataParallel is not implemented yet

Config ¶

catalyst.utils.config.load_config(path: Union[str, pathlib.Path], ordered: bool = False, data_format: str = None, encoding: str = 'utf-8') → Union[Dict, List][source]¶

Loads config by giving path. Supports YAML and JSON files.

Examples

>>> load_config(path="./config.yml", ordered=True)

Parameters

path – path to config file (YAML or JSON)
ordered – if true the config will be loaded as OrderedDict
data_format – yaml, yml or json.
encoding – encoding to read the config

Returns

config

Return type

Union[Dict, List]

Raises

Exception – if path doesn’t exist or file format is not YAML or JSON

Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L63 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L10

catalyst.utils.config.save_config(config: Union[Dict, List], path: Union[str, pathlib.Path], data_format: str = None, encoding: str = 'utf-8', ensure_ascii: bool = False, indent: int = 2) → None[source]¶

Saves config to file. Path must be either YAML or JSON.

Parameters

config (Union[Dict, List]) – config to save
path (Union[str, Path]) – path to save
data_format – yaml, yml or json.
encoding – Encoding to write file. Default is utf-8
ensure_ascii – Used for JSON; if True, non-ASCII characters are escaped in JSON strings.
indent – Used for JSON

Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L110 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L38
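
For the JSON case, the roles of `ordered`, `encoding`, `ensure_ascii` and `indent` can be illustrated with the standard `json` module. This is a hedged sketch, not the library code: `save_json_sketch` and `load_json_sketch` are hypothetical helpers, and YAML handling and `data_format` detection are left out.

```python
import json
import tempfile
from collections import OrderedDict
from pathlib import Path


def save_json_sketch(config, path, encoding="utf-8", ensure_ascii=False, indent=2):
    with open(path, "w", encoding=encoding) as f:
        # ensure_ascii=False keeps non-ASCII characters as they are.
        json.dump(config, f, ensure_ascii=ensure_ascii, indent=indent)


def load_json_sketch(path, ordered=False, encoding="utf-8"):
    with open(path, "r", encoding=encoding) as f:
        # object_pairs_hook turns every JSON object into an OrderedDict.
        hook = OrderedDict if ordered else None
        return json.load(f, object_pairs_hook=hook)


config = {"model": {"name": "r\u00e9snet"}, "stages": ["train", "infer"]}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    save_json_sketch(config, path)
    raw_text = path.read_text(encoding="utf-8")
    loaded = load_json_sketch(path, ordered=True)
```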

Distributed ¶

catalyst.utils.distributed.check_ddp_wrapped(model: torch.nn.modules.module.Module) → bool[source]¶: Checks whether model is wrapped with DataParallel/DistributedDataParallel.

catalyst.utils.distributed.check_apex_available() → bool[source]¶: Checks if apex is available.

catalyst.utils.distributed.check_amp_available() → bool[source]¶: Checks if torch.amp is available.

catalyst.utils.distributed.check_torch_distributed_initialized() → bool[source]¶: Checks if torch.distributed is available and initialized.

catalyst.utils.distributed.check_slurm_available()[source]¶: Checks if slurm is available.

catalyst.utils.distributed.assert_fp16_available() → None[source]¶: Asserts for installed and available Apex FP16.

catalyst.utils.distributed.initialize_apex(model, optimizer=None, **distributed_params)[source]¶: @TODO: Docs. Contribution is welcome.

catalyst.utils.distributed.get_nn_from_ddp_module(model: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]¶

Return a real model from a torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel, or apex.parallel.DistributedDataParallel.

Parameters: model – A model, or DataParallel wrapper.
Returns: A model

catalyst.utils.distributed.get_rank() → int[source]¶

Returns the rank of the current worker.

Returns: rank if torch.distributed is initialized, otherwise -1
Return type: int

catalyst.utils.distributed.get_distributed_mean(value: Union[float, torch.Tensor])[source]¶: Computes distributed mean among all nodes.

catalyst.utils.distributed.get_distributed_env(local_rank: int, rank: int, world_size: int, use_cuda_visible_devices: bool = True)[source]¶

Returns environment copy with extra distributed settings.

Parameters

local_rank – worker local rank
rank – worker global rank
world_size – worker world size
use_cuda_visible_devices – boolean flag to use available GPU devices

Returns

updated environment copy
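
The idea of an "environment copy with extra distributed settings" can be sketched as follows. The variable names `LOCAL_RANK`, `RANK` and `WORLD_SIZE` are assumptions chosen for illustration, `distributed_env_sketch` is a hypothetical helper, and the `use_cuda_visible_devices` handling is omitted.

```python
import os


def distributed_env_sketch(local_rank, rank, world_size):
    # Copy first so the current process environment is left untouched.
    env = os.environ.copy()
    # Environment values must be strings.
    env["LOCAL_RANK"] = str(local_rank)
    env["RANK"] = str(rank)
    env["WORLD_SIZE"] = str(world_size)
    return env


worker_env = distributed_env_sketch(local_rank=1, rank=5, world_size=8)
```

A copy like this is typically handed to a child process (for example through the `env` argument of `subprocess.Popen`) so that each worker sees its own rank.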

catalyst.utils.distributed.get_distributed_params()[source]¶

Returns distributed params for experiment run.

Returns: dictionary with distributed params

catalyst.utils.distributed.get_slurm_params()[source]¶

Return slurm params for experiment run.

Returns

tuple with current node index, number of nodes, master node and master port

Hash ¶

catalyst.utils.hash.get_hash(obj: Any) → str[source]¶

Creates a unique hash from an object in the following way:

- Represent obj as a string recursively
- Hash this string with the sha256 hash function
- Encode the hash with url-safe base64 encoding

Parameters: obj – object to hash
Returns: base64-encoded string
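
The three steps (string representation, sha256, url-safe base64) can be sketched with `hashlib` and `base64`. How the object is turned into a string is not specified here, so `to_str_sketch` below, including its sorting of dict keys, is an assumption and may differ from the library's own representation.

```python
import base64
import hashlib


def to_str_sketch(obj):
    # Recursively build a string. Dict keys are sorted so that insertion
    # order does not change the result (an assumption of this sketch).
    if isinstance(obj, dict):
        pairs = sorted(obj.items(), key=lambda kv: str(kv[0]))
        items = (f"{to_str_sketch(k)}:{to_str_sketch(v)}" for k, v in pairs)
        return "{" + ",".join(items) + "}"
    if isinstance(obj, (list, tuple)):
        return "[" + ",".join(to_str_sketch(x) for x in obj) + "]"
    return str(obj)


def get_hash_sketch(obj):
    digest = hashlib.sha256(to_str_sketch(obj).encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii")


h1 = get_hash_sketch({"lr": 0.001, "layers": [64, 32]})
h2 = get_hash_sketch({"layers": [64, 32], "lr": 0.001})
h3 = get_hash_sketch({"lr": 0.01, "layers": [64, 32]})
```

A sha256 digest is 32 bytes, so the base64 string is always 44 characters long and safe to use in file names and URLs.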

catalyst.utils.hash.get_short_hash(o) → str[source]¶: @TODO: Docs. Contribution is welcome.

Initialization ¶

catalyst.utils.initialization.get_optimal_inner_init(nonlinearity: torch.nn.modules.module.Module, **kwargs) → Callable[[torch.nn.modules.module.Module], None][source]¶

Create initializer for inner layers based on their activation function (nonlinearity).

Parameters

nonlinearity – non-linear activation
**kwargs – extra kwargs

Returns

optimal initialization function

Raises

NotImplementedError – if nonlinearity is not one of sigmoid, tanh, relu, leaky_relu

catalyst.utils.initialization.outer_init(layer: torch.nn.modules.module.Module) → None[source]¶

Initialization for output layers of policy and value networks typically used in deep reinforcement learning literature.

Parameters: layer – torch nn.Module instance

catalyst.utils.initialization.reset_weights_if_possible(module: torch.nn.modules.module.Module)[source]¶

Resets module parameters if possible.

Parameters: module – Module to reset.

Loaders ¶

catalyst.utils.loaders.get_native_batch_from_loader(loader: torch.utils.data.dataloader.DataLoader, batch_index: int = 0)[source]¶

Returns a batch from experiment loader

Parameters

loader – Loader to get batch from
batch_index – Index of batch to take from dataset of the loader

Returns

batch from loader

catalyst.utils.loaders.get_native_batch_from_loaders(loaders: Dict[str, torch.utils.data.dataloader.DataLoader], loader: Union[str, int] = 0, batch_index: int = 0)[source]¶

Returns a batch from experiment loaders by its index or name.

Parameters

loaders (Dict[str, DataLoader]) – Loaders list to get loader from
loader (Union[str, int]) – Loader name or its index, default is zero
batch_index – Index of batch to take from dataset of the loader

Returns

batch from loader

Raises

TypeError – if loader parameter is not a string or an integer

catalyst.utils.loaders.get_loader(data_source: Iterable[dict], open_fn: Callable, dict_transform: Callable = None, sampler=None, collate_fn: Callable = <function default_collate>, batch_size: int = 32, num_workers: int = 4, shuffle: bool = False, drop_last: bool = False)[source]¶

Creates a DataLoader from given source and its open/transform params.

Parameters

data_source – an iterable containing your data annotations (for example paths to images, labels, bboxes, etc.)
open_fn – function, that can open your annotations dict and transfer it to data, needed by your network (for example open image by path, or tokenize read string)
dict_transform – transforms to use on dict (for example normalize image, add blur, crop/resize/etc)
sampler (Sampler, optional) – defines the strategy to draw samples from the dataset
collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset
batch_size (int, optional) – how many samples per batch to load
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process
shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)

Returns

DataLoader with catalyst.data.ListDataset

catalyst.utils.loaders.validate_loaders(loaders: Dict[str, torch.utils.data.dataloader.DataLoader]) → Dict[str, torch.utils.data.dataloader.DataLoader][source]¶

Checks PyTorch dataloaders for distributed setup. Transfers them to distributed mode if necessary. (Experimental feature)

Parameters

loaders (Dict[str, DataLoader]) – dictionary with PyTorch dataloaders

Returns

dictionary with PyTorch dataloaders (with distributed samplers if necessary)

Return type

Dict[str, DataLoader]

catalyst.utils.loaders.get_loaders_from_params(batch_size: int = 1, num_workers: int = 0, drop_last: bool = False, per_gpu_scaling: bool = False, loaders_params: Dict[str, Any] = None, samplers_params: Dict[str, Any] = None, initial_seed: int = 42, get_datasets_fn: Callable = None, **data_params) → OrderedDict[str, DataLoader][source]¶

Creates pytorch dataloaders from datasets and additional parameters.

Parameters

batch_size – batch_size parameter from torch.utils.data.DataLoader
num_workers – num_workers parameter from torch.utils.data.DataLoader
drop_last – drop_last parameter from torch.utils.data.DataLoader
per_gpu_scaling – boolean flag, if True, uses batch_size=batch_size*num_available_gpus
loaders_params (Dict[str, Any]) – additional loaders parameters
samplers_params (Dict[str, Any]) – additional sampler parameters
initial_seed – initial seed for torch.utils.data.DataLoader workers
get_datasets_fn (Callable) – callable function to get dictionary with torch.utils.data.Datasets
**data_params – additional data parameters or dictionary with torch.utils.data.Datasets to use for pytorch dataloaders creation

Returns

dictionary with torch.utils.data.DataLoader instances

Return type

OrderedDict[str, DataLoader]

Raises

NotImplementedError – if a data source is neither a Dataset nor a dict
ValueError – if batch_sampler is used in distributed mode (the two are mutually exclusive)

Misc ¶

catalyst.utils.misc.copy_directory(input_dir: pathlib.Path, output_dir: pathlib.Path) → None[source]¶

Recursively copies the input directory.

Parameters

input_dir – input directory
output_dir – output directory

catalyst.utils.misc.format_metric(name: str, value: float) → str[source]¶

Format metric.

Metric will be returned in the scientific format if 4 decimal chars are not enough (metric value lower than 1e-4).

Parameters

name – metric name
value – value of metric

Returns

formatted metric

Return type

str
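
The switch between fixed-point and scientific notation can be sketched like this. The `name=value` layout and the three-digit mantissa are assumptions of this sketch; only the 1e-4 threshold and the four decimals come from the description above.

```python
def format_metric_sketch(name, value):
    # Below 1e-4 four decimals would print as 0.0000, so switch notation.
    if value < 1e-4:
        return f"{name}={value:.3e}"
    return f"{name}={value:.4f}"


regular = format_metric_sketch("loss", 0.5)
tiny = format_metric_sketch("lr", 1e-5)
```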

catalyst.utils.misc.get_fn_default_params(fn: Callable[[...], Any], exclude: List[str] = None)[source]¶

Return default parameters of Callable.

Parameters

fn (Callable[..., Any]) – target Callable
exclude – exclude list of parameters

Returns

contains default parameters of fn

Return type

dict

catalyst.utils.misc.get_fn_argsnames(fn: Callable[[...], Any], exclude: List[str] = None)[source]¶

Return parameter names of Callable.

Parameters

fn (Callable[..., Any]) – target Callable
exclude – exclude list of parameters

Returns

contains parameter names of fn

Return type

list
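
Both lookups (default values and parameter names) can be expressed with `inspect.signature`. The helpers `default_params_sketch` and `argnames_sketch` and the `train` function are hypothetical and only illustrate the idea; the library functions may differ in detail.

```python
import inspect


def default_params_sketch(fn, exclude=None):
    exclude = set(exclude or [])
    params = inspect.signature(fn).parameters
    # Keep only parameters that declare a default value.
    return {
        name: p.default
        for name, p in params.items()
        if p.default is not inspect.Parameter.empty and name not in exclude
    }


def argnames_sketch(fn, exclude=None):
    exclude = set(exclude or [])
    return [n for n in inspect.signature(fn).parameters if n not in exclude]


def train(model, lr=0.01, epochs=10, verbose=False):
    pass


defaults = default_params_sketch(train, exclude=["verbose"])
names = argnames_sketch(train, exclude=["model"])
```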

catalyst.utils.misc.get_utcnow_time(format: str = None) → str[source]¶

Return string with current utc time in chosen format.

Parameters: format – format string; if None, “%y%m%d.%H%M%S” is used.
Returns: formatted utc time string
Return type: str
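
A minimal sketch of the same behaviour with the standard `datetime` module, assuming a hypothetical helper name `utcnow_time_sketch`:

```python
from datetime import datetime, timezone


def utcnow_time_sketch(format=None):
    # `format` mirrors the documented parameter name (it shadows the builtin).
    if format is None:
        format = "%y%m%d.%H%M%S"
    return datetime.now(timezone.utc).strftime(format)


stamp = utcnow_time_sketch()
year = utcnow_time_sketch("%Y")
```

With the default format the result has the shape `YYMMDD.HHMMSS`, which sorts chronologically and is convenient for log directory names.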

catalyst.utils.misc.is_exception(ex: Any) → bool[source]¶: Check if the argument is of Exception type.

catalyst.utils.misc.maybe_recursive_call(object_or_dict, method: Union[str, Callable], recursive_args=None, recursive_kwargs=None, **kwargs)[source]¶

Calls the method recursively for the object_or_dict.

Parameters

object_or_dict – some object or a dictionary of objects
method – method name to call
recursive_args – list of arguments to pass to the method
recursive_kwargs – keyword arguments to pass to the method
**kwargs – Arbitrary keyword arguments

Returns

result of method call

catalyst.utils.misc.fn_ends_with_pass(fn: Callable[[...], Any])[source]¶

Checks whether a function ends with a pass statement (and so probably does nothing). Mainly used to filter callbacks with empty on_{event} methods.

Parameters: fn (Callable[..., Any]) – target Callable
Returns: True if there is a pass in the first indentation level of fn and nothing happens before it, False otherwise.
Return type: bool

Numpy ¶

catalyst.utils.numpy.get_one_hot(label: int, num_classes: int, smoothing: float = None) → numpy.ndarray[source]¶

Applies one-hot vectorization to a given scalar, optionally with label smoothing as described in Bag of Tricks for Image Classification with Convolutional Neural Networks.

Parameters

label – scalar value to be vectorized
num_classes – total number of classes
smoothing (float, optional) – if specified applies label smoothing from Bag of Tricks for Image Classification with Convolutional Neural Networks paper

Returns

a one-hot vector with shape (num_classes,)

Return type

np.ndarray
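
One-hot encoding with optional label smoothing can be sketched in plain Python. Two things here are assumptions: the smoothing scheme (the true class keeps `1 - smoothing`, the rest is spread evenly over the other classes) and the use of a list instead of an `np.ndarray`.

```python
def one_hot_sketch(label, num_classes, smoothing=None):
    if smoothing is None:
        return [1.0 if i == label else 0.0 for i in range(num_classes)]
    # Assumed scheme: the remaining probability mass is shared equally
    # by the num_classes - 1 other classes.
    off_value = smoothing / (num_classes - 1)
    return [
        1.0 - smoothing if i == label else off_value
        for i in range(num_classes)
    ]


hard = one_hot_sketch(1, 3)
soft = one_hot_sketch(1, 5, smoothing=0.1)
```

Under this scheme the smoothed vector still sums to one, so it remains a valid target distribution.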

Parser ¶

catalyst.utils.parser.parse_config_args(*, config, args, unknown_args)[source]¶

Parse config and cli args.

Parameters

config – dict-based experiment config
args – cli args
unknown_args – cli unknown args

Returns

final experiment config and cli args

Return type

config, args

catalyst.utils.parser.parse_args_uargs(args, unknown_args)[source]¶

Function for parsing configuration files.

Parameters

args – recognized arguments
unknown_args – unrecognized arguments

Returns

updated arguments, dict with config

Return type

tuple

Pipelines ¶

catalyst.utils.pipelines.clone_pipeline(template: str, out_dir: pathlib.Path) → None[source]¶

Clones pipeline from empty pipeline template or from demo pipelines available in Git repos of Catalyst Team.

Parameters

template – type of pipeline you want to clone. empty/classification/segmentation
out_dir – path where pipeline directory should be cloned

Pruning ¶

catalyst.utils.pruning.prune_model(model: torch.nn.modules.module.Module, pruning_fn: Callable, keys_to_prune: List[str], amount: Union[float, int], layers_to_prune: Optional[List[str]] = None, reinitialize_after_pruning: Optional[bool] = False) → None[source]¶

Prune model function can be used for pruning certain tensors in model layers.

Raises

AttributeError – If layers_to_prune is not None, but there are no layers with the specified names.
Exception – If no layers have specified keys.

Parameters

model – Model to be pruned.
pruning_fn – Pruning function with API same as in torch.nn.utils.pruning. pruning_fn(module, name, amount).
keys_to_prune – list of strings. Determines which tensor in modules will be pruned.
amount – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
layers_to_prune – list of strings: names of the modules to be pruned. If None, tries to prune every module in the model.
reinitialize_after_pruning – if True, reinitializes the model after pruning (e.g. to check the Lottery Ticket Hypothesis).

catalyst.utils.pruning.remove_reparametrization(model: torch.nn.modules.module.Module, keys_to_prune: List[str], layers_to_prune: Optional[List[str]] = None) → None[source]¶

Removes pre-hooks and pruning masks from the model.

Parameters

model – model to remove reparametrization from.
keys_to_prune – list of strings. Determines which tensors in modules have already been pruned.
layers_to_prune – list of strings: names of the modules that have already been pruned. If None, every module in the model is considered.

Quantization ¶

catalyst.utils.quantization.quantize_model_from_checkpoint(logdir: pathlib.Path, checkpoint_name: str, stage: str = None, qconfig_spec: Union[Set, Dict, None] = None, dtype: Optional[torch.dtype] = torch.qint8, backend: str = None) → torch.nn.modules.module.Module[source]¶

Quantize model using created experiment and runner.

Parameters

logdir (Union[str, Path]) – Path to Catalyst logdir with model
checkpoint_name – Name of model checkpoint to use
stage – experiment’s stage name
qconfig_spec – torch.quantization.quantize_dynamic parameter; lets you define which layers are quantized
dtype – type of the model parameters, default int8
backend – defines backend for quantization

Returns

Quantized model

catalyst.utils.quantization.save_quantized_model(model: torch.nn.modules.module.Module, logdir: Union[str, pathlib.Path] = None, checkpoint_name: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None) → None[source]¶

Saves quantized model.

Parameters

model – Quantized model
logdir (Union[str, Path]) – Path to experiment
checkpoint_name – name for the checkpoint
out_dir (Union[str, Path]) – Directory to save model to (overrides logdir)
out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir)

Raises

ValueError – if none of logdir, out_dir or out_model is specified.

Scripts ¶

catalyst.utils.scripts.import_module(expdir: Union[str, pathlib.Path])[source]¶

Imports python module by path.

Parameters: expdir – path to python module.
Returns: Imported module.

catalyst.utils.scripts.dump_code(expdir: Union[str, pathlib.Path], logdir: Union[str, pathlib.Path]) → None[source]¶

Dumps Catalyst code for reproducibility.

Parameters

expdir (Union[str, pathlib.Path]) – experiment dir path
logdir (Union[str, pathlib.Path]) – logging dir path

catalyst.utils.scripts.dump_python_files(src: pathlib.Path, dst: pathlib.Path) → None[source]¶

Dumps python code (*.py and *.ipynb) files.

Parameters

src – source code path
dst – destination code path

catalyst.utils.scripts.prepare_config_api_components(expdir: pathlib.Path, config: Dict)[source]¶

Imports and creates the core Config API components (Experiment, Runner and Config) from expdir, the experiment directory, and config, the experiment config.

Parameters

expdir – experiment directory path
config – dictionary with experiment Config

Returns

Experiment, Runner, Config for Config API usage.

catalyst.utils.scripts.dump_experiment_code(src: pathlib.Path, dst: pathlib.Path) → None[source]¶

Dumps your experiment code for Config API use cases.

Parameters

src – source code path
dst – destination code path

catalyst.utils.scripts.distributed_cmd_run(worker_fn: Callable, distributed: bool = True, *args, **kwargs) → None[source]¶

Distributed run

Parameters

worker_fn – worker fn to run in distributed mode
distributed – distributed flag
args – additional parameters for worker_fn
kwargs – additional key-value parameters for worker_fn

Seed ¶

catalyst.utils.seed.set_global_seed(seed: int) → None[source]¶

Sets random seed into PyTorch, TensorFlow, Numpy and Random.

Parameters: seed – random seed
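
Seeding several libraries at once can be sketched as follows. This is not the library implementation: `set_global_seed_sketch` is a hypothetical helper, NumPy and PyTorch are seeded only when they are installed, and TensorFlow is omitted for brevity.

```python
import random


def set_global_seed_sketch(seed):
    random.seed(seed)
    # Optional dependencies are seeded only if they can be imported.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


set_global_seed_sketch(42)
first = [random.random() for _ in range(3)]
set_global_seed_sketch(42)
second = [random.random() for _ in range(3)]
```

Reseeding with the same value reproduces the same random sequence, which is the property experiments rely on.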

Sys ¶

catalyst.utils.sys.get_environment_vars() → Dict[str, Any][source]¶

Creates a dictionary with environment variables.

Returns: environment variables
Return type: Dict

catalyst.utils.sys.list_conda_packages() → str[source]¶

Lists conda installed packages.

Returns: list with conda installed packages
Return type: str

catalyst.utils.sys.list_pip_packages() → str[source]¶

Lists pip installed packages.

Returns: string with pip installed packages
Return type: str

catalyst.utils.sys.dump_environment(experiment_config: Dict, logdir: str, configs_path: List[str] = None) → None[source]¶

Saves config, environment variables and package list in JSON into logdir.

Parameters

experiment_config – experiment config
logdir – path to logdir
configs_path – path(s) to config

Torch ¶

catalyst.utils.torch.get_optimizable_params(model_or_params)[source]¶: Returns all the parameters that require gradients.

catalyst.utils.torch.get_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer) → float[source]¶

Get momentum of current optimizer.

Parameters: optimizer – PyTorch optimizer
Returns: momentum at first param group
Return type: float

catalyst.utils.torch.set_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer, value: float, index: int = 0)[source]¶

Set momentum of the index-th param group of optimizer to value.

Parameters

optimizer – PyTorch optimizer
value – new value of momentum
index (int, optional) – integer index of optimizer’s param groups, default is 0

catalyst.utils.torch.get_device() → torch.device[source]¶: Returns the best available device (TPU > GPU > CPU).

catalyst.utils.torch.get_available_gpus()[source]¶

Array of available GPU ids.

Examples

>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
>>> get_available_gpus()
[0, 2]

>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,-1,1"
>>> get_available_gpus()
[0]

>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
>>> get_available_gpus()
[]

>>> os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
>>> get_available_gpus()
[]

Returns: available GPU ids
Return type: iterable
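
The parsing rule implied by the examples (ids are read left to right, and a negative id ends the list) can be sketched without PyTorch. `visible_gpus_sketch` is a hypothetical helper that takes the variable's value as a string; what the real function does when `CUDA_VISIBLE_DEVICES` is not set at all is not covered here.

```python
def visible_gpus_sketch(cuda_visible_devices):
    ids = []
    for item in cuda_visible_devices.split(","):
        item = item.strip()
        if not item:
            continue
        gpu_id = int(item)
        # A negative id ends the list: later ids are ignored.
        if gpu_id < 0:
            break
        ids.append(gpu_id)
    return ids


parsed = [visible_gpus_sketch(v) for v in ("0,2", "0,-1,1", "", "-1")]
```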

catalyst.utils.torch.get_activation_fn(activation: str = None)[source]¶: Returns the activation function from torch.nn by its name.

catalyst.utils.torch.any2device(value, device: Union[str, torch.device])[source]¶

Move tensor, list of tensors, list of list of tensors, dict of tensors, tuple of tensors to target device.

Parameters

value – Object to be moved
device – target device ids

Returns

Same structure as value, but all tensors and np.arrays moved to device

catalyst.utils.torch.prepare_cudnn(deterministic: bool = None, benchmark: bool = None) → None[source]¶

Prepares CuDNN benchmark and sets CuDNN to deterministic or non-deterministic mode.

Parameters

deterministic – deterministic mode if running in CuDNN backend.
benchmark – If True use CuDNN heuristics to figure out which algorithm will be most performant for your model architecture and input. Setting it to False may slow down your training.

catalyst.utils.torch.process_model_params(model: torch.nn.modules.module.Module, layerwise_params: Dict[str, dict] = None, no_bias_weight_decay: bool = True, lr_scaling: float = 1.0) → List[Union[torch.nn.parameter.Parameter, dict]][source]¶

Collects model parameters for torch.optim.Optimizer.

Parameters

model – Model to process
layerwise_params – Order-sensitive dict where each key is regex pattern and values are layer-wise options for layers matching with a pattern
no_bias_weight_decay – If true, removes weight_decay for all bias parameters in the model
lr_scaling – layer-wise learning rate scaling, if 1.0, learning rates will not be scaled

Returns

parameters for an optimizer

Return type

iterable

Example:

>>> model = catalyst.contrib.models.segmentation.ResnetUnet()
>>> layerwise_params = collections.OrderedDict([
...     ("conv1.*", dict(lr=0.001, weight_decay=0.0003)),
...     ("conv.*", dict(lr=0.002))
... ])
>>> params = process_model_params(model, layerwise_params)
>>> optimizer = torch.optim.Adam(params, lr=0.0003)

catalyst.utils.torch.get_requires_grad(model: torch.nn.modules.module.Module)[source]¶

Gets the requires_grad value for all model parameters.

Example:

>>> model = SimpleModel()
>>> requires_grad = get_requires_grad(model)

Parameters: model – model
Returns: requires_grad value for each parameter
Return type: Dict[str, bool]

catalyst.utils.torch.set_requires_grad(model: torch.nn.modules.module.Module, requires_grad: Union[bool, Dict[str, bool]])[source]¶

Sets the requires_grad value for all model parameters.

Example:

>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad=True)
>>> # or
>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad={""})

Parameters

model – model
requires_grad (Union[bool, Dict[str, bool]]) – value

catalyst.utils.torch.get_network_output(net: torch.nn.modules.module.Module, *input_shapes_args, **input_shapes_kwargs)[source]¶

For each input shape, returns an output tensor.

Examples

>>> net = nn.Linear(10, 5)
>>> utils.get_network_output(net, (1, 10))
tensor([[[-0.2665,  0.5792,  0.9757, -0.5782,  0.1530]]])

Parameters

net – the model
*input_shapes_args – variable length argument list of shapes
**input_shapes_kwargs – key-value arguments of shapes

Returns

tensor with network output

catalyst.utils.torch.detach(tensor: torch.Tensor) → numpy.ndarray[source]¶

Detach a pytorch tensor from graph and convert it to numpy array

Parameters: tensor – PyTorch tensor
Returns: numpy ndarray

catalyst.utils.torch.trim_tensors(tensors)[source]¶

Trim padding off of a batch of tensors to the smallest possible length. Should be used with catalyst.data.DynamicLenBatchSampler.

Adapted from Dynamic minibatch trimming to improve BERT training speed.

Parameters: tensors – list of tensors to trim.
Returns: list of trimmed tensors.
Return type: List[torch.tensor]

catalyst.utils.torch.normalize(samples: torch.Tensor) → torch.Tensor[source]¶

Parameters: samples – tensor with shape of [n_samples, features_dim]
Returns: normalized tensor with the same shape

Tracing ¶

catalyst.utils.tracing.trace_model(model: torch.nn.modules.module.Module, predict_fn: Callable, batch=None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu', predict_params: dict = None) → torch.jit.ScriptModule[source]¶

Traces model using runner and batch.

Parameters

model – Model to trace
predict_fn – Function to run prediction with the model provided, takes model, inputs parameters
batch – Batch to trace the model
method_name – Model’s method name that will be used as entrypoint during tracing
mode – Mode for model to trace (train or eval)
requires_grad – Flag to use grads
opt_level – Apex FP16 init level, optional
device – Torch device
predict_params – additional parameters for model forward

Returns

Traced model

Return type

jit.ScriptModule

Raises

ValueError – if batch and predict_fn are not both specified, or if mode is not ‘eval’ or ‘train’.

catalyst.utils.tracing.trace_model_from_checkpoint(logdir: pathlib.Path, method_name: str, checkpoint_name: str, stage: str = None, loader: Union[str, int] = None, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu')[source]¶

Traces model using created experiment and runner.

Parameters

logdir (Union[str, Path]) – Path to Catalyst logdir with model
checkpoint_name – Name of model checkpoint to use
stage – experiment’s stage name
loader (Union[str, int]) – experiment’s loader name or its index
method_name – Model’s method name that will be used as entrypoint during tracing
mode – Mode for model to trace (train or eval)
requires_grad – Flag to use grads
opt_level – AMP FP16 init level
device – Torch device

Returns

the traced model

catalyst.utils.tracing.trace_model_from_runner(runner: IRunner, checkpoint_name: str = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu') → torch.jit.ScriptModule[source]¶

Traces model using created experiment and runner.

Parameters

runner – current runner.
checkpoint_name – Name of model checkpoint to use, if None traces current model from runner
method_name – Model’s method name that will be used as entrypoint during tracing
mode – Mode for model to trace (train or eval)
requires_grad – Flag to use grads
opt_level – AMP FP16 init level
device – Torch device

Returns

Traced model

Return type

ScriptModule

catalyst.utils.tracing.get_trace_name(method_name: str, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, additional_string: str = None) → str[source]¶

Creates a file name for the traced model.

Parameters

method_name – model’s method name
mode – train or eval
requires_grad – flag if model was traced with gradients
opt_level – opt_level if model was traced in FP16
additional_string – any additional information

Returns

Filename for traced model to be saved.

Return type

str

catalyst.utils.tracing.save_traced_model(model: torch.jit.ScriptModule, logdir: Union[str, pathlib.Path] = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None, checkpoint_name: str = None) → None[source]¶

Saves traced model.

Parameters

model – Traced model
logdir (Union[str, Path]) – Path to experiment
method_name – Name of the method that was traced
mode – Model’s mode - train or eval
requires_grad – Whether the model was traced with requires_grad or not
opt_level – Apex FP16 init level used during tracing
out_dir (Union[str, Path]) – Directory to save model to (overrides logdir)
out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir)
checkpoint_name – Checkpoint name used to restore the model

Raises

ValueError – if none of logdir, out_dir or out_model is specified.

catalyst.utils.tracing.load_traced_model(model_path: Union[str, pathlib.Path], device: Union[str, torch.device] = 'cpu', opt_level: str = None) → torch.jit.ScriptModule[source]¶

Loads a traced model.

Parameters

model_path – Path to traced model
device – Torch device
opt_level – Apex FP16 init level, optional

Returns

Traced model

Return type

ScriptModule

Wizard ¶

catalyst.utils.wizard.run_wizard()[source]¶: Method to initialize and run wizard.

class catalyst.utils.wizard.Wizard[source]¶

Bases: object

Class for Catalyst Config API Wizard.

An instance of this class is created and called from the CLI command: catalyst-dl init --interactive.

With the help of this Wizard, the user can set up a pipeline from the available templates and choose which predefined classes to use in different parts of the pipeline.

__init__()[source]¶: Initializing an instance of this class prints a welcome message and the Catalyst logo in ASCII format. It also stores all classes of Catalyst’s own pipeline parts, so that the user’s modules can be put at the top of the lists to ease the choice.

run()[source]¶: Walks user through predefined wizard steps.

Contrib ¶

Argparse ¶

catalyst.contrib.utils.argparse.boolean_flag(parser: argparse.ArgumentParser, name: str, default: Optional[bool] = False, help: str = None, shorthand: str = None) → None[source]¶

Add a boolean flag to a parser inplace.

Examples

>>> import argparse
>>> from catalyst.contrib.utils.argparse import boolean_flag
>>> parser = argparse.ArgumentParser()
>>> boolean_flag(
...     parser, "flag", default=False, help="some flag", shorthand="f"
... )

Parameters

parser – parser to add the flag to
name – argument name; --<name> will enable the flag, while --no-<name> will disable it
default (bool, optional) – default value of the flag
help – help string for the flag
shorthand – shorthand string for the argument
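
The paired --<name> / --no-<name> behaviour described above can be sketched with plain argparse. The helper name add_boolean_flag is hypothetical, and this is an illustration under that assumption rather than the library's actual implementation.

```python
import argparse


def add_boolean_flag(parser, name, default=False, help=None, shorthand=None):
    """Hypothetical helper: register --<name> and --no-<name> on the parser."""
    dest = name.replace("-", "_")
    names = ["--" + name]
    if shorthand is not None:
        names.append("-" + shorthand)
    # --<name> (or -<shorthand>) switches the flag on ...
    parser.add_argument(
        *names, action="store_true", default=default, dest=dest, help=help
    )
    # ... and --no-<name> switches it off again.
    parser.add_argument("--no-" + name, action="store_false", dest=dest)


parser = argparse.ArgumentParser()
add_boolean_flag(parser, "flag", default=False, help="some flag", shorthand="f")

enabled = parser.parse_args(["--flag"]).flag
disabled = parser.parse_args(["--no-flag"]).flag
unset = parser.parse_args([]).flag
```

Both options write to the same destination, so the value passed as default is what the namespace holds when neither option is given.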

Compression ¶

catalyst.contrib.utils.compression.pack(data)¶

Serialize the data into bytes using pickle.

Parameters: data – a value
Returns: a bytes object containing the data serialized with pickle.

catalyst.contrib.utils.compression.pack_if_needed(data)¶

Serialize the data into bytes using pickle.

Parameters: data – a value
Returns: a bytes object containing the data serialized with pickle.

catalyst.contrib.utils.compression.unpack(bytes)¶

Deserialize bytes into an object using pickle.

Parameters: bytes – a bytes object containing data serialized with pickle.
Returns: a value deserialized from the bytes-like object.

catalyst.contrib.utils.compression.unpack_if_needed(bytes)¶

Deserialize bytes into an object using pickle.

Parameters: bytes – a bytes object containing data serialized with pickle.
Returns: a value deserialized from the bytes-like object.

Confusion Matrix ¶

catalyst.contrib.utils.confusion_matrix.calculate_tp_fp_fn(confusion_matrix: numpy.ndarray) → numpy.ndarray[source]¶: @TODO: Docs. Contribution is welcome.

catalyst.contrib.utils.confusion_matrix.calculate_confusion_matrix_from_arrays(predictions: numpy.ndarray, labels: numpy.ndarray, num_classes: int) → numpy.ndarray[source]¶

Calculate the confusion matrix for a given set of classes. Label values outside the range [0, num_classes) are excluded.

Parameters

predictions – model predictions
labels – ground truth labels
num_classes – number of classes

Returns

confusion matrix

Return type

np.ndarray
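
As a rough illustration of the counting rule, here is a list-based sketch that does not use numpy. The function name confusion_matrix_from_lists is hypothetical, and the orientation (rows are ground truth labels, columns are predictions) is an assumption of this sketch, as is the expectation that predictions already lie in [0, num_classes).

```python
def confusion_matrix_from_lists(predictions, labels, num_classes):
    """Hypothetical sketch: rows are true labels, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for predicted, label in zip(predictions, labels):
        # Skip pairs whose label falls outside [0, num_classes).
        if 0 <= label < num_classes:
            matrix[label][predicted] += 1
    return matrix


predictions = [0, 1, 1, 2, 0]
labels = [0, 1, 2, 2, 7]  # the last label (7) is out of range and is ignored
matrix = confusion_matrix_from_lists(predictions, labels, num_classes=3)
```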

catalyst.contrib.utils.confusion_matrix.calculate_confusion_matrix_from_tensors(y_pred_logits: torch.Tensor, y_true: torch.Tensor) → numpy.ndarray[source]¶

Calculate confusion matrix from tensors.

Parameters

y_pred_logits – model logits
y_true – true labels

Returns

confusion matrix

Return type

np.ndarray

Dataset ¶

catalyst.contrib.utils.dataset.create_dataset(dirs: str, extension: str = None, process_fn: Callable[[str], object] = None, recursive: bool = False) → Dict[str, object][source]¶

Create a dataset (a dict like {key: [values]}) from a VCTK-like directory layout:

dataset/
    cat/
        *.ext
    dog/
        *.ext

Parameters

dirs – path to dirs, for example /home/user/data/**
extension – data extension you are looking for
process_fn (Callable[[str], object]) – processing function applied to each found file, with signature function(path_to_file) -> object; defaults to None
recursive – enables recursive globbing

Returns

dataset

Return type

dict
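
The directory-to-dict idea can be sketched with the standard library alone. The helper scan_class_dirs is hypothetical; it assumes that each sub-directory name becomes a key and that the found file paths are stored unprocessed, which may differ from what create_dataset does in detail.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def scan_class_dirs(root, extension="*"):
    """Hypothetical sketch: map each sub-directory name to its sorted file paths."""
    dataset = defaultdict(list)
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.glob(extension) if f.is_file())
        dataset[class_dir.name].extend(str(f) for f in files)
    return dict(dataset)


# Build a tiny throwaway layout: <root>/cat/*.ext and <root>/dog/*.ext
with tempfile.TemporaryDirectory() as root:
    for name in ("cat/a.ext", "cat/b.ext", "dog/c.ext"):
        path = Path(root, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    dataset = scan_class_dirs(root, extension="*.ext")
    summary = {key: [Path(f).name for f in files] for key, files in dataset.items()}
```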

catalyst.contrib.utils.dataset.create_dataframe(dataset: Dict[str, object], **dataframe_args) → pandas.core.frame.DataFrame[source]¶

Create pd.DataFrame from dict like {key: [values]}.

Parameters

dataset – dict like {key: [values]}
**dataframe_args –

    index (Index or array-like) – Index to use for the resulting frame. Defaults to np.arange(n) if the input data carries no indexing information and no index is provided
    columns (Index or array-like) – Column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided
    dtype (dtype, default None) – Data type to force; otherwise it is inferred

Returns

dataframe built from the given dataset

Return type

pd.DataFrame

catalyst.contrib.utils.dataset.split_dataset_train_test(dataset: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[Dict[str, object], Dict[str, object]][source]¶

Split dataset in train and test parts.

Parameters

dataset – dict like dataset
**train_test_split_args –

    test_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
    train_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
    random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
    stratify (array-like or None; default is None) – If not None, data is split in a stratified fashion, using this as the class labels.

Returns

train and test dicts
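
The test_size and train_size rules can be made concrete with a small sketch. The helper resolve_split_sizes is hypothetical and only resolves the two sizes into absolute counts; rounding the test count up and the train count down is an assumption of this sketch, not something stated above.

```python
import math


def resolve_split_sizes(n_samples, test_size=None, train_size=None):
    """Hypothetical helper: turn test_size/train_size into absolute counts."""
    if test_size is None and train_size is None:
        test_size = 0.25  # documented default when both are None

    def to_count(size, rounding):
        # A float is a proportion of the dataset, an int is an absolute count.
        return rounding(size * n_samples) if isinstance(size, float) else size

    n_test = None if test_size is None else to_count(test_size, math.ceil)
    n_train = None if train_size is None else to_count(train_size, math.floor)
    if n_test is None:
        n_test = n_samples - n_train  # complement of the train size
    if n_train is None:
        n_train = n_samples - n_test  # complement of the test size
    return n_train, n_test


default_split = resolve_split_sizes(10)
proportional_split = resolve_split_sizes(10, test_size=0.2)
absolute_split = resolve_split_sizes(10, test_size=4)
train_only_split = resolve_split_sizes(10, train_size=0.5)
```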

Misc ¶

catalyst.contrib.utils.misc.args_are_not_none(*args: Optional[Any]) → bool[source]¶

Check that all arguments are not None.

Parameters: *args – values
Returns: True if all values are not None, False otherwise
Return type: bool

catalyst.contrib.utils.misc.make_tuple(tuple_like)[source]¶

Creates a tuple if the given tuple_like value isn’t a list or tuple.

Parameters: tuple_like – tuple like object - list or tuple
Returns: tuple or list

catalyst.contrib.utils.misc.pairwise(iterable: Iterable[Any]) → Iterable[Any][source]¶

Iterate sequences by pairs.

Examples

>>> for i in pairwise([1, 2, 5, -3]):
...     print(i)
(1, 2)
(2, 5)
(5, -3)

Parameters: iterable – Any iterable sequence
Returns: pairwise iterator
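
One common way to build such an iterator uses itertools.tee. The name pairwise_sketch is hypothetical, and this is a sketch of the technique rather than a copy of the library code.

```python
import itertools


def pairwise_sketch(iterable):
    """Hypothetical equivalent: yield (item, next_item) for consecutive items."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second iterator by one element
    return zip(first, second)


pairs = list(pairwise_sketch([1, 2, 5, -3]))
```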

catalyst.contrib.utils.misc.find_value_ids(it: Iterable[Any], value: Any) → List[int][source]¶

Finds the indices of all elements of it that are equal to value.

Parameters

it – list of any
value – query element

Returns

indices of all elements equal to value
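
A minimal sketch of the lookup for plain Python sequences; the name find_value_ids_sketch is hypothetical.

```python
def find_value_ids_sketch(it, value):
    """Hypothetical equivalent: positions of the elements equal to value."""
    return [index for index, element in enumerate(it) if element == value]


ids = find_value_ids_sketch(["a", "b", "a", "c"], "a")
missing = find_value_ids_sketch(["a", "b"], "z")
```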

Pandas ¶

catalyst.contrib.utils.pandas.dataframe_to_list(dataframe: pandas.core.frame.DataFrame) → List[dict][source]¶

Converts dataframe to a list of rows (without indexes).

Parameters: dataframe – input dataframe
Returns: list of rows
Return type: List[dict]

catalyst.contrib.utils.pandas.folds_to_list(folds: Union[list, str, pandas.core.series.Series]) → List[int][source]¶

Converts a string or a list of numbers into a list of unique ints.

Examples

>>> folds_to_list("1,2,1,3,4,2,4,6")
[1, 2, 3, 4, 6]
>>> folds_to_list([1, 2, 3.0, 5])
[1, 2, 3, 5]

Parameters: folds (Union[list, str, pd.Series]) – Either list of numbers or one string with numbers separated by commas or pandas series
Returns: list of unique ints
Return type: List[int]
Raises: ValueError – if a value in the string or array cannot be cast to int
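
The conversion can be sketched for string and list inputs as follows. The name folds_to_list_sketch is hypothetical, pandas Series input is left out, and sorting the result is a choice of this sketch that happens to match the example output above.

```python
def folds_to_list_sketch(folds):
    """Hypothetical equivalent for str and list inputs (pandas Series omitted)."""
    if isinstance(folds, str):
        folds = folds.split(",")
    # int() raises ValueError for items such as "a" that are not integers.
    return sorted({int(fold) for fold in folds})


from_string = folds_to_list_sketch("1,2,1,3,4,2,4,6")
from_list = folds_to_list_sketch([1, 2, 3.0, 5])
```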

catalyst.contrib.utils.pandas.split_dataframe(dataframe: pandas.core.frame.DataFrame, train_folds: List[int], valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, tag2class: Optional[Dict[str, int]] = None, tag_column: str = None, class_column: str = None, seed: int = 42, n_folds: int = 5) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶

Split a Pandas DataFrame into folds.

Parameters

dataframe – input dataframe
train_folds – train folds
valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds
infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds
tag2class (Dict[str, int], optional) – mapping from label names into int
tag_column (str, optional) – column with label names
class_column (str, optional) – column to use for split
seed – seed for split
n_folds – number of folds

Returns

tuple with 4 dataframes: whole dataframe, train part, valid part and infer part

Return type

tuple

catalyst.contrib.utils.pandas.split_dataframe_on_column_folds(dataframe: pandas.core.frame.DataFrame, column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶

Splits DataFrame into N folds.

Parameters

dataframe – a dataset
column – which column to use
random_state – seed for random shuffle
n_folds – number of result folds

Returns

new dataframe with fold column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.split_dataframe_on_folds(dataframe: pandas.core.frame.DataFrame, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶

Splits DataFrame into N folds.

Parameters

dataframe – a dataset
random_state – seed for random shuffle
n_folds – number of result folds

Returns

new dataframe with fold column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.split_dataframe_on_stratified_folds(dataframe: pandas.core.frame.DataFrame, class_column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶

Splits DataFrame into N stratified folds.

Also see catalyst.data.sampler.BalanceClassSampler

Parameters

dataframe – a dataset
class_column – which column to use for split
random_state – seed for random shuffle
n_folds – number of result folds

Returns

new dataframe with fold column

Return type

pd.DataFrame
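
To illustrate what a stratified fold assignment means, here is a pure-Python sketch that deals the rows of each class out to the folds in turn. The function assign_stratified_folds is hypothetical and works on a list of labels rather than a DataFrame; it is not the library's implementation, and the returned list corresponds to what would be stored in the fold column.

```python
import random
from collections import defaultdict


def assign_stratified_folds(labels, n_folds=5, random_state=42):
    """Hypothetical sketch: return one fold index per row, balanced per class."""
    rng = random.Random(random_state)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    folds = [None] * len(labels)
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # Deal the shuffled rows of this class out to the folds in turn.
        for position, row in enumerate(rows):
            folds[row] = position % n_folds
    return folds


labels = ["cat"] * 10 + ["dog"] * 5
folds = assign_stratified_folds(labels, n_folds=5)
```

With 10 "cat" rows and 5 "dog" rows, every one of the 5 folds ends up with 2 cats and 1 dog, so each fold keeps the class proportions of the whole dataset.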

catalyst.contrib.utils.pandas.split_dataframe_train_test(dataframe: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶

Split a dataframe into train and test parts.

Parameters

dataframe – pd.DataFrame to split
**train_test_split_args –

    test_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
    train_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
    random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
    stratify (array-like or None; default is None) – If not None, data is split in a stratified fashion, using this as the class labels.

Returns

train and test DataFrames

Note

It exists because sklearn’s split is overcomplicated.

catalyst.contrib.utils.pandas.separate_tags(dataframe: pandas.core.frame.DataFrame, tag_column: str = 'tag', tag_delim: str = ', ') → pandas.core.frame.DataFrame[source]¶

Separates values in the tag_column column.

Parameters

dataframe – a dataset
tag_column – column name to separate values
tag_delim – delimiter to separate values

Returns

new dataframe

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.read_multiple_dataframes(in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶

This function reads train/valid/infer dataframes from the given paths.

Parameters

in_csv_train – paths to train csv separated by commas
in_csv_valid – paths to valid csv separated by commas
in_csv_infer – paths to infer csv separated by commas
tag2class (Dict[str, int], optional) – mapping from label names into int
tag_column (str, optional) – column with label names
class_column (str, optional) – column to use for split

Returns

tuple with 4 dataframes: whole dataframe, train part, valid part and infer part

Return type

tuple

catalyst.contrib.utils.pandas.map_dataframe(dataframe: pandas.core.frame.DataFrame, tag_column: str, class_column: str, tag2class: Dict[str, int], verbose: bool = False) → pandas.core.frame.DataFrame[source]¶

This function maps tags from tag_column to ints into class_column using tag2class dictionary.

Parameters

dataframe – input dataframe
tag_column – column with tags
class_column (str) – column to write the mapped class labels to
tag2class (Dict[str, int]) – mapping from tags to class labels
verbose – if True, uses tqdm to show progress

Returns

updated dataframe with class_column

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.get_dataset_labeling(dataframe: pandas.core.frame.DataFrame, tag_column: str) → Dict[str, int][source]¶

Prepares a mapping using unique values from tag_column.

{
    "class_name_0": 0,
    "class_name_1": 1,
    ...
    "class_name_N": N
}

Parameters

dataframe – a dataset
tag_column – which column to use

Returns

mapping from tag to labels

Return type

Dict[str, int]
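
A small sketch of building such a mapping from a plain list of tags and then applying it. The name labeling_from_tags is hypothetical, and numbering the tags in sorted order is an assumption of this sketch.

```python
def labeling_from_tags(tags):
    """Hypothetical sketch: map each unique tag to a consecutive integer."""
    return {str(tag): index for index, tag in enumerate(sorted(set(tags)))}


tags = ["dog", "cat", "dog", "bird"]
tag2class = labeling_from_tags(tags)
# Applying the mapping row by row, the kind of step map_dataframe performs.
classes = [tag2class[tag] for tag in tags]
```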

catalyst.contrib.utils.pandas.merge_multiple_fold_csv(fold_name: str, paths: Optional[str]) → pandas.core.frame.DataFrame[source]¶

Reads csv files into one DataFrame with a fold column.

Parameters

fold_name – current fold name
paths – paths to csv separated by commas

Returns

merged dataframes with column fold == fold_name

Return type

pd.DataFrame

catalyst.contrib.utils.pandas.read_csv_data(in_csv: str = None, train_folds: Optional[List[int]] = None, valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, seed: int = 42, n_folds: int = 5, in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, List[dict], List[dict], List[dict]][source]¶

Reads a dataframe from the given path in_csv and splits it into train/valid/infer folds, or reads independent folds from the separate paths in_csv_train, in_csv_valid, in_csv_infer.

Note

This function can be used with different combinations of params.

The first block is used to get the dataset from one csv: in_csv, train_folds, valid_folds, infer_folds, seed, n_folds
The second block takes paths to separate csv files for the train, valid and infer parts: in_csv_train, in_csv_valid, in_csv_infer
The other params (tag2class, tag_column, class_column) are optional and can be combined with either block

Parameters

in_csv – path to the whole dataset
train_folds – train folds
valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds
infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds
seed – seed for split
n_folds – number of folds
in_csv_train – paths to train csv separated by commas
in_csv_valid – paths to valid csv separated by commas
in_csv_infer – paths to infer csv separated by commas
tag2class (Dict[str, int]) – mapping from label names into ints
tag_column – column with label names
class_column – column to use for split

Returns

tuple with 4 elements (whole dataframe, list with train data, list with valid data and list with infer data)

Return type

Tuple[pd.DataFrame, List[dict], List[dict], List[dict]]

catalyst.contrib.utils.pandas.balance_classes(dataframe: pandas.core.frame.DataFrame, class_column: str = 'label', random_state: int = 42, how: str = 'downsampling') → pandas.core.frame.DataFrame[source]¶

Balance classes in dataframe by class_column.

Parameters

dataframe – a dataset
class_column – which column to use for split
random_state – seed for random shuffle
how – strategy to sample, must be one of [“downsampling”, “upsampling”]

Returns

new dataframe with balanced class_column

Return type

pd.DataFrame

Raises

NotImplementedError – if how is not in [“upsampling”, “downsampling”, int]
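
The two strategies can be sketched on a list of dicts instead of a DataFrame. The helper balance_rows is hypothetical; it assumes that downsampling trims every class to the size of the smallest one and that upsampling re-draws rows until every class matches the largest one. The integer variant of how is not covered.

```python
import random
from collections import defaultdict


def balance_rows(rows, class_column="label", how="downsampling", random_state=42):
    """Hypothetical sketch over a list of dicts instead of a DataFrame."""
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_column]].append(row)
    sizes = [len(group) for group in by_class.values()]

    balanced = []
    for label in sorted(by_class):
        group = by_class[label]
        if how == "downsampling":
            # Keep a random subset as large as the smallest class.
            balanced.extend(rng.sample(group, min(sizes)))
        elif how == "upsampling":
            # Keep every row, then re-draw rows until the largest class size is reached.
            extra = max(sizes) - len(group)
            balanced.extend(group + rng.choices(group, k=extra))
        else:
            raise NotImplementedError(how)
    return balanced


rows = [{"label": "a"}] * 6 + [{"label": "b"}] * 2
downsampled = balance_rows(rows, how="downsampling")
upsampled = balance_rows(rows, how="upsampling")
```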

Parallel ¶

catalyst.contrib.utils.parallel.parallel_imap(func, args, pool: Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool]) → List[T][source]¶: @TODO: Docs. Contribution is welcome.

catalyst.contrib.utils.parallel.tqdm_parallel_imap(func, args, pool: Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool], total: int = None, pbar=<class 'tqdm.std.tqdm'>) → List[T][source]¶: @TODO: Docs. Contribution is welcome.

catalyst.contrib.utils.parallel.get_pool(workers: int) → Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool][source]¶: @TODO: Docs. Contribution is welcome.

Plotly ¶

catalyst.contrib.utils.plotly.plot_tensorboard_log(logdir: Union[str, pathlib.Path], step: Optional[str] = 'batch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]¶

@TODO: Docs. Contribution is welcome.

Adapted from https://github.com/belskikh/kekas/blob/v0.1.23/kekas/utils.py#L193

catalyst.contrib.utils.plotly.plot_metrics(logdir: Union[str, pathlib.Path], step: Optional[str] = 'epoch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]¶

Plots your learning results.

Parameters

logdir – the logdir that was specified during training.
step – ‘batch’ or ‘epoch’ - what logs to show: for batches or for epochs
metrics – list of metrics to plot. The loss should be specified as ‘loss’, the learning rate as ‘_base/lr’, and other metrics by the names used in the metrics dict that was specified during training
height – the height of the whole resulting plot
width – the width of the whole resulting plot

Serialization ¶

catalyst.contrib.utils.serialization.serialize(data)¶

Serialize the data into bytes using pickle.

Parameters: data – a value
Returns: a bytes object containing the data serialized with pickle.

catalyst.contrib.utils.serialization.deserialize(bytes)¶

Deserialize bytes into an object using pickle.

Parameters: bytes – a bytes object containing data serialized with pickle.
Returns: a value deserialized from the bytes-like object.
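
The round trip these two functions describe looks roughly like this when written directly against the standard pickle module. The payload and the chosen protocol are illustrative assumptions.

```python
import pickle

payload = {"epoch": 3, "metrics": [0.91, 0.87]}

# Serialize: any picklable value becomes a bytes object.
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize: the bytes object is turned back into an equal value.
restored = pickle.loads(blob)
```

Unpickling can execute arbitrary code, so bytes should only be deserialized when they come from a trusted source.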

Visualization ¶

catalyst.contrib.utils.visualization.plot_confusion_matrix(cm, class_names=None, normalize=False, title='confusion matrix', fname=None, show=True, figsize=12, fontsize=32, colormap='Blues')[source]¶: Render the confusion matrix and return matplotlib’s figure with it. Normalization can be applied by setting normalize=True.

catalyst.contrib.utils.visualization.render_figure_to_tensor(figure)[source]¶: @TODO: Docs. Contribution is welcome.

Computer Vision ¶

Image ¶

catalyst.contrib.utils.cv.image.has_image_extension(uri) → bool[source]¶

Check that file has image extension.

Parameters: uri (Union[str, pathlib.Path]) – the resource to load the file from
Returns: True if file has image extension, False otherwise
Return type: bool

catalyst.contrib.utils.cv.image.imread(uri, grayscale: bool = False, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]¶

Reads an image from the specified file.

Parameters

uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.imread docs for more info
grayscale – if True, make all images grayscale
expand_dims – if True, append channel axis to grayscale images
rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows using a relative path)
**kwargs – extra params for image read

Returns

image

Return type

np.ndarray

catalyst.contrib.utils.cv.image.imwrite(**kwargs)[source]¶

imwrite(uri, im, format=None, **kwargs)

Write an image to the specified file. Alias for imageio.imwrite.

Parameters: **kwargs – parameters for imageio.imwrite
Returns: image save result

catalyst.contrib.utils.cv.image.imsave(**kwargs)[source]¶

imwrite(uri, im, format=None, **kwargs)

Write an image to the specified file. Alias for imageio.imsave.

Parameters: **kwargs – parameters for imageio.imsave
Returns: image save result

catalyst.contrib.utils.cv.image.mask_to_overlay_image(image: numpy.ndarray, masks: List[numpy.ndarray], threshold: float = 0, mask_strength: float = 0.5) → numpy.ndarray[source]¶

Draws every mask with some color over the image.

Parameters

image – RGB image used as underlay for masks
masks – list of masks
threshold – threshold for masks binarization
mask_strength – opacity of colorized masks

Returns

HxWx3 image with overlay

Return type

np.ndarray

catalyst.contrib.utils.cv.image.mimread(uri, clip_range: Tuple[int, int] = None, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]¶

Reads multiple images from the specified file.

Parameters

uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.mimread docs for more info
clip_range (Tuple[int, int]) – lower and upper interval edges, image values outside the interval are clipped to the interval edges
expand_dims – if True, append channel axis to grayscale images
rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows using a relative path)
**kwargs – extra params for image read

Returns

image

Return type

np.ndarray

catalyst.contrib.utils.cv.image.mimwrite_with_meta(uri, ims, meta, **kwargs)[source]¶: @TODO: Docs. Contribution is welcome.

Tensor ¶

catalyst.contrib.utils.cv.tensor.tensor_from_rgb_image(image: numpy.ndarray) → torch.Tensor[source]¶: @TODO: Docs. Contribution is welcome.

catalyst.contrib.utils.cv.tensor.tensor_to_ndimage(images: torch.Tensor, denormalize: bool = True, mean: Tuple[float, float, float] = (0.485, 0.456, 0.406), std: Tuple[float, float, float] = (0.229, 0.224, 0.225), move_channels_dim: bool = True, dtype=<class 'numpy.float32'>) → numpy.ndarray[source]¶

Convert float image(s) with standard normalization to np.ndarray with values in [0..1] when dtype is np.float32 and in [0..255] when dtype is np.uint8.

Parameters

images – [B]xCxHxW float tensor
denormalize – if True, multiply image(s) by std and add mean
mean (Tuple[float, float, float]) – per channel mean to add
std (Tuple[float, float, float]) – per channel std to multiply
move_channels_dim – if True, convert tensor to [B]xHxWxC format
dtype – result ndarray dtype. Only float32 and uint8 are supported

Returns

[B]xHxWxC np.ndarray of dtype
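
The per-channel arithmetic behind denormalize can be followed on a single pixel. The helper denormalize_pixel is hypothetical and uses plain Python numbers; clamping to [0, 1] and rounding when scaling to [0..255] are assumptions of this sketch.

```python
MEAN = (0.485, 0.456, 0.406)  # default mean from the signature
STD = (0.229, 0.224, 0.225)   # default std from the signature


def denormalize_pixel(pixel, as_uint8=False):
    """Hypothetical single-pixel sketch of the per-channel arithmetic."""
    # Multiply each channel by its std and add its mean.
    values = [channel * std + mean for channel, std, mean in zip(pixel, STD, MEAN)]
    # Assumed clamp to the [0, 1] range.
    values = [min(max(value, 0.0), 1.0) for value in values]
    if as_uint8:
        # Assumed scaling and rounding to the [0..255] range.
        return [round(value * 255) for value in values]
    return values


float_pixel = denormalize_pixel((0.0, 0.0, 0.0))
uint8_pixel = denormalize_pixel((0.0, 0.0, 0.0), as_uint8=True)
```

A normalized pixel of all zeros maps back to the channel means, which is a quick sanity check for the direction of the transform.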

Natural language processing ¶

Text ¶

catalyst.contrib.utils.nlp.text.tokenize_text(text: str, tokenizer, max_length: int, strip: bool = True, lowercase: bool = True, remove_punctuation: bool = True) → Dict[str, numpy.array][source]¶

Tokenizes the given text.

Parameters

text – text to tokenize
tokenizer – Tokenizer instance from HuggingFace
max_length – maximum length of tokens
strip – if true strips text before tokenizing
lowercase – if true makes text lowercase before tokenizing
remove_punctuation – if true removes string.punctuation from text before tokenizing

Returns

batch with tokenized text
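
The three clean-up flags can be sketched independently of any tokenizer. The helper preprocess_text is hypothetical and covers only the text preparation; the call to the HuggingFace tokenizer and the max_length handling are left out.

```python
import string


def preprocess_text(text, strip=True, lowercase=True, remove_punctuation=True):
    """Hypothetical sketch of the text clean-up that precedes tokenization."""
    if strip:
        text = text.strip()
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        # Drop every character listed in string.punctuation.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


cleaned = preprocess_text("  Hello, World!  ")
untouched = preprocess_text(
    " Hi! ", strip=False, lowercase=False, remove_punctuation=False
)
```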

catalyst.contrib.utils.nlp.text.process_bert_output(bert_output, hidden_size: int, output_hidden_states: bool = False, pooling_groups: List[str] = None, mask: torch.Tensor = None, level: Union[int, str] = None)[source]¶

Processes BERT output.

Parameters

bert_output – BERT output
hidden_size – hidden size of BERT layers
output_hidden_states – boolean flag if we need BERT hidden states
pooling_groups – list with pooling to use for sequence embedding
mask – boolean flag if we need to mask [PAD] tokens
level – integer with specified level to use

Returns

processed output