Utilities¶
Main¶
All utils are gathered in catalyst.utils for easier access.
Note
Everything from catalyst.contrib.utils is included in catalyst.utils
Checkpoint¶
- 
catalyst.utils.checkpoint.pack_checkpoint(model=None, criterion=None, optimizer=None, scheduler=None, **kwargs)[source]¶
- @TODO: Docs. Contribution is welcome. 
- 
catalyst.utils.checkpoint.unpack_checkpoint(checkpoint, model=None, criterion=None, optimizer=None, scheduler=None) → None[source]¶
- Load checkpoint from file and unpack the content to a model (if not None), criterion (if not None), optimizer (if not None), scheduler (if not None). - Parameters
- checkpoint – checkpoint to load 
- model – model whose state should be updated 
- criterion – criterion whose state should be updated 
- optimizer – optimizer whose state should be updated 
- scheduler – scheduler whose state should be updated 
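The unpacking step can be pictured roughly as follows. This is a minimal sketch, not Catalyst's implementation: the `*_state_dict` key names and the `TinyComponent` class are assumptions made for illustration; real components would be torch modules or optimizers exposing `state_dict()` and `load_state_dict()`.

```python
class TinyComponent:
    """Hypothetical stand-in for a model/optimizer that has a state dict."""

    def __init__(self, **state):
        self.state = dict(state)

    def state_dict(self):
        return dict(self.state)

    def load_state_dict(self, state):
        self.state = dict(state)


def unpack_checkpoint_sketch(checkpoint, model=None, criterion=None,
                             optimizer=None, scheduler=None):
    """Restore state into every component that is not None."""
    components = {
        "model": model,
        "criterion": criterion,
        "optimizer": optimizer,
        "scheduler": scheduler,
    }
    for name, component in components.items():
        key = f"{name}_state_dict"  # assumed key layout, not confirmed by the docs
        if component is not None and key in checkpoint:
            component.load_state_dict(checkpoint[key])


checkpoint = {"model_state_dict": {"w": 1.5}, "optimizer_state_dict": {"lr": 0.01}}
model = TinyComponent(w=0.0)
optimizer = TinyComponent(lr=1.0)
unpack_checkpoint_sketch(checkpoint, model=model, optimizer=optimizer)
```

Components passed as `None` (here the criterion and scheduler) are simply skipped.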
 
 
- 
catalyst.utils.checkpoint.save_checkpoint(checkpoint: Dict, logdir: Union[pathlib.Path, str], suffix: str, is_best: bool = False, is_last: bool = False, special_suffix: str = '', saver_fn: Callable = <function save>) → Union[pathlib.Path, str][source]¶
- Saving checkpoint to a file. - Parameters
- checkpoint – data to save. 
- logdir – directory where checkpoint should be stored. 
- suffix – checkpoint file name. 
- is_best – if True, a best checkpoint file is also generated.
- is_last – if True, a last checkpoint file is also generated.
- special_suffix – suffix to use for saving best/last checkpoints. 
- saver_fn – function to use for saving data to file, default is torch.save
 
- Returns
- path to saved checkpoint 
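To make the interplay of `suffix`, `is_best`, `is_last` and `special_suffix` concrete, here is a hedged sketch. The file naming (`<suffix>.pth`, `best<special_suffix>.pth`, `last<special_suffix>.pth`) is an assumption for illustration, and `pickle_saver` is a hypothetical stand-in for `torch.save` so that the example needs only the standard library.

```python
import pickle
import shutil
import tempfile
from pathlib import Path


def pickle_saver(obj, path):
    """Hypothetical saver_fn used instead of torch.save."""
    with open(path, "wb") as fout:
        pickle.dump(obj, fout)


def save_checkpoint_sketch(checkpoint, logdir, suffix, is_best=False,
                           is_last=False, special_suffix="",
                           saver_fn=pickle_saver):
    logdir = Path(logdir)
    logdir.mkdir(parents=True, exist_ok=True)
    filename = logdir / f"{suffix}.pth"  # assumed naming scheme
    saver_fn(checkpoint, filename)
    # The best/last files are plain copies of the file just written.
    if is_best:
        shutil.copyfile(filename, logdir / f"best{special_suffix}.pth")
    if is_last:
        shutil.copyfile(filename, logdir / f"last{special_suffix}.pth")
    return filename


with tempfile.TemporaryDirectory() as tmp:
    path = save_checkpoint_sketch({"epoch": 3}, tmp, "stage1.3", is_best=True)
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
```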
 
Components¶
- 
catalyst.utils.components.process_components(model: torch.nn.modules.module.Module, criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, distributed_params: Dict = None, device: Union[str, torch.device] = None) → Tuple[torch.nn.modules.module.Module, torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler, Union[str, torch.device]][source]¶
- Returns the processed model, criterion, optimizer, scheduler and device. - Parameters
- model – torch model 
- criterion – criterion function 
- optimizer – optimizer 
- scheduler – scheduler 
- distributed_params (dict, optional) – dict with the parameters for distributed and FP16 method 
- device (Device, optional) – device 
 
- Returns
- tuple with processed model, criterion, optimizer, scheduler and device. 
- Raises
- ValueError – if device is None and a TPU is available; to use a TPU, manually move the model/optimizer/scheduler to the TPU device and pass the device to the function. 
- NotImplementedError – if model is not nn.Module or dict for multi-gpu, nn.ModuleDict for DataParallel not implemented yet 
 
 
Config¶
- 
catalyst.utils.config.load_config(path: Union[str, pathlib.Path], ordered: bool = False, data_format: str = None, encoding: str = 'utf-8') → Union[Dict, List][source]¶
- Loads config by giving path. Supports YAML and JSON files. - Examples - >>> load(path="./config.yml", ordered=True) - Parameters
- path – path to config file (YAML or JSON) 
- ordered – if true the config will be loaded as OrderedDict
- data_format – yaml, yml or json.
- encoding – encoding to read the config 
 
- Returns
- config 
- Return type
- Union[Dict, List] 
- Raises
- Exception – if path does not exist or the file format is not YAML or JSON
 - Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L63 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L10 
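The loading logic can be sketched as below. This is an illustration under assumptions, not the library code: the format is inferred from the file extension when `data_format` is not given, the YAML branch relies on the third-party PyYAML package, and `ordered` is only honoured for JSON here.

```python
import json
import tempfile
from collections import OrderedDict
from pathlib import Path


def load_config_sketch(path, ordered=False, data_format=None, encoding="utf-8"):
    path = Path(path)
    if not path.exists():
        raise Exception(f"Path '{path}' doesn't exist")
    # Infer the format from the extension unless it is given explicitly.
    fmt = (data_format or path.suffix.lstrip(".")).lower()
    with path.open(encoding=encoding) as stream:
        if fmt == "json":
            hook = OrderedDict if ordered else None
            return json.load(stream, object_pairs_hook=hook)
        if fmt in ("yml", "yaml"):
            import yaml  # PyYAML, imported lazily: only needed for YAML files
            return yaml.safe_load(stream)
    raise Exception(f"Unknown file format '{fmt}'")


with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text('{"stages": {"num_epochs": 3}}', encoding="utf-8")
    config = load_config_sketch(config_path, ordered=True)
```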
- 
catalyst.utils.config.save_config(config: Union[Dict, List], path: Union[str, pathlib.Path], data_format: str = None, encoding: str = 'utf-8', ensure_ascii: bool = False, indent: int = 2) → None[source]¶
- Saves config to file. Path must be either YAML or JSON. - Parameters
- config (Union[Dict, List]) – config to save 
- path (Union[str, Path]) – path to save 
- data_format – yaml, yml or json.
- encoding – encoding to write the file. Default is utf-8
- ensure_ascii – used for JSON; if True, non-ASCII characters are escaped in JSON strings. 
- indent – Used for JSON 
 
 - Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L110 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L38 
Distributed¶
- 
catalyst.utils.distributed.check_ddp_wrapped(model: torch.nn.modules.module.Module) → bool[source]¶
- Checks whether model is wrapped with DataParallel/DistributedDataParallel. 
- 
catalyst.utils.distributed.check_torch_distributed_initialized() → bool[source]¶
- Checks if torch.distributed is available and initialized. 
- 
catalyst.utils.distributed.assert_fp16_available() → None[source]¶
- Asserts for installed and available Apex FP16. 
- 
catalyst.utils.distributed.initialize_apex(model, optimizer=None, **distributed_params)[source]¶
- @TODO: Docs. Contribution is welcome. 
- 
catalyst.utils.distributed.get_nn_from_ddp_module(model: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]¶
- Return a real model from a torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel, or apex.parallel.DistributedDataParallel. - Parameters
- model – A model, or DataParallel wrapper. 
- Returns
- A model 
 
- 
catalyst.utils.distributed.get_rank() → int[source]¶
- Returns the rank of the current worker. - Returns
- rank if torch.distributed is initialized, otherwise -1
- Return type
- int 
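A hedged sketch of the "rank or -1" behaviour, using only public `torch.distributed` calls (`is_available`, `is_initialized`, `get_rank`). The `ImportError` fallback is an addition for environments without torch and is not part of the documented function.

```python
def get_rank_sketch() -> int:
    try:
        import torch.distributed as dist
    except ImportError:
        return -1  # torch is not installed at all
    if dist.is_available() and dist.is_initialized():
        return dist.get_rank()
    return -1  # single-process run: no process group was initialized


rank = get_rank_sketch()
```

In an ordinary single-process script the result is `-1`, which callers can treat as "not distributed".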
 
- 
catalyst.utils.distributed.get_distributed_mean(value: Union[float, torch.Tensor])[source]¶
- Computes distributed mean among all nodes. 
- 
catalyst.utils.distributed.get_distributed_env(local_rank: int, rank: int, world_size: int, use_cuda_visible_devices: bool = True)[source]¶
- Returns environment copy with extra distributed settings. - Parameters
- local_rank – worker local rank 
- rank – worker global rank 
- world_size – worker world size 
- use_cuda_visible_devices – boolean flag to use available GPU devices 
 
- Returns
- updated environment copy 
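One way to picture the returned environment copy is shown below. `RANK`, `WORLD_SIZE` and `LOCAL_RANK` are the variable names conventionally read by torch.distributed launchers; how Catalyst fills `CUDA_VISIBLE_DEVICES` is not stated here, so pinning each worker to the GPU that matches its local rank is purely an assumption of this sketch.

```python
import os


def get_distributed_env_sketch(local_rank, rank, world_size,
                               use_cuda_visible_devices=True):
    env = os.environ.copy()  # work on a copy, never mutate the parent process
    env["RANK"] = str(rank)
    env["WORLD_SIZE"] = str(world_size)
    env["LOCAL_RANK"] = str(local_rank)
    if use_cuda_visible_devices:
        # Assumption: one visible GPU per worker, indexed by local rank.
        env["CUDA_VISIBLE_DEVICES"] = str(local_rank)
    return env


env = get_distributed_env_sketch(local_rank=1, rank=3, world_size=4)
```

The resulting dictionary can be passed as `env=` to `subprocess.Popen` when spawning a worker.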
 
Hash¶
Initialization¶
- 
catalyst.utils.initialization.get_optimal_inner_init(nonlinearity: torch.nn.modules.module.Module, **kwargs) → Callable[[torch.nn.modules.module.Module], None][source]¶
- Create initializer for inner layers based on their activation function (nonlinearity). - Parameters
- nonlinearity – non-linear activation 
- **kwargs – extra kwargs 
 
- Returns
- optimal initialization function 
- Raises
- NotImplementedError – if nonlinearity is not one of sigmoid, tanh, relu, leaky_relu 
 
Loaders¶
- 
catalyst.utils.loaders.get_native_batch_from_loader(loader: torch.utils.data.dataloader.DataLoader, batch_index: int = 0)[source]¶
- Returns a batch from experiment loader - Parameters
- loader – Loader to get batch from 
- batch_index – Index of batch to take from dataset of the loader 
 
- Returns
- batch from loader 
 
- 
catalyst.utils.loaders.get_native_batch_from_loaders(loaders: Dict[str, torch.utils.data.dataloader.DataLoader], loader: Union[str, int] = 0, batch_index: int = 0)[source]¶
- Returns a batch from experiment loaders by its index or name. - Parameters
- loaders (Dict[str, DataLoader]) – Loaders list to get loader from 
- loader (Union[str, int]) – Loader name or its index, default is zero 
- batch_index – Index of batch to take from dataset of the loader 
 
- Returns
- batch from loader 
- Raises
- TypeError – if loader parameter is not a string or an integer 
 
- 
catalyst.utils.loaders.get_loader(data_source: Iterable[dict], open_fn: Callable, dict_transform: Callable = None, sampler=None, collate_fn: Callable = <function default_collate>, batch_size: int = 32, num_workers: int = 4, shuffle: bool = False, drop_last: bool = False)[source]¶
- Creates a DataLoader from given source and its open/transform params. - Parameters
- data_source – an iterable containing your data annotations (for example, paths to images, labels, bboxes, etc.) 
- open_fn – function, that can open your annotations dict and transfer it to data, needed by your network (for example open image by path, or tokenize read string) 
- dict_transform – transforms to use on dict (for example normalize image, add blur, crop/resize/etc) 
- sampler (Sampler, optional) – defines the strategy to draw samples from the dataset 
- collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset 
- batch_size (int, optional) – how many samples per batch to load 
- num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process
- shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
- drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
 
- Returns
- DataLoader with catalyst.data.ListDataset
 
- 
catalyst.utils.loaders.validate_loaders(loaders: Dict[str, torch.utils.data.dataloader.DataLoader]) → Dict[str, torch.utils.data.dataloader.DataLoader][source]¶
- Checks pytorch dataloaders for a distributed setup and transfers them to distributed mode if necessary. (Experimental feature) - Parameters
- loaders (Dict[str, DataLoader]) – dictionary with pytorch dataloaders 
- Returns
- dictionary with pytorch dataloaders (with distributed samplers if necessary) 
 
- Return type
- Dict[str, DataLoader] 
 
- 
catalyst.utils.loaders.get_loaders_from_params(batch_size: int = 1, num_workers: int = 0, drop_last: bool = False, per_gpu_scaling: bool = False, loaders_params: Dict[str, Any] = None, samplers_params: Dict[str, Any] = None, initial_seed: int = 42, get_datasets_fn: Callable = None, **data_params) → OrderedDict[str, DataLoader][source]¶
- Creates pytorch dataloaders from datasets and additional parameters. - Parameters
- batch_size – batch_size parameter from torch.utils.data.DataLoader
- num_workers – num_workers parameter from torch.utils.data.DataLoader
- drop_last – drop_last parameter from torch.utils.data.DataLoader
- per_gpu_scaling – boolean flag, if True, uses batch_size=batch_size*num_available_gpus
- loaders_params (Dict[str, Any]) – additional loaders parameters 
- samplers_params (Dict[str, Any]) – additional sampler parameters 
- initial_seed – initial seed for torch.utils.data.DataLoader workers
- get_datasets_fn (Callable) – callable function to get dictionary with torch.utils.data.Datasets
- **data_params – additional data parameters or dictionary with torch.utils.data.Datasets to use for pytorch dataloaders creation
 
- Returns
- dictionary with torch.utils.data.DataLoader
 
- Return type
- OrderedDict[str, DataLoader] 
- Raises
- NotImplementedError – if a data source is neither a Dataset nor a dict 
- ValueError – if the batch_sampler option is used in distributed mode (the two are mutually exclusive) 
 
 
Misc¶
- 
catalyst.utils.misc.copy_directory(input_dir: pathlib.Path, output_dir: pathlib.Path) → None[source]¶
- Recursively copies the input directory. - Parameters
- input_dir – input directory 
- output_dir – output directory 
 
 
- 
catalyst.utils.misc.format_metric(name: str, value: float) → str[source]¶
- Format metric. - Metric will be returned in the scientific format if 4 decimal chars are not enough (metric value lower than 1e-4). - Parameters
- name – metric name 
- value – value of metric 
 
- Returns
- formatted metric 
- Return type
- str 
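The switch between fixed-point and scientific notation can be sketched like this. The `name=value` layout and the exact format specifiers are assumptions; only the 1e-4 threshold and the four decimal places come from the description above.

```python
def format_metric_sketch(name: str, value: float) -> str:
    if value < 1e-4:
        # Four decimals would print 0.0000, so fall back to scientific notation.
        return f"{name}={value:1.3e}"
    return f"{name}={value:.4f}"


loss_str = format_metric_sketch("loss", 0.5)
lr_str = format_metric_sketch("lr", 1e-5)
```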
 
- 
catalyst.utils.misc.get_fn_default_params(fn: Callable[[...], Any], exclude: List[str] = None)[source]¶
- Return default parameters of Callable. - Parameters
- fn (Callable[.., Any]) – target Callable 
- exclude – exclude list of parameters 
 
- Returns
- contains default parameters of fn 
- Return type
- dict 
 
- 
catalyst.utils.misc.get_fn_argsnames(fn: Callable[[...], Any], exclude: List[str] = None)[source]¶
- Return parameter names of Callable. - Parameters
- fn (Callable[.., Any]) – target Callable 
- exclude – exclude list of parameters 
 
- Returns
- contains parameter names of fn 
- Return type
- list 
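Both helpers can be approximated with the standard `inspect` module. The sketch below is not the library source, and `train` is a hypothetical function used only for the demonstration.

```python
import inspect


def get_fn_default_params_sketch(fn, exclude=None):
    exclude = set(exclude or [])
    return {
        name: param.default
        for name, param in inspect.signature(fn).parameters.items()
        if param.default is not inspect.Parameter.empty and name not in exclude
    }


def get_fn_argsnames_sketch(fn, exclude=None):
    exclude = set(exclude or [])
    return [name for name in inspect.signature(fn).parameters
            if name not in exclude]


def train(model, lr=0.001, epochs=10):
    """Hypothetical function, present only to demonstrate the helpers."""


defaults = get_fn_default_params_sketch(train, exclude=["epochs"])
names = get_fn_argsnames_sketch(train, exclude=["model"])
```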
 
- 
catalyst.utils.misc.get_utcnow_time(format: str = None) → str[source]¶
- Return string with current utc time in chosen format. - Parameters
- format – format string; if None, “%y%m%d.%H%M%S” is used. 
- Returns
- formatted utc time string 
- Return type
- str 
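A minimal sketch of the default format in action, assuming a timezone-aware `datetime.now(timezone.utc)` call (the library may obtain the UTC time differently):

```python
from datetime import datetime, timezone


def get_utcnow_time_sketch(format=None):
    if format is None:
        format = "%y%m%d.%H%M%S"  # default documented above
    return datetime.now(timezone.utc).strftime(format)


stamp = get_utcnow_time_sketch()
```

The default yields a compact, sortable stamp of the form `YYMMDD.HHMMSS`, which is convenient for naming log directories.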
 
- 
catalyst.utils.misc.is_exception(ex: Any) → bool[source]¶
- Check if the argument is of Exception type.
- 
catalyst.utils.misc.maybe_recursive_call(object_or_dict, method: Union[str, Callable], recursive_args=None, recursive_kwargs=None, **kwargs)[source]¶
- Calls the method recursively for the object_or_dict. - Parameters
- object_or_dict – some object or a dictionary of objects 
- method – method name to call 
- recursive_args – list of arguments to pass to the method
- recursive_kwargs – keyword arguments to pass to the method
- **kwargs – Arbitrary keyword arguments 
 
- Returns
- result of method call 
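The recursion over "an object or a dictionary of objects" can be illustrated as follows. This simplified sketch deliberately omits `recursive_args` and `recursive_kwargs` and only forwards `**kwargs`, so it shows the traversal idea rather than the full documented signature.

```python
def maybe_recursive_call_sketch(object_or_dict, method, **kwargs):
    if isinstance(object_or_dict, dict):
        # Recurse into every value and keep the keys.
        return {
            key: maybe_recursive_call_sketch(value, method, **kwargs)
            for key, value in object_or_dict.items()
        }
    if isinstance(method, str):
        # A string is treated as the name of a method on the object.
        return getattr(object_or_dict, method)(**kwargs)
    return method(object_or_dict, **kwargs)


names = {"model": "resnet", "heads": {"cls": "linear", "seg": "unet"}}
upper = maybe_recursive_call_sketch(names, "upper")
lengths = maybe_recursive_call_sketch(names, len)
```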
 
- 
catalyst.utils.misc.get_attr(obj: Any, key: str, inner_key: str = None) → Any[source]¶
- Alias for the Python getattr function. Useful for Callbacks preparation and cases with a multi-criterion, multi-optimizer setup, for example when you would like to train multi-task classification.
  Used to get a named attribute from an IRunner by the key keyword; for example:
      # example 1
      runner.get_attr("criterion")  # is equivalent to runner.criterion
      # example 2
      runner.get_attr("optimizer")  # is equivalent to runner.optimizer
      # example 3
      runner.get_attr("scheduler")  # is equivalent to runner.scheduler
  With inner_key, a dictionary is expected under key, and inner_key is looked up in that dictionary; for example:
      # example 1
      runner.get_attr("criterion", "bce")  # is equivalent to runner.criterion["bce"]
      # example 2
      runner.get_attr("optimizer", "adam")  # is equivalent to runner.optimizer["adam"]
      # example 3
      runner.get_attr("scheduler", "adam")  # is equivalent to runner.scheduler["adam"]
- Parameters
- obj – object of interest 
- key – name for attribute of interest, like criterion, optimizer, scheduler 
- inner_key – name of inner dictionary key 
 
- Returns
- inner attribute 
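The lookup rule is small enough to sketch in full. The `runner` object below is a hypothetical `SimpleNamespace`, not a real IRunner, and the string values stand in for actual criterion and optimizer objects.

```python
from types import SimpleNamespace


def get_attr_sketch(obj, key, inner_key=None):
    if inner_key is None:
        return getattr(obj, key)
    # With inner_key, the attribute is expected to be a dictionary.
    return getattr(obj, key)[inner_key]


runner = SimpleNamespace(
    criterion={"bce": "BCEWithLogitsLoss", "dice": "DiceLoss"},
    optimizer="Adam",
)
single = get_attr_sketch(runner, "optimizer")
inner = get_attr_sketch(runner, "criterion", "bce")
```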
 
Numpy¶
- 
catalyst.utils.numpy.get_one_hot(label: int, num_classes: int, smoothing: float = None) → numpy.ndarray[source]¶
- Applies OneHot vectorization to a given scalar, optionally with label smoothing as described in Bag of Tricks for Image Classification with Convolutional Neural Networks. - Parameters
- label – scalar value to be vectorized 
- num_classes – total number of classes 
- smoothing (float, optional) – if specified, applies label smoothing from the Bag of Tricks for Image Classification with Convolutional Neural Networks paper
 
- Returns
- a one-hot vector with shape (num_classes,)
- Return type
- np.ndarray 
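For intuition, here is a dependency-free sketch that returns a plain list instead of an `np.ndarray`. The smoothing scheme (the true class keeps `1 - smoothing`, the rest is split evenly over the other classes) follows the usual formulation of label smoothing and is an assumption about what the function does internally.

```python
def get_one_hot_sketch(label, num_classes, smoothing=None):
    if smoothing is None:
        return [1.0 if i == label else 0.0 for i in range(num_classes)]
    # Assumed scheme: spread the smoothing mass over the other classes.
    off_value = smoothing / (num_classes - 1)
    return [1.0 - smoothing if i == label else off_value
            for i in range(num_classes)]


hard = get_one_hot_sketch(1, 3)
soft = get_one_hot_sketch(0, 5, smoothing=0.2)
```

With `smoothing=0.2` and five classes, the true class receives 0.8 and each other class 0.05, so the vector still sums to one.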
 
Parser¶
Pipelines¶
- 
catalyst.utils.pipelines.clone_pipeline(template: str, out_dir: pathlib.Path) → None[source]¶
- Clones pipeline from empty pipeline template or from demo pipelines available in Git repos of Catalyst Team. - Parameters
- template – type of pipeline you want to clone. empty/classification/segmentation 
- out_dir – path where pipeline directory should be cloned 
 
 
Pruning¶
- 
catalyst.utils.pruning.prune_model(model: torch.nn.modules.module.Module, pruning_fn: Callable, keys_to_prune: List[str], amount: Union[float, int], layers_to_prune: Optional[List[str]] = None, reinitialize_after_pruning: Optional[bool] = False) → None[source]¶
- Prune model function can be used for pruning certain tensors in model layers. - Raises
- AttributeError – if layers_to_prune is not None, but there are no layers with the specified names. 
- Exception – If no layers have specified keys. 
 
- Parameters
- model – Model to be pruned. 
- pruning_fn – Pruning function with the same API as the functions in torch.nn.utils.prune: pruning_fn(module, name, amount). 
- keys_to_prune – list of strings. Determines which tensor in modules will be pruned. 
- amount – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune. 
- layers_to_prune – list of strings - module names to be pruned. If None provided then will try to prune every module in model. 
- reinitialize_after_pruning – if True then will reinitialize model after pruning. (Lottery Ticket Hypothesis check e.g.) 
 
 
- 
catalyst.utils.pruning.remove_reparametrization(model: torch.nn.modules.module.Module, keys_to_prune: List[str], layers_to_prune: Optional[List[str]] = None) → None[source]¶
- Removes pre-hooks and pruning masks from the model. - Parameters
- model – model to remove reparametrization. 
- keys_to_prune – list of strings. Determines which tensor in modules have already been pruned. 
- layers_to_prune – list of strings - module names have already been pruned. If None provided then will try to prune every module in model. 
 
 
Quantization¶
- 
catalyst.utils.quantization.quantize_model_from_checkpoint(logdir: pathlib.Path, checkpoint_name: str, stage: str = None, qconfig_spec: Union[Set, Dict, None] = None, dtype: Optional[torch.dtype] = torch.qint8, backend: str = None) → torch.nn.modules.module.Module[source]¶
- Quantize model using created experiment and runner. - Parameters
- logdir (Union[str, Path]) – Path to Catalyst logdir with model 
- checkpoint_name – Name of model checkpoint to use 
- stage – experiment’s stage name 
- qconfig_spec – torch.quantization.quantize_dynamic parameter; you can define the layers to be quantized 
- dtype – type of the model parameters, default int8 
- backend – defines backend for quantization 
 
- Returns
- Quantized model 
 
- 
catalyst.utils.quantization.save_quantized_model(model: torch.nn.modules.module.Module, logdir: Union[str, pathlib.Path] = None, checkpoint_name: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None) → None[source]¶
- Saves quantized model. - Parameters
- model – Traced model 
- logdir (Union[str, Path]) – Path to experiment 
- checkpoint_name – name for the checkpoint 
- out_dir (Union[str, Path]) – Directory to save model to (overrides logdir) 
- out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir) 
 
- Raises
- ValueError – if nothing out of logdir, out_dir or out_model is specified. 
 
Scripts¶
- 
catalyst.utils.scripts.import_module(expdir: Union[str, pathlib.Path])[source]¶
- Imports python module by path. - Parameters
- expdir – path to python module. 
- Returns
- Imported module. 
 
- 
catalyst.utils.scripts.dump_code(expdir: Union[str, pathlib.Path], logdir: Union[str, pathlib.Path]) → None[source]¶
- Dumps Catalyst code for reproducibility. - Parameters
- expdir (Union[str, pathlib.Path]) – experiment dir path 
- logdir (Union[str, pathlib.Path]) – logging dir path 
 
 
- 
catalyst.utils.scripts.dump_python_files(src: pathlib.Path, dst: pathlib.Path) → None[source]¶
- Dumps python code (*.py and *.ipynb) files. - Parameters
- src – source code path 
- dst – destination code path 
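A hedged sketch of such a dump using `pathlib` and `shutil`. Searching recursively and preserving the relative folder layout under `dst` are assumptions of this example; only the two file patterns come from the description.

```python
import shutil
import tempfile
from pathlib import Path


def dump_python_files_sketch(src, dst):
    src, dst = Path(src), Path(dst)
    for pattern in ("*.py", "*.ipynb"):
        for file in src.rglob(pattern):
            target = dst / file.relative_to(src)  # keep folder layout (assumption)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "src", Path(tmp) / "dst"
    (src / "pkg").mkdir(parents=True)
    (src / "pkg" / "model.py").write_text("x = 1\n")
    (src / "notes.txt").write_text("not code\n")
    dump_python_files_sketch(src, dst)
    dumped = sorted(p.relative_to(dst).as_posix()
                    for p in dst.rglob("*") if p.is_file())
```

Only `pkg/model.py` ends up in the destination; the text file is ignored.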
 
 
- 
catalyst.utils.scripts.prepare_config_api_components(expdir: pathlib.Path, config: Dict)[source]¶
- Imports and creates the core Config API components (Experiment, Runner and Config) from expdir (the experiment directory) and config (the experiment config). - Parameters
- expdir – experiment directory path 
- config – dictionary with experiment Config 
 
- Returns
- Experiment, Runner, Config for Config API usage. 
 
- 
catalyst.utils.scripts.dump_experiment_code(src: pathlib.Path, dst: pathlib.Path) → None[source]¶
- Dumps your experiment code for Config API use cases. - Parameters
- src – source code path 
- dst – destination code path 
 
 
- 
catalyst.utils.scripts.distributed_cmd_run(worker_fn: Callable, distributed: bool = True, *args, **kwargs) → None[source]¶
- Distributed run - Parameters
- worker_fn – worker fn to run in distributed mode 
- distributed – distributed flag 
- args – additional parameters for worker_fn 
- kwargs – additional key-value parameters for worker_fn 
 
 
Stochastic Weights Averaging (SWA)¶
- 
catalyst.utils.swa.average_weights(state_dicts: List[dict]) → collections.OrderedDict[source]¶
- Averaging of input weights. - Parameters
- state_dicts – Weights to average 
- Raises
- KeyError – If states do not match 
- Returns
- Averaged weights 
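The averaging itself is a key-wise mean over the state dicts. The sketch below uses plain floats in place of tensors to stay dependency-free; the same arithmetic applies element-wise to tensors. Raising `KeyError` on mismatched keys mirrors the documented behaviour, while the exact check is an assumption.

```python
from collections import OrderedDict


def average_weights_sketch(state_dicts):
    reference_keys = state_dicts[0].keys()
    for state in state_dicts[1:]:
        if state.keys() != reference_keys:
            raise KeyError("State dicts have different keys")
    return OrderedDict(
        (key, sum(state[key] for state in state_dicts) / len(state_dicts))
        for key in reference_keys
    )


averaged = average_weights_sketch([
    {"w": 1.0, "b": 0.0},
    {"w": 3.0, "b": 1.0},
])

try:
    average_weights_sketch([{"w": 1.0}, {"b": 1.0}])
    mismatch_raised = False
except KeyError:
    mismatch_raised = True
```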
 
- 
catalyst.utils.swa.get_averaged_weights_by_path_mask(path_mask: str, logdir: Union[str, pathlib.Path] = None) → collections.OrderedDict[source]¶
- Averaging of input weights and saving them. - Parameters
- path_mask – glob-like pattern for models to average 
- logdir – Path to logs directory 
 
- Returns
- Averaged weights 
 
Sys¶
- 
catalyst.utils.sys.get_environment_vars() → Dict[str, Any][source]¶
- Creates a dictionary with environment variables. - Returns
- environment variables 
- Return type
- Dict 
 
- 
catalyst.utils.sys.list_conda_packages() → str[source]¶
- Lists conda installed packages. - Returns
- list with conda installed packages 
- Return type
- str 
 
- 
catalyst.utils.sys.list_pip_packages() → str[source]¶
- Lists pip installed packages. - Returns
- string with pip installed packages 
- Return type
- str 
 
- 
catalyst.utils.sys.dump_environment(experiment_config: Dict, logdir: str, configs_path: List[str] = None) → None[source]¶
- Saves config, environment variables and package list in JSON into logdir. - Parameters
- experiment_config – experiment config 
- logdir – path to logdir 
- configs_path – path(s) to config 
 
 
Torch¶
- 
catalyst.utils.torch.get_optimizable_params(model_or_params)[source]¶
- Returns all the parameters that require gradients. 
- 
catalyst.utils.torch.get_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer) → float[source]¶
- Get momentum of current optimizer. - Parameters
- optimizer – PyTorch optimizer 
- Returns
- momentum at first param group 
- Return type
- float 
 
- 
catalyst.utils.torch.set_optimizer_momentum(optimizer: torch.optim.optimizer.Optimizer, value: float, index: int = 0)[source]¶
- Set momentum of the index’th param group of optimizer to value. - Parameters
- optimizer – PyTorch optimizer 
- value – new value of momentum 
- index (int, optional) – integer index of optimizer’s param groups, default is 0 
 
 
- 
catalyst.utils.torch.get_device() → torch.device[source]¶
- Returns the best available device (TPU > GPU > CPU). 
- 
catalyst.utils.torch.get_available_gpus()[source]¶
- Array of available GPU ids. - Examples
  >>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
  >>> get_available_gpus()
  [0, 2]
  >>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,-1,1"
  >>> get_available_gpus()
  [0]
  >>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
  >>> get_available_gpus()
  []
  >>> os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
  >>> get_available_gpus()
  []
- Returns
- available GPU ids 
- Return type
- iterable 
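The four documented cases suggest a simple parsing rule: read ids left to right and stop at the first negative one. The helper below, `parse_visible_devices`, is a hypothetical name and covers only the case where `CUDA_VISIBLE_DEVICES` is set; what the real function does when the variable is absent is not shown here.

```python
def parse_visible_devices(value):
    """Parse a CUDA_VISIBLE_DEVICES string into a list of GPU ids."""
    gpus = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # an empty string yields no devices
        index = int(item)
        if index < 0:
            break  # a negative id hides this and all following devices
        gpus.append(index)
    return gpus


results = [parse_visible_devices(v) for v in ("0,2", "0,-1,1", "", "-1")]
```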
 
- 
catalyst.utils.torch.get_activation_fn(activation: str = None)[source]¶
- Returns the activation function from torch.nn by its name.
- 
catalyst.utils.torch.any2device(value, device: Union[str, torch.device])[source]¶
- Move tensor, list of tensors, list of list of tensors, dict of tensors, tuple of tensors to target device. - Parameters
- value – Object to be moved 
- device – target device ids 
 
- Returns
- Same structure as value, but all tensors and np.arrays moved to device 
 
- 
catalyst.utils.torch.prepare_cudnn(deterministic: bool = None, benchmark: bool = None) → None[source]¶
- Prepares CuDNN benchmark and sets CuDNN to deterministic/non-deterministic mode - Parameters
- deterministic – deterministic mode if running in CuDNN backend. 
- benchmark – If True, use CuDNN heuristics to figure out which algorithm will be most performant for your model architecture and input. Setting it to False may slow down your training.
 
 
- 
catalyst.utils.torch.process_model_params(model: torch.nn.modules.module.Module, layerwise_params: Dict[str, dict] = None, no_bias_weight_decay: bool = True, lr_scaling: float = 1.0) → List[Union[torch.nn.parameter.Parameter, dict]][source]¶
- Collects model parameters for torch.optim.Optimizer. - Parameters
- model – Model to process 
- layerwise_params – Order-sensitive dict where each key is regex pattern and values are layer-wise options for layers matching with a pattern 
- no_bias_weight_decay – If true, removes weight_decay for all bias parameters in the model
- lr_scaling – layer-wise learning rate scaling, if 1.0, learning rates will not be scaled 
 
- Returns
- parameters for an optimizer 
- Return type
- iterable 
 - Example:
  >>> model = catalyst.contrib.models.segmentation.ResnetUnet()
  >>> layerwise_params = collections.OrderedDict([
  ...     ("conv1.*", dict(lr=0.001, weight_decay=0.0003)),
  ...     ("conv.*", dict(lr=0.002))
  ... ])
  >>> params = process_model_params(model, layerwise_params)
  >>> optimizer = torch.optim.Adam(params, lr=0.0003)
- 
catalyst.utils.torch.get_requires_grad(model: torch.nn.modules.module.Module)[source]¶
- Gets the requires_grad value for all model parameters. - Example:
  >>> model = SimpleModel()
  >>> requires_grad = get_requires_grad(model)
- Parameters
- model – model 
- Returns
- requires_grad value for each model parameter 
- Return type
- Dict[str, bool] 
 
- 
catalyst.utils.torch.set_requires_grad(model: torch.nn.modules.module.Module, requires_grad: Union[bool, Dict[str, bool]])[source]¶
- Sets the requires_grad value for all model parameters. - Example:
  >>> model = SimpleModel()
  >>> set_requires_grad(model, requires_grad=True)
  >>> # or
  >>> model = SimpleModel()
  >>> set_requires_grad(model, requires_grad={""})
- Parameters
- model – model 
- requires_grad (Union[bool, Dict[str, bool]]) – value 
 
 
- 
catalyst.utils.torch.get_network_output(net: torch.nn.modules.module.Module, *input_shapes_args, **input_shapes_kwargs)[source]¶
- For each input shape returns an output tensor - Examples
  >>> net = nn.Linear(10, 5)
  >>> utils.get_network_output(net, (1, 10))
  tensor([[[-0.2665, 0.5792, 0.9757, -0.5782, 0.1530]]])
- Parameters
- net – the model 
- *input_shapes_args – variable length argument list of shapes 
- **input_shapes_kwargs – key-value arguments of shapes 
 
- Returns
- tensor with network output 
 
- 
catalyst.utils.torch.detach(tensor: torch.Tensor) → numpy.ndarray[source]¶
- Detach a pytorch tensor from graph and convert it to numpy array - Parameters
- tensor – PyTorch tensor 
- Returns
- numpy ndarray 
 
- 
catalyst.utils.torch.trim_tensors(tensors)[source]¶
- Trim padding off of a batch of tensors to the smallest possible length. Should be used with catalyst.data.DynamicLenBatchSampler. - Adapted from Dynamic minibatch trimming to improve BERT training speed. - Parameters
- tensors – list of tensors to trim. 
- Returns
- list of trimmed tensors. 
- Return type
- List[torch.tensor] 
 
Tracing¶
- 
catalyst.utils.tracing.trace_model(model: torch.nn.modules.module.Module, predict_fn: Callable, batch=None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu', predict_params: dict = None) → torch.jit._script.ScriptModule[source]¶
- Traces model using runner and batch. - Parameters
- model – Model to trace 
- predict_fn – Function to run prediction with the model provided, takes model, inputs parameters 
- batch – Batch to trace the model 
- method_name – Model’s method name that will be used as entrypoint during tracing 
- mode – Mode for model to trace (train or eval)
- requires_grad – Flag to use grads 
- opt_level – Apex FP16 init level, optional 
- device – Torch device 
- predict_params – additional parameters for model forward 
 
- Returns
- Traced model 
- Return type
- jit.ScriptModule 
- Raises
- ValueError – if batch or predict_fn is not specified, or if mode is not ‘eval’ or ‘train’. 
 
- 
catalyst.utils.tracing.trace_model_from_checkpoint(logdir: pathlib.Path, method_name: str, checkpoint_name: str, stage: str = None, loader: Union[str, int] = None, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu')[source]¶
- Traces model using created experiment and runner. - Parameters
- logdir (Union[str, Path]) – Path to Catalyst logdir with model 
- checkpoint_name – Name of model checkpoint to use 
- stage – experiment’s stage name 
- loader (Union[str, int]) – experiment’s loader name or its index 
- method_name – Model’s method name that will be used as entrypoint during tracing 
- mode – Mode for model to trace (train or eval)
- requires_grad – Flag to use grads 
- opt_level – AMP FP16 init level 
- device – Torch device 
 
- Returns
- the traced model 
 
- 
catalyst.utils.tracing.trace_model_from_runner(runner: IRunner, checkpoint_name: str = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu') → torch.jit._script.ScriptModule[source]¶
- Traces model using created experiment and runner. - Parameters
- runner – current runner. 
- checkpoint_name – Name of model checkpoint to use, if None traces current model from runner 
- method_name – Model’s method name that will be used as entrypoint during tracing 
- mode – Mode for model to trace (train or eval)
- requires_grad – Flag to use grads 
- opt_level – AMP FP16 init level 
- device – Torch device 
 
- Returns
- Traced model 
- Return type
- ScriptModule 
 
- 
catalyst.utils.tracing.get_trace_name(method_name: str, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, additional_string: str = None) → str[source]¶
- Creates a file name for the traced model. - Parameters
- method_name – model’s method name 
- mode – train or eval
- requires_grad – flag if model was traced with gradients 
- opt_level – opt_level if model was traced in FP16 
- additional_string – any additional information 
 
- Returns
- Filename for traced model to be saved. 
- Return type
- str 
 
- 
catalyst.utils.tracing.save_traced_model(model: torch.jit._script.ScriptModule, logdir: Union[str, pathlib.Path] = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None, checkpoint_name: str = None) → None[source]¶
- Saves traced model. - Parameters
- model – Traced model 
- logdir (Union[str, Path]) – Path to experiment 
- method_name – Name of the method that was traced 
- mode – Model’s mode - train or eval 
- requires_grad – Whether the model was traced with requires_grad or not 
- opt_level – Apex FP16 init level used during tracing 
- out_dir (Union[str, Path]) – Directory to save model to (overrides logdir) 
- out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir) 
- checkpoint_name – Checkpoint name used to restore the model 
 
- Raises
- ValueError – if nothing out of logdir, out_dir or out_model is specified. 
 
- 
catalyst.utils.tracing.load_traced_model(model_path: Union[str, pathlib.Path], device: Union[str, torch.device] = 'cpu', opt_level: str = None) → torch.jit._script.ScriptModule[source]¶
- Loads a traced model. - Parameters
- model_path – Path to traced model 
- device – Torch device 
- opt_level – Apex FP16 init level, optional 
 
- Returns
- Traced model 
- Return type
- ScriptModule 
 
Wizard¶
- 
class catalyst.utils.wizard.Wizard[source]¶
- Bases: object - Class for the Catalyst Config API Wizard. An instance of this class is created and called from the CLI command catalyst-dl init --interactive. With the help of this Wizard, the user can set up a pipeline from the available templates and choose which predefined classes to use in different parts of the pipeline.
Contrib¶
Argparse¶
- 
catalyst.contrib.utils.argparse.boolean_flag(parser: argparse.ArgumentParser, name: str, default: Optional[bool] = False, help: str = None, shorthand: str = None) → None[source]¶
- Add a boolean flag to a parser in place. - Examples
>>> parser = argparse.ArgumentParser()
>>> boolean_flag(
...     parser, "flag", default=False, help="some flag", shorthand="f"
... )
- Parameters
- parser – parser to add the flag to 
- name – argument name; --<name> will enable the flag, while --no-<name> will disable it
- default (bool, optional) – default value of the flag 
- help – help string for the flag 
- shorthand – shorthand string for the argument 
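The enable/disable pairing described above can be approximated with the standard library alone. This is a minimal sketch, not Catalyst's implementation: `boolean_flag_sketch` is a hypothetical name, and mapping `shorthand` to a single-dash option is an assumption.

```python
import argparse
from typing import Optional


def boolean_flag_sketch(
    parser: argparse.ArgumentParser,
    name: str,
    default: Optional[bool] = False,
    help: Optional[str] = None,
    shorthand: Optional[str] = None,
) -> None:
    """Hypothetical stand-in: register --<name> and --no-<name> on one dest."""
    dest = name.replace("-", "_")
    names = ["--" + name]
    if shorthand is not None:
        names.append("-" + shorthand)  # assumed: shorthand becomes -<shorthand>
    # --<name> (and the shorthand) store True into `dest`.
    parser.add_argument(
        *names, action="store_true", default=default, dest=dest, help=help
    )
    # --no-<name> stores False into the same `dest`.
    parser.add_argument(
        "--no-" + name, action="store_false", default=default, dest=dest
    )


parser = argparse.ArgumentParser()
boolean_flag_sketch(parser, "flag", default=False, help="some flag", shorthand="f")
enabled = parser.parse_args(["--flag"]).flag
disabled = parser.parse_args(["--no-flag"]).flag
unset = parser.parse_args([]).flag  # falls back to `default`
```

Both options write to the same destination, so whichever appears last on the command line wins.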
 
 
Compression¶
- 
catalyst.contrib.utils.compression.pack(data)¶
- Serialize the data into bytes using pickle. - Parameters
- data – a value 
- Returns
- A bytes object containing the pickle-serialized data.
 
- 
catalyst.contrib.utils.compression.pack_if_needed(data)¶
- Serialize the data into bytes using pickle. - Parameters
- data – a value 
- Returns
- A bytes object containing the pickle-serialized data.
 
- 
catalyst.contrib.utils.compression.unpack(bytes)¶
- Deserialize bytes into an object using pickle. - Parameters
- bytes – a bytes object containing pickle-serialized data.
- Returns
- Returns a value deserialized from the bytes-like object. 
 
- 
catalyst.contrib.utils.compression.unpack_if_needed(bytes)¶
- Deserialize bytes into an object using pickle. - Parameters
- bytes – a bytes object containing pickle-serialized data.
- Returns
- Returns a value deserialized from the bytes-like object. 
 
Confusion Matrix¶
- 
catalyst.contrib.utils.confusion_matrix.calculate_tp_fp_fn(confusion_matrix: numpy.ndarray) → numpy.ndarray[source]¶
- @TODO: Docs. Contribution is welcome. 
- 
catalyst.contrib.utils.confusion_matrix.calculate_confusion_matrix_from_arrays(predictions: numpy.ndarray, labels: numpy.ndarray, num_classes: int) → numpy.ndarray[source]¶
- Calculate confusion matrix for a given set of classes. If a label value is outside [0, num_classes), it is excluded. - Parameters
- predictions – model predictions 
- labels – ground truth labels 
- num_classes – number of classes 
 
- Returns
- confusion matrix 
- Return type
- np.ndarray 
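The exclusion rule for out-of-range labels can be illustrated with plain NumPy. This is a sketch under stated assumptions rather than the library's code: `confusion_matrix_sketch` is a hypothetical name, rows are assumed to index the true label and columns the prediction, and predictions are assumed to already lie in [0, num_classes).

```python
import numpy as np


def confusion_matrix_sketch(
    predictions: np.ndarray, labels: np.ndarray, num_classes: int
) -> np.ndarray:
    """Hypothetical helper: rows = true label, columns = prediction (assumed)."""
    # Keep only samples whose label lies in [0, num_classes).
    valid = (labels >= 0) & (labels < num_classes)
    # Encode each (label, prediction) pair as one flat index and count them.
    flat = num_classes * labels[valid] + predictions[valid]
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


predictions = np.array([0, 1, 1, 2, 0])
labels = np.array([0, 1, 2, 2, 5])  # the label 5 is out of range and is dropped
cm = confusion_matrix_sketch(predictions, labels, num_classes=3)
```

Only four of the five samples are counted here, because the last label falls outside the class range.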
 
- 
catalyst.contrib.utils.confusion_matrix.calculate_confusion_matrix_from_tensors(y_pred_logits: torch.Tensor, y_true: torch.Tensor) → numpy.ndarray[source]¶
- Calculate confusion matrix from tensors. - Parameters
- y_pred_logits – model logits 
- y_true – true labels 
 
- Returns
- confusion matrix 
- Return type
- np.ndarray 
 
Dataset¶
- 
catalyst.contrib.utils.dataset.create_dataset(dirs: str, extension: str = None, process_fn: Callable[[str], object] = None, recursive: bool = False) → Dict[str, object][source]¶
- Create dataset (dict like {key: [values]}) from vctk-like dataset:
dataset/
    cat/
        *.ext
    dog/
        *.ext
- Parameters
- dirs – path to dirs, for example /home/user/data/** 
- extension – data extension you are looking for 
- process_fn (Callable[[str], object]) – function(path_to_file) -> object process function for found files, by default 
- recursive – enables recursive globbing 
 
- Returns
- dataset 
- Return type
- dict 
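The directory-to-dict mapping can be sketched with the standard library. The code below is an illustration only: `create_dataset_sketch` is a hypothetical name, using the directory name as the key is an assumption, and the `process_fn` and `recursive` options are left out.

```python
import glob
import os
import tempfile
from typing import Dict, List


def create_dataset_sketch(dirs: str, extension: str = "*") -> Dict[str, List[str]]:
    """Hypothetical helper: map each matched directory name to its sorted files."""
    dataset: Dict[str, List[str]] = {}
    for path in sorted(glob.glob(dirs)):
        if os.path.isdir(path):
            key = os.path.basename(path.rstrip(os.sep))
            dataset[key] = sorted(glob.glob(os.path.join(path, extension)))
    return dataset


# Build a throwaway dataset/cat, dataset/dog layout to run the sketch on.
with tempfile.TemporaryDirectory() as root:
    for label, files in {"cat": ["a.ext", "b.ext"], "dog": ["c.ext"]}.items():
        os.makedirs(os.path.join(root, label))
        for name in files:
            open(os.path.join(root, label, name), "w").close()
    dataset = create_dataset_sketch(os.path.join(root, "*"), extension="*.ext")
    # Strip the temporary prefix so the result is easy to inspect.
    summary = {
        key: [os.path.basename(f) for f in value] for key, value in dataset.items()
    }
```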
 
- 
catalyst.contrib.utils.dataset.create_dataframe(dataset: Dict[str, object], **dataframe_args) → pandas.core.frame.DataFrame[source]¶
- Create pd.DataFrame from dict like {key: [values]}. - Parameters
- dataset – dict like {key: [values]} 
- **dataframe_args –
- index (Index or array-like) – Index to use for the resulting frame. Defaults to np.arange(n) if the input data carries no indexing information and no index is provided
- columns (Index or array-like) – Column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided
- dtype (dtype, default None) – Data type to force; otherwise it is inferred
 
 
- Returns
- dataframe built from the given dataset
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.dataset.split_dataset_train_test(dataset: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[Dict[str, object], Dict[str, object]][source]¶
- Split dataset in train and test parts. - Parameters
- dataset – dict like dataset 
- **train_test_split_args –
- test_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
- train_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- stratify (array-like or None; default is None) – If not None, data is split in a stratified fashion, using this as the class labels.
 
 
- Returns
- train and test dicts 
 
Misc¶
- 
catalyst.contrib.utils.misc.args_are_not_none(*args: Optional[Any]) → bool[source]¶
- Check that all arguments are not None. - Parameters
- *args – values
- Returns
- True if all values are not None, False otherwise
- Return type
- bool 
 
- 
catalyst.contrib.utils.misc.make_tuple(tuple_like)[source]¶
- Creates a tuple if the given tuple_like value isn’t a list or tuple. - Parameters
- tuple_like – tuple like object - list or tuple 
- Returns
- tuple or list 
 
Pandas¶
- 
catalyst.contrib.utils.pandas.dataframe_to_list(dataframe: pandas.core.frame.DataFrame) → List[dict][source]¶
- Converts dataframe to a list of rows (without indexes). - Parameters
- dataframe – input dataframe 
- Returns
- list of rows 
- Return type
- List[dict] 
 
- 
catalyst.contrib.utils.pandas.folds_to_list(folds: Union[list, str, pandas.core.series.Series]) → List[int][source]¶
- Formats a string or a list of numbers into a list of unique ints. - Examples
>>> folds_to_list("1,2,1,3,4,2,4,6")
[1, 2, 3, 4, 6]
>>> folds_to_list([1, 2, 3.0, 5])
[1, 2, 3, 5]
- Parameters
- folds (Union[list, str, pd.Series]) – a list of numbers, a string of comma-separated numbers, or a pandas Series
- Returns
- list of unique ints 
- Return type
- List[int] 
- Raises
- ValueError – if a value in the string or array cannot be cast to int
 
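The behaviour shown in the examples for folds_to_list can be reproduced in a few lines of plain Python. This is a minimal sketch, not the library source: `folds_to_list_sketch` is a hypothetical name, and pandas Series input is not handled separately here.

```python
from typing import Iterable, List, Union


def folds_to_list_sketch(folds: Union[str, Iterable]) -> List[int]:
    """Hypothetical helper: comma-separated string or iterable -> sorted unique ints."""
    if isinstance(folds, str):
        folds = folds.split(",")
    # int() raises ValueError for entries such as "a"; the set removes duplicates.
    return sorted({int(fold) for fold in folds})


from_string = folds_to_list_sketch("1,2,1,3,4,2,4,6")
from_list = folds_to_list_sketch([1, 2, 3.0, 5])
```

The results match the documented examples: `[1, 2, 3, 4, 6]` and `[1, 2, 3, 5]`.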
- 
catalyst.contrib.utils.pandas.split_dataframe(dataframe: pandas.core.frame.DataFrame, train_folds: List[int], valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, tag2class: Optional[Dict[str, int]] = None, tag_column: str = None, class_column: str = None, seed: int = 42, n_folds: int = 5) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶
- Split a Pandas DataFrame into folds. - Parameters
- dataframe – input dataframe 
- train_folds – train folds 
- valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds
- infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds
- tag2class (Dict[str, int], optional) – mapping from label names into int 
- tag_column (str, optional) – column with label names 
- class_column (str, optional) – column to use for split 
- seed – seed for split 
- n_folds – number of folds 
 
- Returns
- tuple with 4 dataframes
- whole dataframe, train part, valid part and infer part 
 
- Return type
- tuple 
 
- 
catalyst.contrib.utils.pandas.split_dataframe_on_column_folds(dataframe: pandas.core.frame.DataFrame, column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶
- Splits DataFrame into N folds. - Parameters
- dataframe – a dataset 
- column – which column to use 
- random_state – seed for random shuffle 
- n_folds – number of result folds 
 
- Returns
- new dataframe with fold column 
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.pandas.split_dataframe_on_folds(dataframe: pandas.core.frame.DataFrame, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶
- Splits DataFrame into N folds. - Parameters
- dataframe – a dataset 
- random_state – seed for random shuffle 
- n_folds – number of result folds 
 
- Returns
- new dataframe with fold column 
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.pandas.split_dataframe_on_stratified_folds(dataframe: pandas.core.frame.DataFrame, class_column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶
- Splits DataFrame into N stratified folds. - Also see catalyst.data.sampler.BalanceClassSampler - Parameters
- dataframe – a dataset 
- class_column – which column to use for split 
- random_state – seed for random shuffle 
- n_folds – number of result folds 
 
- Returns
- new dataframe with fold column 
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.pandas.split_dataframe_train_test(dataframe: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶
- Split dataframe in train and test part. - Parameters
- dataframe – pd.DataFrame to split 
- **train_test_split_args –
- test_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
- train_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- stratify (array-like or None; default is None) – If not None, data is split in a stratified fashion, using this as the class labels.
 
 
- Returns
- train and test DataFrames 
 - Note - It exists because sklearn’s split is overcomplicated.
- Separates values in the class_column column. - Parameters
- dataframe – a dataset 
- tag_column – column name to separate values 
- tag_delim – delimiter to separate values 
 
- Returns
- new dataframe 
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.pandas.read_multiple_dataframes(in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶
- This function reads train/valid/infer dataframes from the given paths. - Parameters
- in_csv_train – paths to train csv separated by commas 
- in_csv_valid – paths to valid csv separated by commas 
- in_csv_infer – paths to infer csv separated by commas 
- tag2class (Dict[str, int], optional) – mapping from label names into int 
- tag_column (str, optional) – column with label names 
- class_column (str, optional) – column to use for split 
 
- Returns
- tuple with 4 dataframes
- whole dataframe, train part, valid part and infer part 
 
- Return type
- tuple 
 
- 
catalyst.contrib.utils.pandas.map_dataframe(dataframe: pandas.core.frame.DataFrame, tag_column: str, class_column: str, tag2class: Dict[str, int], verbose: bool = False) → pandas.core.frame.DataFrame[source]¶
- This function maps tags from tag_column to ints in class_column using the tag2class dictionary. - Parameters
- dataframe – input dataframe 
- tag_column – column with tags 
- class_column (str) – column to write the mapped int classes to
- tag2class (Dict[str, int]) – mapping from tags to class labels 
- verbose – if True, uses tqdm
 
- Returns
- updated dataframe with class_column
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.pandas.get_dataset_labeling(dataframe: pandas.core.frame.DataFrame, tag_column: str) → Dict[str, int][source]¶
- Prepares a mapping using unique values from tag_column.
{
    "class_name_0": 0,
    "class_name_1": 1,
    ...
    "class_name_N": N
}
- Parameters
- dataframe – a dataset 
- tag_column – which column to use 
 
- Returns
- mapping from tag to labels 
- Return type
- Dict[str, int] 
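The tag-to-int mapping can be sketched without pandas. The snippet below is illustrative only: `dataset_labeling_sketch` is a hypothetical name, a plain list stands in for the values of tag_column, and sorting the unique tags before numbering them is an assumption.

```python
from typing import Dict, List


def dataset_labeling_sketch(tags: List[str]) -> Dict[str, int]:
    """Hypothetical helper: enumerate the sorted unique tags (sorting is assumed)."""
    return {tag: index for index, tag in enumerate(sorted(set(tags)))}


tags = ["dog", "cat", "dog", "bird"]  # stands in for dataframe[tag_column]
tag2class = dataset_labeling_sketch(tags)
# Applying the mapping gives what a class column would contain.
classes = [tag2class[tag] for tag in tags]
```

A mapping of this shape is what the tag2class parameter of the functions in this module expects.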
 
- 
catalyst.contrib.utils.pandas.merge_multiple_fold_csv(fold_name: str, paths: Optional[str]) → pandas.core.frame.DataFrame[source]¶
- Reads csv into one DataFrame with column fold. - Parameters
- fold_name – current fold name 
- paths – paths to csv separated by commas 
 
- Returns
- merged dataframes with column fold == fold_name
- Return type
- pd.DataFrame 
 
- 
catalyst.contrib.utils.pandas.read_csv_data(in_csv: str = None, train_folds: Optional[List[int]] = None, valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, seed: int = 42, n_folds: int = 5, in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, List[dict], List[dict], List[dict]][source]¶
- Reads a dataframe from the given path in_csv and splits it into train/valid/infer folds, or reads independent folds from the paths in_csv_train, in_csv_valid, in_csv_infer. - Note - This function can be used with different combinations of params.
- The first block is used to get the dataset from one csv:
- in_csv, train_folds, valid_folds, infer_folds, seed, n_folds
- The second includes paths to different csv files for the train/valid and infer parts:
- in_csv_train, in_csv_valid, in_csv_infer
- The other params (tag2class, tag_column, class_column) are optional for either block
 
 - Parameters
- in_csv – paths to whole dataset 
- train_folds – train folds 
- valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds
- infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds
- seed – seed for split 
- n_folds – number of folds 
- in_csv_train – paths to train csv separated by commas 
- in_csv_valid – paths to valid csv separated by commas 
- in_csv_infer – paths to infer csv separated by commas 
- tag2class (Dict[str, int]) – mapping from label names into ints 
- tag_column – column with label names 
- class_column – column to use for split 
 
- Returns
- tuple with 4 elements (whole dataframe, list with train data, list with valid data and list with infer data) 
- Return type
- Tuple[pd.DataFrame, List[dict], List[dict], List[dict]] 
 
- 
catalyst.contrib.utils.pandas.balance_classes(dataframe: pandas.core.frame.DataFrame, class_column: str = 'label', random_state: int = 42, how: str = 'downsampling') → pandas.core.frame.DataFrame[source]¶
- Balance classes in dataframe by class_column. - See also catalyst.data.sampler.BalanceClassSampler. - Parameters
- dataframe – a dataset 
- class_column – which column to use for split 
- random_state – seed for random shuffle 
- how – strategy to sample, must be one of [“downsampling”, “upsampling”]
 
- Returns
- new dataframe with balanced class_column
- Return type
- pd.DataFrame 
- Raises
- NotImplementedError – if how is not in [“upsampling”, “downsampling”, int] 
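The two sampling strategies can be illustrated on a list of row dicts using only the standard library. This is a sketch under stated assumptions, not the library's algorithm: `balance_rows_sketch` is a hypothetical name, downsampling is assumed to shrink every class to the rarest one, upsampling is assumed to grow every class to the most frequent one, and an integer value for `how` is not covered.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List


def balance_rows_sketch(
    rows: List[dict],
    class_column: str = "label",
    random_state: int = 42,
    how: str = "downsampling",
) -> List[dict]:
    """Hypothetical helper: equalize class counts in a list of row dicts."""
    rng = random.Random(random_state)
    groups: Dict[object, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[class_column]].append(row)
    sizes = [len(group) for group in groups.values()]
    if how == "downsampling":
        target = min(sizes)  # shrink every class to the rarest one
    elif how == "upsampling":
        target = max(sizes)  # grow every class to the most frequent one
    else:
        raise NotImplementedError(f"unknown strategy: {how}")
    balanced: List[dict] = []
    for group in groups.values():
        if len(group) >= target:
            balanced.extend(rng.sample(group, target))  # without replacement
        else:
            extra = rng.choices(group, k=target - len(group))  # with replacement
            balanced.extend(group + extra)
    return balanced


rows = [{"label": "a", "x": i} for i in range(3)] + [{"label": "b", "x": 3}]
down = Counter(row["label"] for row in balance_rows_sketch(rows, how="downsampling"))
up = Counter(row["label"] for row in balance_rows_sketch(rows, how="upsampling"))
```

With three rows of class "a" and one of class "b", downsampling leaves one row per class and upsampling yields three rows per class.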
 
Parallel¶
- 
catalyst.contrib.utils.parallel.parallel_imap(func, args, pool: Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool]) → List[T][source]¶
- @TODO: Docs. Contribution is welcome. 
Plotly¶
- 
catalyst.contrib.utils.plotly.plot_tensorboard_log(logdir: Union[str, pathlib.Path], step: Optional[str] = 'batch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]¶
- @TODO: Docs. Contribution is welcome. - Adapted from https://github.com/belskikh/kekas/blob/v0.1.23/kekas/utils.py#L193 
- 
catalyst.contrib.utils.plotly.plot_metrics(logdir: Union[str, pathlib.Path], step: Optional[str] = 'epoch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]¶
- Plots your learning results. - Parameters
- logdir – the logdir that was specified during training. 
- step – ‘batch’ or ‘epoch’ - what logs to show: for batches or for epochs 
- metrics – list of metrics to plot. The loss should be specified as ‘loss’, learning rate = ‘_base/lr’ and other metrics should be specified as names in metrics dict that was specified during training 
- height – the height of the whole resulting plot 
- width – the width of the whole resulting plot 
 
 
Serialization¶
- 
catalyst.contrib.utils.serialization.serialize(data)¶
- Serialize the data into bytes using pickle. - Parameters
- data – a value 
- Returns
- A bytes object containing the pickle-serialized data.
 
- 
catalyst.contrib.utils.serialization.deserialize(bytes)¶
- Deserialize bytes into an object using pickle. - Parameters
- bytes – a bytes object containing pickle-serialized data.
- Returns
- Returns a value deserialized from the bytes-like object. 
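Since both functions are described as thin wrappers over pickle, the round trip can be sketched directly with the standard library. The protocol choice below is an assumption, not something the documentation states.

```python
import pickle

payload = {"epoch": 3, "metrics": [0.91, 0.87]}

# Serialize to bytes; HIGHEST_PROTOCOL is an assumed, not documented, choice.
blob = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize back into an equal Python object.
restored = pickle.loads(blob)
```

As with any pickle-based format, only bytes from a trusted source should be deserialized.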
 
Visualization¶
- 
catalyst.contrib.utils.visualization.plot_confusion_matrix(cm, class_names=None, normalize=False, title='confusion matrix', fname=None, show=True, figsize=12, fontsize=32, colormap='Blues')[source]¶
- Render the confusion matrix and return matplotlib’s figure with it. Normalization can be applied by setting normalize=True.
Computer Vision¶
Image¶
- 
catalyst.contrib.utils.cv.image.has_image_extension(uri) → bool[source]¶
- Check that file has image extension. - Parameters
- uri (Union[str, pathlib.Path]) – the resource to load the file from 
- Returns
- True if file has image extension, False otherwise 
- Return type
- bool 
 
- 
catalyst.contrib.utils.cv.image.imread(uri, grayscale: bool = False, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]¶
- Reads an image from the specified file. - Parameters
- uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object; see the imageio.imread docs for more info
- grayscale – if True, make all images grayscale
- expand_dims – if True, append channel axis to grayscale images
- rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows using a relative path)
- **kwargs – extra params for image read 
 
- Returns
- image 
- Return type
- np.ndarray 
 
- 
catalyst.contrib.utils.cv.image.imwrite(**kwargs)[source]¶
- imwrite(uri, im, format=None, **kwargs) - Write an image to the specified file. Alias for imageio.imwrite. - Parameters
- **kwargs – parameters for imageio.imwrite
- Returns
- image save result 
 
- 
catalyst.contrib.utils.cv.image.imsave(**kwargs)[source]¶
- imwrite(uri, im, format=None, **kwargs) - Write an image to the specified file. Alias for imageio.imsave. - Parameters
- **kwargs – parameters for imageio.imsave
- Returns
- image save result 
 
- 
catalyst.contrib.utils.cv.image.mask_to_overlay_image(image: numpy.ndarray, masks: List[numpy.ndarray], threshold: float = 0, mask_strength: float = 0.5) → numpy.ndarray[source]¶
- Draws every mask with some color over the image. - Parameters
- image – RGB image used as underlay for masks 
- masks – list of masks 
- threshold – threshold for masks binarization 
- mask_strength – opacity of colorized masks 
 
- Returns
- HxWx3 image with overlay 
- Return type
- np.ndarray 
 
- 
catalyst.contrib.utils.cv.image.mimread(uri, clip_range: Tuple[int, int] = None, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]¶
- Reads multiple images from the specified file. - Parameters
- uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object; see the imageio.mimread docs for more info
- clip_range (Tuple[int, int]) – lower and upper interval edges, image values outside the interval are clipped to the interval edges
- expand_dims – if True, append channel axis to grayscale images
- rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows using a relative path)
- **kwargs – extra params for image read 
 
- Returns
- image 
- Return type
- np.ndarray 
 
Tensor¶
- 
catalyst.contrib.utils.cv.tensor.tensor_from_rgb_image(image: numpy.ndarray) → torch.Tensor[source]¶
- @TODO: Docs. Contribution is welcome. 
- 
catalyst.contrib.utils.cv.tensor.tensor_to_ndimage(images: torch.Tensor, denormalize: bool = True, mean: Tuple[float, float, float] = (0.485, 0.456, 0.406), std: Tuple[float, float, float] = (0.229, 0.224, 0.225), move_channels_dim: bool = True, dtype=<class 'numpy.float32'>) → numpy.ndarray[source]¶
- Convert float image(s) with standard normalization to np.ndarray with [0..1] when dtype is np.float32 and [0..255] when dtype is np.uint8. - Parameters
- images – [B]xCxHxW float tensor 
- denormalize – if True, multiply image(s) by std and add mean 
- mean (Tuple[float, float, float]) – per channel mean to add 
- std (Tuple[float, float, float]) – per channel std to multiply 
- move_channels_dim – if True, convert tensor to [B]xHxWxC format 
- dtype – result ndarray dtype. Only float32 and uint8 are supported 
 
- Returns
- [B]xHxWxC np.ndarray of dtype 
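The denormalization step (multiply by std, add mean, move channels last) can be worked through on a small array. This is a sketch, not the library code: NumPy stands in for torch, `to_ndimage_sketch` is a hypothetical name handling a single CxHxW image, and the clipping and rounding steps are assumptions.

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_ndimage_sketch(image: np.ndarray, dtype=np.float32) -> np.ndarray:
    """Hypothetical helper for one CxHxW float array (NumPy stands in for torch)."""
    image = np.moveaxis(image, 0, -1)  # CxHxW -> HxWxC
    image = image * std + mean  # undo (x - mean) / std per channel
    image = np.clip(image, 0.0, 1.0)  # assumed: keep values inside [0, 1]
    if dtype == np.uint8:
        return (image * 255).round().astype(np.uint8)  # assumed rounding
    return image.astype(np.float32)


# An all-zero normalized image decodes to the per-channel mean color.
normalized = np.zeros((3, 2, 2), dtype=np.float32)
as_float = to_ndimage_sketch(normalized)
as_uint8 = to_ndimage_sketch(normalized, dtype=np.uint8)
```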
 
Natural language processing¶
Text¶
- 
catalyst.contrib.utils.nlp.text.tokenize_text(text: str, tokenizer, max_length: int, strip: bool = True, lowercase: bool = True, remove_punctuation: bool = True) → Dict[str, numpy.array][source]¶
- Tokenizes the given text. - Parameters
- text – text to tokenize 
- tokenizer – Tokenizer instance from HuggingFace 
- max_length – maximum length of tokens 
- strip – if true strips text before tokenizing 
- lowercase – if true makes text lowercase before tokenizing 
- remove_punctuation – if true removes string.punctuation from text before tokenizing
 
- Returns
- batch with tokenized text 
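The three preprocessing flags can be illustrated without a tokenizer. The sketch below covers only the text cleanup that precedes tokenization: `preprocess_text_sketch` is a hypothetical name, and the order in which the flags are applied is an assumption.

```python
import string


def preprocess_text_sketch(
    text: str,
    strip: bool = True,
    lowercase: bool = True,
    remove_punctuation: bool = True,
) -> str:
    """Hypothetical helper covering only the pre-tokenization flags."""
    if strip:
        text = text.strip()
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        # Delete every character listed in string.punctuation.
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


cleaned = preprocess_text_sketch("  Hello, World!  ")
untouched = preprocess_text_sketch(
    "  Hello, World!  ", strip=False, lowercase=False, remove_punctuation=False
)
```

The cleaned string would then be passed to the HuggingFace tokenizer together with max_length.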
 
- 
catalyst.contrib.utils.nlp.text.process_bert_output(bert_output, hidden_size: int, output_hidden_states: bool = False, pooling_groups: List[str] = None, mask: torch.Tensor = None, level: Union[int, str] = None)[source]¶
- Processes BERT output. - Parameters
- bert_output – BERT output 
- hidden_size – hidden size of BERT layers 
- output_hidden_states – boolean flag if we need BERT hidden states 
- pooling_groups – list with pooling to use for sequence embedding 
- mask – boolean flag if we need to mask [PAD] tokens
- level – integer with specified level to use 
 
- Returns
- processed output