Utilities¶
Main¶
All utils are gathered in catalyst.utils
for easier access.
Note
Everything from catalyst.contrib.utils
is included in catalyst.utils
Checkpoint¶
-
catalyst.utils.checkpoint.
pack_checkpoint
(model=None, criterion=None, optimizer=None, scheduler=None, **kwargs)[source]¶ @TODO: Docs. Contribution is welcome.
-
catalyst.utils.checkpoint.
unpack_checkpoint
(checkpoint, model=None, criterion=None, optimizer=None, scheduler=None) → None[source]¶ Load checkpoint from file and unpack the content to a model (if not None), criterion (if not None), optimizer (if not None), scheduler (if not None).
- Parameters
checkpoint – checkpoint to load
model – model whose state should be updated
criterion – criterion whose state should be updated
optimizer – optimizer whose state should be updated
scheduler – scheduler whose state should be updated
-
catalyst.utils.checkpoint.
save_checkpoint
(checkpoint: Dict, logdir: Union[pathlib.Path, str], suffix: str, is_best: bool = False, is_last: bool = False, special_suffix: str = '', saver_fn: Callable = <function save>) → Union[pathlib.Path, str][source]¶ Saving checkpoint to a file.
- Parameters
checkpoint – data to save.
logdir – directory where checkpoint should be stored.
suffix – checkpoint file name.
is_best – if True, a best checkpoint file is also generated.
is_last – if True, a last checkpoint file is also generated.
special_suffix – suffix to use for saving best/last checkpoints.
saver_fn – function to use for saving data to file, default is torch.save
- Returns
path to saved checkpoint
Components¶
-
catalyst.utils.components.
process_components
(model: torch.nn.modules.module.Module, criterion: torch.nn.modules.module.Module = None, optimizer: torch.optim.optimizer.Optimizer = None, scheduler: torch.optim.lr_scheduler._LRScheduler = None, distributed_params: Dict = None, device: Union[str, torch.device] = None) → Tuple[torch.nn.modules.module.Module, torch.nn.modules.module.Module, torch.optim.optimizer.Optimizer, torch.optim.lr_scheduler._LRScheduler, Union[str, torch.device]][source]¶ Returns the processed model, criterion, optimizer, scheduler and device.
- Parameters
model – torch model
criterion – criterion function
optimizer – optimizer
scheduler – scheduler
distributed_params (dict, optional) – dict with the parameters for distributed and FP16 method
device (Device, optional) – device
- Returns
tuple with processed model, criterion, optimizer, scheduler and device.
- Raises
ValueError – if device is None and a TPU is available; to use a TPU, manually move the model/optimizer/scheduler to the TPU device and pass that device to the function.
NotImplementedError – if model is not nn.Module or dict for multi-gpu; nn.ModuleDict for DataParallel is not implemented yet
Config¶
-
catalyst.utils.config.
load_config
(path: Union[str, pathlib.Path], ordered: bool = False, data_format: str = None, encoding: str = 'utf-8') → Union[Dict, List][source]¶ Loads config from the given path. Supports YAML and JSON files.
Examples
>>> load_config(path="./config.yml", ordered=True)
- Parameters
path – path to config file (YAML or JSON)
ordered – if true the config will be loaded as OrderedDict
data_format – yaml, yml or json.
encoding – encoding to read the config
- Returns
config
- Return type
Union[Dict, List]
- Raises
Exception – if path doesn’t exist or the file format is not YAML or JSON
Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L63 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L10
-
catalyst.utils.config.
save_config
(config: Union[Dict, List], path: Union[str, pathlib.Path], data_format: str = None, encoding: str = 'utf-8', ensure_ascii: bool = False, indent: int = 2) → None[source]¶ Saves config to file. Path must be either YAML or JSON.
- Parameters
config (Union[Dict, List]) – config to save
path (Union[str, Path]) – path to save
data_format – yaml, yml or json.
encoding – Encoding to write file. Default is utf-8
ensure_ascii – Used for JSON, if True non-ASCII characters are escaped in JSON strings.
indent – Used for JSON
Adapted from https://github.com/TezRomacH/safitty/blob/v1.2.0/safitty/parser.py#L110 which was adapted from https://github.com/catalyst-team/catalyst/blob/v19.03/catalyst/utils/config.py#L38
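To illustrate the JSON side of save_config and load_config, the round trip can be sketched with the standard library alone. This is a minimal sketch, not Catalyst's implementation: the helper names save_json_config and load_json_config are hypothetical, YAML handling is left out, and the way ordered, encoding, ensure_ascii and indent map onto the json module is an assumption based on the parameter descriptions above.

```python
import json
from collections import OrderedDict
from pathlib import Path


def save_json_config(config, path, encoding="utf-8", ensure_ascii=False, indent=2):
    """Hypothetical helper: write ``config`` to ``path`` as JSON."""
    with Path(path).open("w", encoding=encoding) as stream:
        json.dump(config, stream, ensure_ascii=ensure_ascii, indent=indent)


def load_json_config(path, ordered=False, encoding="utf-8"):
    """Hypothetical helper: read a JSON config, optionally as an OrderedDict."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"config not found: {path}")
    # object_pairs_hook selects the mapping type json builds for JSON objects
    hook = OrderedDict if ordered else None
    with path.open("r", encoding=encoding) as stream:
        return json.load(stream, object_pairs_hook=hook)
```

With ensure_ascii=False, non-ASCII characters are written as-is, so the file should be read back with the same encoding it was written with.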
Distributed¶
-
catalyst.utils.distributed.
check_ddp_wrapped
(model: torch.nn.modules.module.Module) → bool[source]¶ Checks whether model is wrapped with DataParallel/DistributedDataParallel.
-
catalyst.utils.distributed.
check_torch_distributed_initialized
() → bool[source]¶ Checks if torch.distributed is available and initialized.
-
catalyst.utils.distributed.
assert_fp16_available
() → None[source]¶ Asserts for installed and available Apex FP16.
-
catalyst.utils.distributed.
initialize_apex
(model, optimizer=None, **distributed_params)[source]¶ @TODO: Docs. Contribution is welcome.
-
catalyst.utils.distributed.
get_nn_from_ddp_module
(model: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]¶ Return a real model from a torch.nn.DataParallel, torch.nn.parallel.DistributedDataParallel, or apex.parallel.DistributedDataParallel.
- Parameters
model – A model, or DataParallel wrapper.
- Returns
A model
-
catalyst.utils.distributed.
get_rank
() → int[source]¶ Returns the rank of the current worker.
- Returns
rank if torch.distributed is initialized, otherwise -1
- Return type
int
-
catalyst.utils.distributed.
get_distributed_mean
(value: Union[float, torch.Tensor])[source]¶ Computes distributed mean among all nodes.
-
catalyst.utils.distributed.
get_distributed_env
(local_rank: int, rank: int, world_size: int, use_cuda_visible_devices: bool = True)[source]¶ Returns environment copy with extra distributed settings.
- Parameters
local_rank – worker local rank
rank – worker global rank
world_size – worker world size
use_cuda_visible_devices – boolean flag to use available GPU devices
- Returns
updated environment copy
Hash¶
Initialization¶
-
catalyst.utils.initialization.
get_optimal_inner_init
(nonlinearity: torch.nn.modules.module.Module, **kwargs) → Callable[[torch.nn.modules.module.Module], None][source]¶ Create initializer for inner layers based on their activation function (nonlinearity).
- Parameters
nonlinearity – non-linear activation
**kwargs – extra kwargs
- Returns
optimal initialization function
- Raises
NotImplementedError – if nonlinearity is not one of sigmoid, tanh, relu, leaky_relu
Loaders¶
-
catalyst.utils.loaders.
get_native_batch_from_loader
(loader: torch.utils.data.dataloader.DataLoader, batch_index: int = 0)[source]¶ Returns a batch from experiment loader
- Parameters
loader – Loader to get batch from
batch_index – Index of batch to take from dataset of the loader
- Returns
batch from loader
-
catalyst.utils.loaders.
get_native_batch_from_loaders
(loaders: Dict[str, torch.utils.data.dataloader.DataLoader], loader: Union[str, int] = 0, batch_index: int = 0)[source]¶ Returns a batch from experiment loaders by its index or name.
- Parameters
loaders (Dict[str, DataLoader]) – Loaders list to get loader from
loader (Union[str, int]) – Loader name or its index, default is zero
batch_index – Index of batch to take from dataset of the loader
- Returns
batch from loader
- Raises
TypeError – if loader parameter is not a string or an integer
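The selection rule described above (a loader picked by name or by position, then a batch picked by index) can be sketched without PyTorch. This is only an illustration: pick_loader and nth_batch are hypothetical helpers, plain lists stand in for DataLoader objects, and treating an integer as a position in the dict's insertion order is an assumption.

```python
from itertools import islice


def pick_loader(loaders, loader=0):
    """Hypothetical helper: select a loader by name (str) or position (int)."""
    if isinstance(loader, str):
        return loaders[loader]
    if isinstance(loader, int):
        return list(loaders.values())[loader]
    raise TypeError(f"loader must be str or int, got {type(loader).__name__}")


def nth_batch(loader, batch_index=0):
    """Hypothetical helper: iterate the loader and return batch ``batch_index``."""
    return next(islice(iter(loader), batch_index, None))


# plain lists stand in for DataLoader objects here
loaders = {"train": [[1, 2], [3, 4]], "valid": [[5, 6]]}
first_valid = nth_batch(pick_loader(loaders, "valid"))
second_train = nth_batch(pick_loader(loaders, 0), batch_index=1)
```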
-
catalyst.utils.loaders.
get_loader
(data_source: Iterable[dict], open_fn: Callable, dict_transform: Callable = None, sampler=None, collate_fn: Callable = <function default_collate>, batch_size: int = 32, num_workers: int = 4, shuffle: bool = False, drop_last: bool = False)[source]¶ Creates a DataLoader from given source and its open/transform params.
- Parameters
data_source – an iterable containing your data annotations (for example paths to images, labels, bboxes, etc.)
open_fn – function that opens your annotations dict and converts it into the data needed by your network (for example open image by path, or tokenize read string)
dict_transform – transforms to use on dict (for example normalize image, add blur, crop/resize/etc)
sampler (Sampler, optional) – defines the strategy to draw samples from the dataset
collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset
batch_size (int, optional) – how many samples per batch to load
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process
shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
- Returns
DataLoader with catalyst.data.ListDataset
-
catalyst.utils.loaders.
validate_loaders
(loaders: Dict[str, torch.utils.data.dataloader.DataLoader]) → Dict[str, torch.utils.data.dataloader.DataLoader][source]¶ Checks PyTorch dataloaders for distributed setup and transfers them to distributed mode if necessary. (Experimental feature)
- Parameters
loaders (Dict[str, DataLoader]) – dictionary with PyTorch dataloaders
- Returns
dictionary with PyTorch dataloaders (with distributed samplers if necessary)
- Return type
Dict[str, DataLoader]
-
catalyst.utils.loaders.
get_loaders_from_params
(batch_size: int = 1, num_workers: int = 0, drop_last: bool = False, per_gpu_scaling: bool = False, loaders_params: Dict[str, Any] = None, samplers_params: Dict[str, Any] = None, initial_seed: int = 42, get_datasets_fn: Callable = None, **data_params) → OrderedDict[str, DataLoader][source]¶ Creates pytorch dataloaders from datasets and additional parameters.
- Parameters
batch_size – batch_size parameter from torch.utils.data.DataLoader
num_workers – num_workers parameter from torch.utils.data.DataLoader
drop_last – drop_last parameter from torch.utils.data.DataLoader
per_gpu_scaling – boolean flag, if True, uses batch_size=batch_size*num_available_gpus
loaders_params (Dict[str, Any]) – additional loaders parameters
samplers_params (Dict[str, Any]) – additional sampler parameters
initial_seed – initial seed for torch.utils.data.DataLoader workers
get_datasets_fn (Callable) – callable function to get dictionary with torch.utils.data.Datasets
**data_params – additional data parameters or dictionary with torch.utils.data.Datasets to use for pytorch dataloaders creation
- Returns
dictionary with torch.utils.data.DataLoader
- Return type
OrderedDict[str, DataLoader]
- Raises
NotImplementedError – if datasource is neither a Dataset nor a dict
ValueError – if the batch_sampler option is used in distributed mode (the two are mutually exclusive)
Misc¶
-
catalyst.utils.misc.
copy_directory
(input_dir: pathlib.Path, output_dir: pathlib.Path) → None[source]¶ Recursively copies the input directory.
- Parameters
input_dir – input directory
output_dir – output directory
-
catalyst.utils.misc.
format_metric
(name: str, value: float) → str[source]¶ Format metric.
Metric will be returned in the scientific format if 4 decimal chars are not enough (metric value lower than 1e-4).
- Parameters
name – metric name
value – value of metric
- Returns
formatted metric
- Return type
str
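The switch between fixed-point and scientific notation described above can be sketched as follows. This is a minimal sketch under assumptions: format_metric_sketch is a hypothetical stand-in, and both the "name=value" layout and the exact format specifiers are guesses rather than the library's documented output.

```python
def format_metric_sketch(name, value):
    """Hypothetical stand-in for the behaviour described above."""
    if value < 1e-4:
        # too small to show with four decimal places: use scientific notation
        return f"{name}={value:1.3e}"
    return f"{name}={value:.4f}"


print(format_metric_sketch("loss", 0.5))
print(format_metric_sketch("lr", 1e-5))
```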
-
catalyst.utils.misc.
get_fn_default_params
(fn: Callable[[...], Any], exclude: List[str] = None)[source]¶ Return default parameters of Callable.
- Parameters
fn (Callable[.., Any]) – target Callable
exclude – exclude list of parameters
- Returns
contains default parameters of fn
- Return type
dict
-
catalyst.utils.misc.
get_fn_argsnames
(fn: Callable[[...], Any], exclude: List[str] = None)[source]¶ Return parameter names of Callable.
- Parameters
fn (Callable[.., Any]) – target Callable
exclude – exclude list of parameters
- Returns
contains parameter names of fn
- Return type
list
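Both get_fn_default_params and get_fn_argsnames are introspection helpers, and their described behaviour can be approximated with inspect.signature. The sketch below is an illustration only: default_params_sketch, arg_names_sketch and the toy train function are hypothetical, and the library may rely on a different inspect API internally.

```python
import inspect


def default_params_sketch(fn, exclude=None):
    """Hypothetical helper: map parameter names to their default values."""
    exclude = set(exclude or [])
    parameters = inspect.signature(fn).parameters
    return {
        name: parameter.default
        for name, parameter in parameters.items()
        if parameter.default is not inspect.Parameter.empty and name not in exclude
    }


def arg_names_sketch(fn, exclude=None):
    """Hypothetical helper: list parameter names in declaration order."""
    exclude = set(exclude or [])
    return [name for name in inspect.signature(fn).parameters if name not in exclude]


def train(model, lr=0.001, epochs=10):
    """Toy function used only for the demonstration."""


defaults = default_params_sketch(train, exclude=["epochs"])
names = arg_names_sketch(train, exclude=["model"])
```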
-
catalyst.utils.misc.
get_utcnow_time
(format: str = None) → str[source]¶ Return string with current utc time in chosen format.
- Parameters
format – format string. if None “%y%m%d.%H%M%S” will be used.
- Returns
formatted utc time string
- Return type
str
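The default format string above yields a compact, sortable timestamp. A minimal sketch of the same idea with the standard library (utcnow_time_sketch is a hypothetical name, and the parameter is called fmt here only to avoid shadowing the built-in format):

```python
from datetime import datetime, timezone


def utcnow_time_sketch(fmt=None):
    """Hypothetical helper: current UTC time as a formatted string."""
    if fmt is None:
        # default taken from the parameter description above
        fmt = "%y%m%d.%H%M%S"
    return datetime.now(timezone.utc).strftime(fmt)


stamp = utcnow_time_sketch()
year = utcnow_time_sketch("%Y")
```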
-
catalyst.utils.misc.
is_exception
(ex: Any) → bool[source]¶ Check if the argument is of Exception type.
-
catalyst.utils.misc.
maybe_recursive_call
(object_or_dict, method: Union[str, Callable], recursive_args=None, recursive_kwargs=None, **kwargs)[source]¶ Calls the method recursively for the object_or_dict.
- Parameters
object_or_dict – some object or a dictionary of objects
method – method name to call
recursive_args – list of arguments to pass to the method
recursive_kwargs – keyword arguments to pass to the method
**kwargs – Arbitrary keyword arguments
- Returns
result of method call
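The recursion described above can be illustrated with a simplified sketch. It is not the library's implementation: recursive_call_sketch is a hypothetical helper that ignores recursive_args and recursive_kwargs, and it assumes that a dictionary is traversed value by value while any other object has the method applied to it directly.

```python
def recursive_call_sketch(object_or_dict, method, **kwargs):
    """Hypothetical helper: apply ``method`` to an object or to every dict value."""
    if isinstance(object_or_dict, dict):
        return {
            key: recursive_call_sketch(value, method, **kwargs)
            for key, value in object_or_dict.items()
        }
    if isinstance(method, str):
        # a string is looked up as a method name on the object
        return getattr(object_or_dict, method)(**kwargs)
    # a callable receives the object as its first argument
    return method(object_or_dict, **kwargs)


upper = recursive_call_sketch({"a": "x", "b": {"c": "y"}}, "upper")
ordered = recursive_call_sketch([3, 1, 2], sorted, reverse=True)
```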
-
catalyst.utils.misc.
fn_ends_with_pass
(fn: Callable[[...], Any])[source]¶ Checks that the function ends with a pass statement (and therefore probably does nothing). Mainly used to filter callbacks with empty on_{event} methods.
- Parameters
fn (Callable[.., Any]) – target Callable
- Returns
True if there is pass in the first indentation level of fn and nothing happens before it, False in any other case.
- Return type
bool
Numpy¶
-
catalyst.utils.numpy.
get_one_hot
(label: int, num_classes: int, smoothing: float = None) → numpy.ndarray[source]¶ Applies one-hot vectorization to a given scalar, optionally with label smoothing as described in Bag of Tricks for Image Classification with Convolutional Neural Networks.
- Parameters
label – scalar value to be vectorized
num_classes – total number of classes
smoothing (float, optional) – if specified, applies label smoothing from the Bag of Tricks for Image Classification with Convolutional Neural Networks paper
- Returns
a one-hot vector with shape (num_classes,)
- Return type
np.ndarray
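Label smoothing as used in the cited paper gives the true class a weight of 1 - smoothing and spreads the remaining mass evenly over the other classes. The sketch below illustrates that rule in plain Python; one_hot_sketch is a hypothetical helper that returns a list rather than an np.ndarray, and it assumes num_classes is at least 2 whenever smoothing is used.

```python
def one_hot_sketch(label, num_classes, smoothing=None):
    """Hypothetical helper: one-hot list with optional label smoothing."""
    if not 0 <= label < num_classes:
        raise ValueError("label must be in [0, num_classes)")
    if smoothing is None:
        off_value, on_value = 0.0, 1.0
    else:
        # the smoothing mass is shared equally by the num_classes - 1 other classes
        off_value = smoothing / (num_classes - 1)
        on_value = 1.0 - smoothing
    vector = [off_value] * num_classes
    vector[label] = on_value
    return vector


hard = one_hot_sketch(1, 3)
soft = one_hot_sketch(0, 5, smoothing=0.2)
```

Either way the entries sum to one, so the result can still be read as a probability distribution over classes.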
Parser¶
Pipelines¶
-
catalyst.utils.pipelines.
clone_pipeline
(template: str, out_dir: pathlib.Path) → None[source]¶ Clones pipeline from empty pipeline template or from demo pipelines available in Git repos of Catalyst Team.
- Parameters
template – type of pipeline you want to clone. empty/classification/segmentation
out_dir – path where pipeline directory should be cloned
Pruning¶
-
catalyst.utils.pruning.
prune_model
(model: torch.nn.modules.module.Module, pruning_fn: Callable, keys_to_prune: List[str], amount: Union[float, int], layers_to_prune: Optional[List[str]] = None, reinitialize_after_pruning: Optional[bool] = False) → None[source]¶ Prune model function can be used for pruning certain tensors in model layers.
- Raises
AttributeError – If layers_to_prune is not None, but there are no layers with the specified names.
Exception – If no layers have specified keys.
- Parameters
model – Model to be pruned.
pruning_fn – Pruning function with API same as in torch.nn.utils.pruning. pruning_fn(module, name, amount).
keys_to_prune – list of strings. Determines which tensor in modules will be pruned.
amount – quantity of parameters to prune. If float, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If int, it represents the absolute number of parameters to prune.
layers_to_prune – list of strings - module names to be pruned. If None provided then will try to prune every module in model.
reinitialize_after_pruning – if True then will reinitialize model after pruning. (Lottery Ticket Hypothesis check e.g.)
-
catalyst.utils.pruning.
remove_reparametrization
(model: torch.nn.modules.module.Module, keys_to_prune: List[str], layers_to_prune: Optional[List[str]] = None) → None[source]¶ Removes pre-hooks and pruning masks from the model.
- Parameters
model – model to remove reparametrization.
keys_to_prune – list of strings. Determines which tensor in modules have already been pruned.
layers_to_prune – list of strings - names of modules that have already been pruned. If None, every module in the model is considered.
Quantization¶
-
catalyst.utils.quantization.
quantize_model_from_checkpoint
(logdir: pathlib.Path, checkpoint_name: str, stage: str = None, qconfig_spec: Union[Set, Dict, None] = None, dtype: Optional[torch.dtype] = torch.qint8, backend: str = None) → torch.nn.modules.module.Module[source]¶ Quantize model using created experiment and runner.
- Parameters
logdir (Union[str, Path]) – Path to Catalyst logdir with model
checkpoint_name – Name of model checkpoint to use
stage – experiment’s stage name
qconfig_spec – torch.quantization.quantize_dynamic parameter that defines which layers are quantized
dtype – type of the model parameters, default int8
backend – defines backend for quantization
- Returns
Quantized model
-
catalyst.utils.quantization.
save_quantized_model
(model: torch.nn.modules.module.Module, logdir: Union[str, pathlib.Path] = None, checkpoint_name: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None) → None[source]¶ Saves quantized model.
- Parameters
model – Traced model
logdir (Union[str, Path]) – Path to experiment
checkpoint_name – name for the checkpoint
out_dir (Union[str, Path]) – Directory to save model to (overrides logdir)
out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir)
- Raises
ValueError – if none of logdir, out_dir or out_model is specified.
Scripts¶
-
catalyst.utils.scripts.
import_module
(expdir: Union[str, pathlib.Path])[source]¶ Imports python module by path.
- Parameters
expdir – path to python module.
- Returns
Imported module.
-
catalyst.utils.scripts.
dump_code
(expdir: Union[str, pathlib.Path], logdir: Union[str, pathlib.Path]) → None[source]¶ Dumps Catalyst code for reproducibility.
- Parameters
expdir (Union[str, pathlib.Path]) – experiment dir path
logdir (Union[str, pathlib.Path]) – logging dir path
-
catalyst.utils.scripts.
dump_python_files
(src: pathlib.Path, dst: pathlib.Path) → None[source]¶ Dumps python code (*.py and *.ipynb) files.
- Parameters
src – source code path
dst – destination code path
-
catalyst.utils.scripts.
prepare_config_api_components
(expdir: pathlib.Path, config: Dict)[source]¶ Imports and creates the core Config API components (Experiment, Runner and Config) from expdir, the experiment directory, and config, the experiment config.
- Parameters
expdir – experiment directory path
config – dictionary with experiment Config
- Returns
Experiment, Runner, Config for Config API usage.
-
catalyst.utils.scripts.
dump_experiment_code
(src: pathlib.Path, dst: pathlib.Path) → None[source]¶ Dumps your experiment code for Config API use cases.
- Parameters
src – source code path
dst – destination code path
-
catalyst.utils.scripts.
distributed_cmd_run
(worker_fn: Callable, distributed: bool = True, *args, **kwargs) → None[source]¶ Distributed run
- Parameters
worker_fn – worker fn to run in distributed mode
distributed – distributed flag
args – additional parameters for worker_fn
kwargs – additional key-value parameters for worker_fn
Sys¶
-
catalyst.utils.sys.
get_environment_vars
() → Dict[str, Any][source]¶ Creates a dictionary with environment variables.
- Returns
environment variables
- Return type
Dict
-
catalyst.utils.sys.
list_conda_packages
() → str[source]¶ Lists conda installed packages.
- Returns
string with conda installed packages
- Return type
str
-
catalyst.utils.sys.
list_pip_packages
() → str[source]¶ Lists pip installed packages.
- Returns
string with pip installed packages
- Return type
str
-
catalyst.utils.sys.
dump_environment
(experiment_config: Dict, logdir: str, configs_path: List[str] = None) → None[source]¶ Saves config, environment variables and package list in JSON into logdir.
- Parameters
experiment_config – experiment config
logdir – path to logdir
configs_path – path(s) to config
Torch¶
-
catalyst.utils.torch.
get_optimizable_params
(model_or_params)[source]¶ Returns all the parameters that require gradients.
-
catalyst.utils.torch.
get_optimizer_momentum
(optimizer: torch.optim.optimizer.Optimizer) → float[source]¶ Get momentum of current optimizer.
- Parameters
optimizer – PyTorch optimizer
- Returns
momentum at first param group
- Return type
float
-
catalyst.utils.torch.
set_optimizer_momentum
(optimizer: torch.optim.optimizer.Optimizer, value: float, index: int = 0)[source]¶ Set momentum of the index-th param group of optimizer to value.
- Parameters
optimizer – PyTorch optimizer
value – new value of momentum
index (int, optional) – integer index of optimizer’s param groups, default is 0
-
catalyst.utils.torch.
get_device
() → torch.device[source]¶ Returns the best available device (TPU > GPU > CPU).
-
catalyst.utils.torch.
get_available_gpus
()[source]¶ Array of available GPU ids.
Examples
>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"
>>> get_available_gpus()
[0, 2]
>>> os.environ["CUDA_VISIBLE_DEVICES"] = "0,-1,1"
>>> get_available_gpus()
[0]
>>> os.environ["CUDA_VISIBLE_DEVICES"] = ""
>>> get_available_gpus()
[]
>>> os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
>>> get_available_gpus()
[]
- Returns
available GPU ids
- Return type
iterable
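The examples above suggest that ids are read left to right and that everything from the first negative or empty entry onward is ignored. A minimal sketch of that parsing rule, assuming integer ids only (parse_visible_devices is a hypothetical helper; what the library does when CUDA_VISIBLE_DEVICES is unset is not covered here):

```python
def parse_visible_devices(value):
    """Hypothetical helper: parse a CUDA_VISIBLE_DEVICES-style string."""
    device_ids = []
    for item in value.split(","):
        item = item.strip()
        # stop at the first empty or negative entry; later ids are ignored
        if not item or int(item) < 0:
            break
        device_ids.append(int(item))
    return device_ids


visible = parse_visible_devices("0,-1,1")
```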
-
catalyst.utils.torch.
get_activation_fn
(activation: str = None)[source]¶ Returns the activation function from torch.nn by its name.
-
catalyst.utils.torch.
any2device
(value, device: Union[str, torch.device])[source]¶ Move tensor, list of tensors, list of list of tensors, dict of tensors, tuple of tensors to target device.
- Parameters
value – Object to be moved
device – target device ids
- Returns
Same structure as value, but all tensors and np.arrays moved to device
-
catalyst.utils.torch.
prepare_cudnn
(deterministic: bool = None, benchmark: bool = None) → None[source]¶ Prepares CuDNN benchmark and sets CuDNN to deterministic/non-deterministic mode
- Parameters
deterministic – deterministic mode if running in CuDNN backend.
benchmark – If True, use CuDNN heuristics to figure out which algorithm will be most performant for your model architecture and input. Setting it to False may slow down your training.
-
catalyst.utils.torch.
process_model_params
(model: torch.nn.modules.module.Module, layerwise_params: Dict[str, dict] = None, no_bias_weight_decay: bool = True, lr_scaling: float = 1.0) → List[Union[torch.nn.parameter.Parameter, dict]][source]¶ Gathers model parameters for torch.optim.Optimizer.
- Parameters
model – Model to process
layerwise_params – Order-sensitive dict where each key is regex pattern and values are layer-wise options for layers matching with a pattern
no_bias_weight_decay – If true, removes weight_decay for all bias parameters in the model
lr_scaling – layer-wise learning rate scaling, if 1.0, learning rates will not be scaled
- Returns
parameters for an optimizer
- Return type
iterable
Example:
>>> model = catalyst.contrib.models.segmentation.ResnetUnet()
>>> layerwise_params = collections.OrderedDict([
>>>     ("conv1.*", dict(lr=0.001, weight_decay=0.0003)),
>>>     ("conv.*", dict(lr=0.002))
>>> ])
>>> params = process_model_params(model, layerwise_params)
>>> optimizer = torch.optim.Adam(params, lr=0.0003)
-
catalyst.utils.torch.
get_requires_grad
(model: torch.nn.modules.module.Module)[source]¶ Gets the requires_grad value for all model parameters.
Example:
>>> model = SimpleModel()
>>> requires_grad = get_requires_grad(model)
- Parameters
model – model
- Returns
requires_grad value for each model parameter
- Return type
Dict[str, bool]
-
catalyst.utils.torch.
set_requires_grad
(model: torch.nn.modules.module.Module, requires_grad: Union[bool, Dict[str, bool]])[source]¶ Sets the requires_grad value for all model parameters.
Example:
>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad=True)
>>> # or
>>> model = SimpleModel()
>>> set_requires_grad(model, requires_grad={""})
- Parameters
model – model
requires_grad (Union[bool, Dict[str, bool]]) – value
-
catalyst.utils.torch.
get_network_output
(net: torch.nn.modules.module.Module, *input_shapes_args, **input_shapes_kwargs)[source]¶ For each input shape returns an output tensor
Examples
>>> net = nn.Linear(10, 5)
>>> utils.get_network_output(net, (1, 10))
tensor([[[-0.2665, 0.5792, 0.9757, -0.5782, 0.1530]]])
- Parameters
net – the model
*input_shapes_args – variable length argument list of shapes
**input_shapes_kwargs – key-value arguments of shapes
- Returns
tensor with network output
-
catalyst.utils.torch.
detach
(tensor: torch.Tensor) → numpy.ndarray[source]¶ Detach a pytorch tensor from graph and convert it to numpy array
- Parameters
tensor – PyTorch tensor
- Returns
numpy ndarray
-
catalyst.utils.torch.
trim_tensors
(tensors)[source]¶ Trim padding off of a batch of tensors to the smallest possible length. Should be used with catalyst.data.DynamicLenBatchSampler.
Adapted from Dynamic minibatch trimming to improve BERT training speed.
- Parameters
tensors – list of tensors to trim.
- Returns
list of trimmed tensors.
- Return type
List[torch.tensor]
Tracing¶
-
catalyst.utils.tracing.
trace_model
(model: torch.nn.modules.module.Module, predict_fn: Callable, batch=None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu', predict_params: dict = None) → torch.jit.ScriptModule[source]¶ Traces model using runner and batch.
- Parameters
model – Model to trace
predict_fn – Function to run prediction with the model provided, takes model, inputs parameters
batch – Batch to trace the model
method_name – Model’s method name that will be used as entrypoint during tracing
mode – Mode for model to trace (train or eval)
requires_grad – Flag to use grads
opt_level – Apex FP16 init level, optional
device – Torch device
predict_params – additional parameters for model forward
- Returns
Traced model
- Return type
jit.ScriptModule
- Raises
ValueError – if batch or predict_fn is not specified (both are required), or mode is not ‘eval’ or ‘train’.
-
catalyst.utils.tracing.
trace_model_from_checkpoint
(logdir: pathlib.Path, method_name: str, checkpoint_name: str, stage: str = None, loader: Union[str, int] = None, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu')[source]¶ Traces model using created experiment and runner.
- Parameters
logdir (Union[str, Path]) – Path to Catalyst logdir with model
checkpoint_name – Name of model checkpoint to use
stage – experiment’s stage name
loader (Union[str, int]) – experiment’s loader name or its index
method_name – Model’s method name that will be used as entrypoint during tracing
mode – Mode for model to trace (train or eval)
requires_grad – Flag to use grads
opt_level – AMP FP16 init level
device – Torch device
- Returns
the traced model
-
catalyst.utils.tracing.
trace_model_from_runner
(runner: IRunner, checkpoint_name: str = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, device: Union[str, torch.device] = 'cpu') → torch.jit.ScriptModule[source]¶ Traces model using created experiment and runner.
- Parameters
runner – current runner.
checkpoint_name – Name of model checkpoint to use, if None traces current model from runner
method_name – Model’s method name that will be used as entrypoint during tracing
mode – Mode for model to trace (train or eval)
requires_grad – Flag to use grads
opt_level – AMP FP16 init level
device – Torch device
- Returns
Traced model
- Return type
ScriptModule
-
catalyst.utils.tracing.
get_trace_name
(method_name: str, mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, additional_string: str = None) → str[source]¶ Creates a file name for the traced model.
- Parameters
method_name – model’s method name
mode – train or eval
requires_grad – flag if model was traced with gradients
opt_level – opt_level if model was traced in FP16
additional_string – any additional information
- Returns
Filename for traced model to be saved.
- Return type
str
-
catalyst.utils.tracing.
save_traced_model
(model: torch.jit.ScriptModule, logdir: Union[str, pathlib.Path] = None, method_name: str = 'forward', mode: str = 'eval', requires_grad: bool = False, opt_level: str = None, out_dir: Union[str, pathlib.Path] = None, out_model: Union[str, pathlib.Path] = None, checkpoint_name: str = None) → None[source]¶ Saves traced model.
- Parameters
model – Traced model
logdir (Union[str, Path]) – Path to experiment
method_name – Name of the method that was traced
mode – Model’s mode - train or eval
requires_grad – Whether model was traced with requires_grad or not
opt_level – Apex FP16 init level used during tracing
out_dir (Union[str, Path]) – Directory to save model to (overrides logdir)
out_model (Union[str, Path]) – Path to save model to (overrides logdir & out_dir)
checkpoint_name – Checkpoint name used to restore the model
- Raises
ValueError – if none of logdir, out_dir or out_model is specified.
-
catalyst.utils.tracing.
load_traced_model
(model_path: Union[str, pathlib.Path], device: Union[str, torch.device] = 'cpu', opt_level: str = None) → torch.jit.ScriptModule[source]¶ Loads a traced model.
- Parameters
model_path – Path to traced model
device – Torch device
opt_level – Apex FP16 init level, optional
- Returns
Traced model
- Return type
ScriptModule
Wizard¶
-
class
catalyst.utils.wizard.
Wizard
[source]¶ Bases: object
Class for Catalyst Config API Wizard.
The instance of this class will be created and called from cli command: catalyst-dl init --interactive.
With the help of this Wizard, the user can set up a pipeline from available templates and choose which predefined classes to use in different parts of the pipeline.
Contrib¶
Argparse¶
-
catalyst.contrib.utils.argparse.
boolean_flag
(parser: argparse.ArgumentParser, name: str, default: Optional[bool] = False, help: str = None, shorthand: str = None) → None[source]¶ Add a boolean flag to a parser inplace.
Examples
>>> parser = argparse.ArgumentParser()
>>> boolean_flag(
>>>     parser, "flag", default=False, help="some flag", shorthand="f"
>>> )
- Parameters
parser – parser to add the flag to
name – argument name; --<name> will enable the flag, while --no-<name> will disable it
default (bool, optional) – default value of the flag
help – help string for the flag
shorthand – shorthand string for the argument
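The paired enable/disable switches described above can be built from two argparse actions that share one destination. The sketch below is an illustration under assumptions, not the library's code: boolean_flag_sketch is a hypothetical helper, and registering the shorthand as an alias of the enabling switch is a guess.

```python
import argparse


def boolean_flag_sketch(parser, name, default=False, help=None, shorthand=None):
    """Hypothetical helper: add --<name> / --no-<name> switches to ``parser``."""
    dest = name.replace("-", "_")
    names = ["--" + name]
    if shorthand is not None:
        names.append("-" + shorthand)
    # both actions write to the same destination, so the last one given wins
    parser.add_argument(*names, action="store_true", default=default, dest=dest, help=help)
    parser.add_argument("--no-" + name, action="store_false", default=default, dest=dest)


parser = argparse.ArgumentParser()
boolean_flag_sketch(parser, "flag", default=False, help="some flag", shorthand="f")
enabled = parser.parse_args(["--flag"]).flag
disabled = parser.parse_args(["--no-flag"]).flag
```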
Compression¶
-
catalyst.contrib.utils.compression.
pack
(data)¶ Serialize the data into bytes using pickle.
- Parameters
data – a value
- Returns
Returns a bytes object serialized with pickle data.
-
catalyst.contrib.utils.compression.
pack_if_needed
(data)¶ Serialize the data into bytes using pickle.
- Parameters
data – a value
- Returns
Returns a bytes object serialized with pickle data.
-
catalyst.contrib.utils.compression.
unpack
(bytes)¶ Deserialize bytes into an object using pickle.
- Parameters
bytes – a bytes object containing pickle-serialized data.
- Returns
Returns a value deserialized from the bytes-like object.
-
catalyst.contrib.utils.compression.
unpack_if_needed
(bytes)¶ Deserialize bytes into an object using pickle.
- Parameters
bytes – a bytes object containing pickle-serialized data.
- Returns
Returns a value deserialized from the bytes-like object.
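The pack/unpack pair is a pickle round trip. A minimal sketch of that round trip follows; pack_sketch and unpack_sketch are hypothetical stand-ins, the library versions may do more than plain pickling, and the behaviour of the *_if_needed variants is not modelled here.

```python
import pickle


def pack_sketch(data):
    """Hypothetical helper: serialize ``data`` into bytes with pickle."""
    return pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)


def unpack_sketch(payload):
    """Hypothetical helper: restore an object from pickle bytes."""
    return pickle.loads(payload)


original = {"reward": 1.5, "actions": [0, 2, 1]}
payload = pack_sketch(original)
restored = unpack_sketch(payload)
```

As with any use of pickle, only bytes from a trusted source should be deserialized.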
Confusion Matrix¶
-
catalyst.contrib.utils.confusion_matrix.
calculate_tp_fp_fn
(confusion_matrix: numpy.ndarray) → numpy.ndarray[source]¶ @TODO: Docs. Contribution is welcome.
-
catalyst.contrib.utils.confusion_matrix.
calculate_confusion_matrix_from_arrays
(predictions: numpy.ndarray, labels: numpy.ndarray, num_classes: int) → numpy.ndarray[source]¶ Calculate confusion matrix for a given set of classes. Label values outside [0, num_classes) are excluded.
- Parameters
predictions – model predictions
labels – ground truth labels
num_classes – number of classes
- Returns
confusion matrix
- Return type
np.ndarray
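The exclusion rule for out-of-range labels can be illustrated without any dependencies. This is a pure-Python sketch under stated assumptions, not the library's code: it returns nested lists instead of an np.ndarray, the helper name is hypothetical, rows are assumed to index the true label and columns the prediction, and out-of-range predictions are skipped as well.

```python
def confusion_matrix_from_lists(predictions, labels, num_classes):
    """Count (label, prediction) pairs; values outside [0, num_classes) are skipped."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for pred, label in zip(predictions, labels):
        if 0 <= label < num_classes and 0 <= pred < num_classes:
            # assumed orientation: row = true label, column = prediction
            matrix[label][pred] += 1
    return matrix


predictions = [0, 1, 1, 2, 0]
labels = [0, 1, 2, 2, 5]  # the last label (5) is out of range and is ignored
cm = confusion_matrix_from_lists(predictions, labels, num_classes=3)
```

Only four of the five pairs are counted here, because the label 5 falls outside [0, 3).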
-
catalyst.contrib.utils.confusion_matrix.
calculate_confusion_matrix_from_tensors
(y_pred_logits: torch.Tensor, y_true: torch.Tensor) → numpy.ndarray[source]¶ Calculate confusion matrix from tensors.
- Parameters
y_pred_logits – model logits
y_true – true labels
- Returns
confusion matrix
- Return type
np.ndarray
Dataset¶
-
catalyst.contrib.utils.dataset.
create_dataset
(dirs: str, extension: str = None, process_fn: Callable[[str], object] = None, recursive: bool = False) → Dict[str, object][source]¶ Create dataset (dict like {key: [values]}) from vctk-like dataset:
dataset/
    cat/
        *.ext
    dog/
        *.ext
- Parameters
dirs – path to dirs, for example /home/user/data/**
extension – data extension you are looking for
process_fn (Callable[[str], object]) – function(path_to_file) -> object process function for found files, by default
recursive – enables recursive globbing
- Returns
dataset
- Return type
dict
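The {key: [values]} layout can be sketched with the standard library alone. The helper below is hypothetical and only approximates the behaviour described above: it assumes each sub-directory name becomes a key, treats extension as a glob pattern, and leaves out process_fn.

```python
import glob
import os
import tempfile


def collect_files_by_folder(dirs, extension="*", recursive=False):
    """Hypothetical helper: map each sub-directory name to its sorted matching files."""
    dataset = {}
    for folder in sorted(glob.glob(dirs, recursive=recursive)):
        if not os.path.isdir(folder):
            continue
        key = os.path.basename(os.path.normpath(folder))
        pattern = os.path.join(folder, extension)
        dataset[key] = sorted(glob.glob(pattern, recursive=recursive))
    return dataset


# Build a throwaway dataset/cat, dataset/dog layout to run the helper on.
with tempfile.TemporaryDirectory() as root:
    for label, names in {"cat": ["a.ext", "b.ext"], "dog": ["c.ext"]}.items():
        os.makedirs(os.path.join(root, label))
        for name in names:
            open(os.path.join(root, label, name), "w").close()
    dataset = collect_files_by_folder(os.path.join(root, "*"), extension="*.ext")
    counts = {key: len(files) for key, files in dataset.items()}
```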
-
catalyst.contrib.utils.dataset.
create_dataframe
(dataset: Dict[str, object], **dataframe_args) → pandas.core.frame.DataFrame[source]¶ Create pd.DataFrame from dict like {key: [values]}.
- Parameters
dataset – dict like {key: [values]}
**dataframe_args –
- index (Index or array-like) – index to use for the resulting frame. Defaults to np.arange(n) if the input data carries no indexing information and no index is provided
- columns (Index or array-like) – column labels to use for the resulting frame. Defaults to np.arange(n) if no column labels are provided
- dtype (dtype, default None) – data type to force; otherwise it is inferred
- Returns
dataframe from the given dataset
- Return type
pd.DataFrame
-
catalyst.contrib.utils.dataset.
split_dataset_train_test
(dataset: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[Dict[str, object], Dict[str, object]][source]¶ Split dataset in train and test parts.
- Parameters
dataset – dict like dataset
**train_test_split_args –
- test_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
- train_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- stratify (array-like or None; default is None) – If not None, data is split in a stratified fashion, using this as the class labels.
- Returns
train and test dicts
Misc¶
-
catalyst.contrib.utils.misc.
args_are_not_none
(*args: Optional[Any]) → bool[source]¶ Check that all arguments are not None.
- Parameters
*args – values
- Returns
True if all values are not None, False otherwise
- Return type
bool
-
catalyst.contrib.utils.misc.
make_tuple
(tuple_like)[source]¶ Creates a tuple if the given tuple_like value isn’t a list or tuple.
- Parameters
tuple_like – tuple like object - list or tuple
- Returns
tuple or list
Pandas¶
-
catalyst.contrib.utils.pandas.
dataframe_to_list
(dataframe: pandas.core.frame.DataFrame) → List[dict][source]¶ Converts dataframe to a list of rows (without indexes).
- Parameters
dataframe – input dataframe
- Returns
list of rows
- Return type
List[dict]
-
catalyst.contrib.utils.pandas.
folds_to_list
(folds: Union[list, str, pandas.core.series.Series]) → List[int][source]¶ Formats a string or a list of numbers into a list of unique ints.
Examples
>>> folds_to_list("1,2,1,3,4,2,4,6")
[1, 2, 3, 4, 6]
>>> folds_to_list([1, 2, 3.0, 5])
[1, 2, 3, 5]
- Parameters
folds (Union[list, str, pd.Series]) – either a list of numbers, a single string with numbers separated by commas, or a pandas Series
- Returns
list of unique ints
- Return type
List[int]
- Raises
ValueError – if a value in the string or array cannot be cast to int
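The documented behaviour (split on commas, cast to int, drop duplicates) is small enough to sketch in a few lines. The function below is a hypothetical re-creation for illustration only; returning the values in sorted order is an assumption that happens to match the documented examples.

```python
def folds_to_unique_ints(folds):
    """Hypothetical re-creation: comma-separated string or iterable -> unique ints."""
    if isinstance(folds, str):
        folds = folds.split(",")
    # int() raises ValueError for values such as "a" that cannot be converted
    return sorted({int(value) for value in folds})


from_string = folds_to_unique_ints("1,2,1,3,4,2,4,6")
from_list = folds_to_unique_ints([1, 2, 3.0, 5])
```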
-
catalyst.contrib.utils.pandas.
split_dataframe
(dataframe: pandas.core.frame.DataFrame, train_folds: List[int], valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, tag2class: Optional[Dict[str, int]] = None, tag_column: str = None, class_column: str = None, seed: int = 42, n_folds: int = 5) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶ Split a Pandas DataFrame into folds.
- Parameters
dataframe – input dataframe
train_folds – train folds
valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds
infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds
tag2class (Dict[str, int], optional) – mapping from label names into int
tag_column (str, optional) – column with label names
class_column (str, optional) – column to use for split
seed – seed for split
n_folds – number of folds
- Returns
tuple with 4 dataframes: whole dataframe, train part, valid part and infer part
- Return type
tuple
-
catalyst.contrib.utils.pandas.
split_dataframe_on_column_folds
(dataframe: pandas.core.frame.DataFrame, column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶ Splits DataFrame into N folds.
- Parameters
dataframe – a dataset
column – which column to use
random_state – seed for random shuffle
n_folds – number of result folds
- Returns
new dataframe with fold column
- Return type
pd.DataFrame
-
catalyst.contrib.utils.pandas.
split_dataframe_on_folds
(dataframe: pandas.core.frame.DataFrame, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶ Splits DataFrame into N folds.
- Parameters
dataframe – a dataset
random_state – seed for random shuffle
n_folds – number of result folds
- Returns
new dataframe with fold column
- Return type
pd.DataFrame
-
catalyst.contrib.utils.pandas.
split_dataframe_on_stratified_folds
(dataframe: pandas.core.frame.DataFrame, class_column: str, random_state: int = 42, n_folds: int = 5) → pandas.core.frame.DataFrame[source]¶ Splits DataFrame into N stratified folds.
Also see catalyst.data.sampler.BalanceClassSampler
- Parameters
dataframe – a dataset
class_column – which column to use for split
random_state – seed for random shuffle
n_folds – number of result folds
- Returns
new dataframe with fold column
- Return type
pd.DataFrame
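To make "stratified" concrete: each class is spread as evenly as possible over the folds, so every fold keeps roughly the same class proportions. The sketch below uses a simple round-robin deal over plain lists; it is an illustration with a hypothetical helper name, not the library's algorithm, which may well delegate to scikit-learn.

```python
import random
from collections import defaultdict


def assign_stratified_folds(class_labels, n_folds=5, random_state=42):
    """Return one fold index per row, spreading each class evenly over the folds."""
    rng = random.Random(random_state)
    rows_by_class = defaultdict(list)
    for row, label in enumerate(class_labels):
        rows_by_class[label].append(row)

    folds = [None] * len(class_labels)
    for label in sorted(rows_by_class):
        rows = rows_by_class[label]
        rng.shuffle(rows)
        # deal the shuffled rows of this class to folds 0, 1, ..., n_folds - 1 in turn
        for position, row in enumerate(rows):
            folds[row] = position % n_folds
    return folds


labels = ["cat"] * 10 + ["dog"] * 5
folds = assign_stratified_folds(labels, n_folds=5)
cats_per_fold = [
    sum(1 for f, lab in zip(folds, labels) if f == k and lab == "cat")
    for k in range(5)
]
dogs_per_fold = [
    sum(1 for f, lab in zip(folds, labels) if f == k and lab == "dog")
    for k in range(5)
]
```

With 10 cats and 5 dogs over 5 folds, every fold receives two cats and one dog, which is the property a stratified split is meant to preserve.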
-
catalyst.contrib.utils.pandas.
split_dataframe_train_test
(dataframe: pandas.core.frame.DataFrame, **train_test_split_args) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶ Split dataframe in train and test part.
- Parameters
dataframe – pd.DataFrame to split
**train_test_split_args –
- test_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is automatically set to the complement of the train size. If train size is also None, test size is set to 0.25.
- train_size (float, int, or None; default is None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.
- random_state (int or RandomState) – Pseudo-random number generator state used for random sampling.
- stratify (array-like or None; default is None) – If not None, data is split in a stratified fashion, using this as the class labels.
- Returns
train and test DataFrames
Note
It exists because the sklearn split is overcomplicated.
Separates values in the class_column column.
- Parameters
dataframe – a dataset
tag_column – column name to separate values
tag_delim – delimiter to separate values
- Returns
new dataframe
- Return type
pd.DataFrame
-
catalyst.contrib.utils.pandas.
read_multiple_dataframes
(in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame, pandas.core.frame.DataFrame][source]¶ Reads train/valid/infer dataframes from the given paths.
- Parameters
in_csv_train – paths to train csv separated by commas
in_csv_valid – paths to valid csv separated by commas
in_csv_infer – paths to infer csv separated by commas
tag2class (Dict[str, int], optional) – mapping from label names into int
tag_column (str, optional) – column with label names
class_column (str, optional) – column to use for split
- Returns
tuple with 4 dataframes: whole dataframe, train part, valid part and infer part
- Return type
tuple
-
catalyst.contrib.utils.pandas.
map_dataframe
(dataframe: pandas.core.frame.DataFrame, tag_column: str, class_column: str, tag2class: Dict[str, int], verbose: bool = False) → pandas.core.frame.DataFrame[source]¶ Maps tags from tag_column to ints in class_column using the tag2class dictionary.
- Parameters
dataframe – input dataframe
tag_column – column with tags
class_column (str) –
tag2class (Dict[str, int]) – mapping from tags to class labels
verbose – flag if true, uses tqdm
- Returns
updated dataframe with class_column
- Return type
pd.DataFrame
-
catalyst.contrib.utils.pandas.
get_dataset_labeling
(dataframe: pandas.core.frame.DataFrame, tag_column: str) → Dict[str, int][source]¶ Prepares a mapping using unique values from tag_column.
{
    "class_name_0": 0,
    "class_name_1": 1,
    ...
    "class_name_N": N
}
- Parameters
dataframe – a dataset
tag_column – which column to use
- Returns
mapping from tag to labels
- Return type
Dict[str, int]
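The tag-to-int mapping and the per-row lookup that map_dataframe describes can be shown on plain lists. This is a sketch with a hypothetical helper, not the library's code; numbering the tags in sorted order is an assumption, since the source only says the mapping is built from unique values.

```python
def build_labeling(tags):
    """Hypothetical helper: map each unique tag to an int (sorted order is assumed)."""
    return {str(tag): index for index, tag in enumerate(sorted(set(tags)))}


tags = ["dog", "cat", "dog", "bird"]
tag2class = build_labeling(tags)
# the per-row lookup that map_dataframe is described as performing
classes = [tag2class[tag] for tag in tags]
```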
-
catalyst.contrib.utils.pandas.
merge_multiple_fold_csv
(fold_name: str, paths: Optional[str]) → pandas.core.frame.DataFrame[source]¶ Reads csv into one DataFrame with column fold.
- Parameters
fold_name – current fold name
paths – paths to csv separated by commas
- Returns
merged dataframes with column fold == fold_name
- Return type
pd.DataFrame
-
catalyst.contrib.utils.pandas.
read_csv_data
(in_csv: str = None, train_folds: Optional[List[int]] = None, valid_folds: Optional[List[int]] = None, infer_folds: Optional[List[int]] = None, seed: int = 42, n_folds: int = 5, in_csv_train: str = None, in_csv_valid: str = None, in_csv_infer: str = None, tag2class: Optional[Dict[str, int]] = None, class_column: str = None, tag_column: str = None) → Tuple[pandas.core.frame.DataFrame, List[dict], List[dict], List[dict]][source]¶ Reads a dataframe from the given path in_csv and splits it into train/valid/infer folds, or reads independent folds from the separate paths in_csv_train, in_csv_valid, in_csv_infer.
Note
This function can be used with different combinations of params.
- First block is used to get dataset from one csv: in_csv, train_folds, valid_folds, infer_folds, seed, n_folds
- Second includes paths to different csv for train/valid and infer parts: in_csv_train, in_csv_valid, in_csv_infer
- The other params (tag2class, tag_column, class_column) are optional for any previous block
- Parameters
in_csv – paths to whole dataset
train_folds – train folds
valid_folds (List[int], optional) – valid folds. If None, takes all folds not included in train_folds
infer_folds (List[int], optional) – infer folds. If None, takes all folds not included in train_folds and valid_folds
seed – seed for split
n_folds – number of folds
in_csv_train – paths to train csv separated by commas
in_csv_valid – paths to valid csv separated by commas
in_csv_infer – paths to infer csv separated by commas
tag2class (Dict[str, int]) – mapping from label names into ints
tag_column – column with label names
class_column – column to use for split
- Returns
tuple with 4 elements (whole dataframe, list with train data, list with valid data and list with infer data)
- Return type
Tuple[pd.DataFrame, List[dict], List[dict], List[dict]]
-
catalyst.contrib.utils.pandas.
balance_classes
(dataframe: pandas.core.frame.DataFrame, class_column: str = 'label', random_state: int = 42, how: str = 'downsampling') → pandas.core.frame.DataFrame[source]¶ Balance classes in dataframe by class_column.
See also catalyst.data.sampler.BalanceClassSampler.
- Parameters
dataframe – a dataset
class_column – which column to use for split
random_state – seed for random shuffle
how – strategy to sample, must be one of [“downsampling”, “upsampling”]
- Returns
new dataframe with balanced class_column
- Return type
pd.DataFrame
- Raises
NotImplementedError – if how is not in [“upsampling”, “downsampling”, int]
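The two named strategies differ only in the target class size: downsampling shrinks every class to the smallest one, upsampling grows every class to the largest one. The sketch below works on a list of dict rows rather than a DataFrame; the helper name is hypothetical, the sampling details are assumptions, and the integer form of how mentioned under Raises is not covered.

```python
import random
from collections import defaultdict


def balance_rows(rows, class_column="label", random_state=42, how="downsampling"):
    """Hypothetical helper: resample dict rows so every class has the same size."""
    if how not in ("downsampling", "upsampling"):
        raise NotImplementedError(f"unknown strategy: {how}")
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_column]].append(row)

    sizes = [len(group) for group in by_class.values()]
    target = min(sizes) if how == "downsampling" else max(sizes)
    balanced = []
    for group in by_class.values():
        if len(group) >= target:
            # enough rows: draw `target` of them without replacement
            balanced.extend(rng.sample(group, target))
        else:
            # too few rows: keep all and add random repeats up to `target`
            balanced.extend(group + rng.choices(group, k=target - len(group)))
    return balanced


rows = [{"label": "a"}] * 6 + [{"label": "b"}] * 2
down = balance_rows(rows, how="downsampling")
up = balance_rows(rows, how="upsampling")
```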
Parallel¶
-
catalyst.contrib.utils.parallel.
parallel_imap
(func, args, pool: Union[multiprocessing.pool.Pool, catalyst.contrib.utils.parallel.DumbPool]) → List[T][source]¶ @TODO: Docs. Contribution is welcome.
Plotly¶
-
catalyst.contrib.utils.plotly.
plot_tensorboard_log
(logdir: Union[str, pathlib.Path], step: Optional[str] = 'batch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]¶ @TODO: Docs. Contribution is welcome.
Adapted from https://github.com/belskikh/kekas/blob/v0.1.23/kekas/utils.py#L193
-
catalyst.contrib.utils.plotly.
plot_metrics
(logdir: Union[str, pathlib.Path], step: Optional[str] = 'epoch', metrics: Optional[List[str]] = None, height: Optional[int] = None, width: Optional[int] = None) → None[source]¶ Plots your learning results.
- Parameters
logdir – the logdir that was specified during training.
step – ‘batch’ or ‘epoch’ - what logs to show: for batches or for epochs
metrics – list of metrics to plot. The loss should be specified as ‘loss’, learning rate = ‘_base/lr’ and other metrics should be specified as names in metrics dict that was specified during training
height – the height of the whole resulting plot
width – the width of the whole resulting plot
Serialization¶
-
catalyst.contrib.utils.serialization.
serialize
(data)¶ Serialize the data into bytes using pickle.
- Parameters
data – a value
- Returns
Returns a bytes object containing the data serialized with pickle.
-
catalyst.contrib.utils.serialization.
deserialize
(bytes)¶ Deserialize bytes into an object using pickle.
- Parameters
bytes – a bytes object containing data serialized with pickle.
- Returns
Returns a value deserialized from the bytes-like object.
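Since both functions are documented as thin wrappers over pickle, the round trip they describe can be shown with the standard library directly. The snippet calls pickle itself rather than the Catalyst helpers, so it only illustrates the documented contract (value in, bytes out, and back again).

```python
import pickle

data = {"epoch": 3, "metrics": [0.91, 0.87]}

payload = pickle.dumps(data)      # bytes, as serialize is described to return
restored = pickle.loads(payload)  # the original value, as deserialize describes
```

As with any pickle-based format, only deserialize bytes that come from a trusted source, because unpickling can execute arbitrary code.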
Visualization¶
-
catalyst.contrib.utils.visualization.
plot_confusion_matrix
(cm, class_names=None, normalize=False, title='confusion matrix', fname=None, show=True, figsize=12, fontsize=32, colormap='Blues')[source]¶ Render the confusion matrix and return matplotlib's figure with it. Normalization can be applied by setting normalize=True.
Computer Vision¶
Image¶
-
catalyst.contrib.utils.cv.image.
has_image_extension
(uri) → bool[source]¶ Check that file has image extension.
- Parameters
uri (Union[str, pathlib.Path]) – the resource to load the file from
- Returns
True if file has image extension, False otherwise
- Return type
bool
-
catalyst.contrib.utils.cv.image.
imread
(uri, grayscale: bool = False, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]¶ Reads an image from the specified file.
- Parameters
uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.imread docs for more info
grayscale – if True, make all images grayscale
expand_dims – if True, append channel axis to grayscale images
rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows to use relative path)
**kwargs – extra params for image read
- Returns
image
- Return type
np.ndarray
-
catalyst.contrib.utils.cv.image.
imwrite
(**kwargs)[source]¶ imwrite(uri, im, format=None, **kwargs)
Write an image to the specified file. Alias for imageio.imwrite.
- Parameters
**kwargs – parameters for imageio.imwrite
- Returns
image save result
-
catalyst.contrib.utils.cv.image.
imsave
(**kwargs)[source]¶ imwrite(uri, im, format=None, **kwargs)
Write an image to the specified file. Alias for imageio.imsave.
- Parameters
**kwargs – parameters for imageio.imsave
- Returns
image save result
-
catalyst.contrib.utils.cv.image.
mask_to_overlay_image
(image: numpy.ndarray, masks: List[numpy.ndarray], threshold: float = 0, mask_strength: float = 0.5) → numpy.ndarray[source]¶ Draws every mask with some color over the image.
- Parameters
image – RGB image used as underlay for masks
masks – list of masks
threshold – threshold for masks binarization
mask_strength – opacity of colorized masks
- Returns
HxWx3 image with overlay
- Return type
np.ndarray
-
catalyst.contrib.utils.cv.image.
mimread
(uri, clip_range: Tuple[int, int] = None, expand_dims: bool = True, rootpath: Union[str, pathlib.Path] = None, **kwargs) → numpy.ndarray[source]¶ Reads multiple images from the specified file.
- Parameters
uri (str, pathlib.Path, bytes, file) – the resource to load the image from, e.g. a filename, pathlib.Path, http address or file object, see imageio.mimread docs for more info
clip_range (Tuple[int, int]) – lower and upper interval edges, image values outside the interval are clipped to the interval edges
expand_dims – if True, append channel axis to grayscale images
rootpath (Union[str, pathlib.Path]) – path to the resource with image (allows to use relative path)
**kwargs – extra params for image read
- Returns
image
- Return type
np.ndarray
Tensor¶
-
catalyst.contrib.utils.cv.tensor.
tensor_from_rgb_image
(image: numpy.ndarray) → torch.Tensor[source]¶ @TODO: Docs. Contribution is welcome.
-
catalyst.contrib.utils.cv.tensor.
tensor_to_ndimage
(images: torch.Tensor, denormalize: bool = True, mean: Tuple[float, float, float] = (0.485, 0.456, 0.406), std: Tuple[float, float, float] = (0.229, 0.224, 0.225), move_channels_dim: bool = True, dtype=<class 'numpy.float32'>) → numpy.ndarray[source]¶ Convert float image(s) with standard normalization to np.ndarray with [0..1] when dtype is np.float32 and [0..255] when dtype is np.uint8.
- Parameters
images – [B]xCxHxW float tensor
denormalize – if True, multiply image(s) by std and add mean
mean (Tuple[float, float, float]) – per channel mean to add
std (Tuple[float, float, float]) – per channel std to multiply
move_channels_dim – if True, convert tensor to [B]xHxWxC format
dtype – result ndarray dtype. Only float32 and uint8 are supported
- Returns
[B]xHxWxC np.ndarray of dtype
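The denormalization step (multiply by std, add mean) and the two output ranges can be checked by hand on a single pixel. The sketch below is pure Python on one channel triple rather than a tensor; the helper name is hypothetical, and the clamping to [0, 1] and the rounding used for the uint8 case are assumptions, not documented behaviour.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def denormalize_pixel(pixel, mean=MEAN, std=STD, as_uint8=False):
    """Undo per-channel normalization for one (C,) pixel: value * std + mean."""
    restored = [value * s + m for value, s, m in zip(pixel, std, mean)]
    # assumed: keep the result inside the valid [0, 1] range
    restored = [min(max(value, 0.0), 1.0) for value in restored]
    if as_uint8:
        # assumed: scale to [0, 255] and round to the nearest integer
        return [int(round(value * 255)) for value in restored]
    return restored


normalized_zero = denormalize_pixel((0.0, 0.0, 0.0))
as_bytes = denormalize_pixel((0.0, 0.0, 0.0), as_uint8=True)
```

A normalized value of zero maps back to the channel mean, which is a quick sanity check for any denormalization routine.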
Natural language processing¶
Text¶
-
catalyst.contrib.utils.nlp.text.
tokenize_text
(text: str, tokenizer, max_length: int, strip: bool = True, lowercase: bool = True, remove_punctuation: bool = True) → Dict[str, numpy.array][source]¶ Tokenizes the given text.
- Parameters
text – text to tokenize
tokenizer – Tokenizer instance from HuggingFace
max_length – maximum length of tokens
strip – if true strips text before tokenizing
lowercase – if true makes text lowercase before tokenizing
remove_punctuation – if true removes string.punctuation from text before tokenizing
- Returns
batch with tokenized text
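The three clean-up flags can be illustrated independently of any tokenizer. The helper below is hypothetical and covers only the preprocessing described by strip, lowercase and remove_punctuation; the order in which the steps run is an assumption, and the HuggingFace tokenizer call and max_length handling are left out.

```python
import string


def preprocess_text(text, strip=True, lowercase=True, remove_punctuation=True):
    """Hypothetical helper covering only the clean-up flags, not the tokenizer call."""
    if strip:
        text = text.strip()
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        # delete every character listed in string.punctuation
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text


cleaned = preprocess_text("  Hello, World!  ")
untouched = preprocess_text(
    "  Hello, World!  ", strip=False, lowercase=False, remove_punctuation=False
)
```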
-
catalyst.contrib.utils.nlp.text.
process_bert_output
(bert_output, hidden_size: int, output_hidden_states: bool = False, pooling_groups: List[str] = None, mask: torch.Tensor = None, level: Union[int, str] = None)[source]¶ Processes BERT output.
- Parameters
bert_output – BERT output
hidden_size – hidden size of BERT layers
output_hidden_states – boolean flag if we need BERT hidden states
pooling_groups – list with pooling to use for sequence embedding
mask – boolean flag if we need mask [PAD] tokens
level – integer with specified level to use
- Returns
processed output