Experiment checkpoints

With the help of CheckpointCallback, Catalyst creates the following checkpoint structure under the selected logdir:

    code/ <-- code of your experiment and a dump of Catalyst itself, for reproducibility -->
    checkpoints/ <-- the checkpoints described on this page -->
        {model/runner}.{epoch_index:04d}.pth <-- top-K checkpoints based on the model selection logic -->
        best.pth <-- best model based on the specified model selection logic -->
        last.pth <-- last model checkpoint of the whole experiment run -->
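
The naming pattern and the top-K rule in the tree above can be illustrated with a short, dependency-free sketch. It is not Catalyst's implementation: `checkpoint_name` and `top_k_epochs` are hypothetical helpers, the loss values are made up, and the sketch assumes a metric where lower is better (for example, validation loss).

```python
from typing import Dict, List


def checkpoint_name(prefix: str, epoch_index: int) -> str:
    """Format a file name following the {model/runner}.{epoch_index:04d}.pth pattern."""
    return f"{prefix}.{epoch_index:04d}.pth"


def top_k_epochs(metrics: Dict[int, float], k: int, minimize: bool = True) -> List[int]:
    """Return the indices of the k best epochs, best first (hypothetical helper)."""
    ranked = sorted(metrics, key=metrics.get, reverse=not minimize)
    return ranked[:k]


# Hypothetical validation losses per epoch (illustrative values only).
valid_loss = {1: 0.90, 2: 0.55, 3: 0.60, 4: 0.48, 5: 0.52}

# Names of the checkpoints that a "keep the 3 best epochs" rule would retain.
kept = [checkpoint_name("model", epoch) for epoch in top_k_epochs(valid_loss, k=3)]
print(kept)
```

The `:04d` format spec zero-pads the epoch index to four digits, so the files sort in epoch order when listed alphabetically.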


Catalyst saves two types of checkpoints:

  • model.{suffix}.pth - stores only the model state dict and can easily be used for deployment in production.

  • runner.{suffix}.pth - stores the state dicts of all model(s), criterion(s), optimizer(s) and scheduler(s) and can be used for experiment analysis.

Runner checkpoints are pure PyTorch checkpoints, without any mixins, and have the following structure:

# contents of a runner checkpoint file
checkpoint = {
    "model_state_dict": model.state_dict(),
    "criterion_state_dict": criterion.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "scheduler_state_dict": scheduler.state_dict(),
}

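The key layout described above can be sketched without any dependencies. This is only an illustration of the `<name>_state_dict` convention, not Catalyst's code: `Stateful` is a hypothetical stand-in for any PyTorch object that exposes `state_dict()`, and `pack` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


class Stateful:
    """Stand-in for a PyTorch module or optimizer: anything exposing state_dict()."""

    def __init__(self, **state: Any) -> None:
        self._state = dict(state)

    def state_dict(self) -> Dict[str, Any]:
        return dict(self._state)


def pack(**components: Optional[Stateful]) -> Dict[str, Dict[str, Any]]:
    """Collect state dicts under '<name>_state_dict' keys, skipping parts set to None."""
    return {
        f"{name}_state_dict": part.state_dict()
        for name, part in components.items()
        if part is not None
    }


# Illustrative values only.
model = Stateful(weight=[0.1, 0.2])
optimizer = Stateful(lr=0.01)

model_only = pack(model=model)                        # just the model state
full_state = pack(model=model, optimizer=optimizer)   # model plus training state
print(sorted(full_state))
```

With real PyTorch objects, a dictionary of this shape is what would typically be written to disk, which is why such a file can be opened with plain PyTorch tooling.
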
Save runner

Catalyst provides user-friendly utilities for saving a model:

from catalyst import utils

model = Net()  # Net is your own torch.nn.Module subclass
# pack the model state dict into a checkpoint dictionary
checkpoint = utils.pack_checkpoint(model=model)
utils.save_checkpoint(checkpoint, logdir="/path/to/logdir", suffix="my_checkpoint")
# the checkpoint is now available at "/path/to/logdir/my_checkpoint.pth"

Load runner

With Catalyst utils, it’s also easy to load models after an experiment run:

from catalyst import utils

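model = Net()  # Net is your own torch.nn.Module subclass
optimizer = ...
criterion = ...
checkpoint = utils.load_checkpoint(path="/path/to/checkpoint")
# restore the state of every component passed in
utils.unpack_checkpoint(
    checkpoint=checkpoint,
    model=model,
    optimizer=optimizer,
    criterion=criterion,
)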

In this case, Catalyst tries to unpack the requested state dicts from the checkpoint.
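
The idea of unpacking only the requested state dicts can be sketched as follows. This is a dependency-free illustration under stated assumptions, not Catalyst's implementation: `Stateful` is a hypothetical stand-in for a PyTorch object with `load_state_dict()`, `unpack` is a hypothetical helper, and the choice to silently skip entries missing from the checkpoint is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


class Stateful:
    """Stand-in for a PyTorch object exposing load_state_dict()."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}

    def load_state_dict(self, state: Dict[str, Any]) -> None:
        self.state = dict(state)


def unpack(checkpoint: Dict[str, Dict[str, Any]], **components: Optional[Stateful]) -> None:
    """Restore only the components that were passed in and exist in the checkpoint."""
    for name, part in components.items():
        key = f"{name}_state_dict"
        if part is not None and key in checkpoint:
            part.load_state_dict(checkpoint[key])


# Illustrative checkpoint contents.
checkpoint = {
    "model_state_dict": {"weight": [0.1, 0.2]},
    "optimizer_state_dict": {"lr": 0.01},
}

model, scheduler = Stateful(), Stateful()
unpack(checkpoint, model=model, scheduler=scheduler)
print(model.state)      # restored from the checkpoint
print(scheduler.state)  # unchanged: the checkpoint has no scheduler entry
```

Here the optimizer entry stays in the checkpoint untouched because no optimizer was requested, which mirrors loading a model for inference without rebuilding the training setup.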

If you haven’t found the answer to your question, feel free to join our Slack for discussion.