RL

Core

class catalyst.rl.core.agent.ActorSpec
    Bases: abc.ABC, torch.nn.modules.module.Module

    abstract property policy_type

class catalyst.rl.core.agent.CriticSpec
    Bases: abc.ABC, torch.nn.modules.module.Module

    abstract property distribution
    abstract property num_atoms
    abstract property num_outputs
    abstract property values_range

class catalyst.rl.core.algorithm.AlgorithmSpec
    Bases: abc.ABC

    abstract property gamma
    abstract property n_step
    abstract classmethod prepare_for_sampler(env_spec: catalyst.rl.core.environment.EnvironmentSpec, config: Dict) → Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec]

class catalyst.rl.core.db.DBSpec
    Bases: abc.ABC

    class Message
        Bases: enum.Enum

        An enumeration.

        DISABLE_SAMPLING = 3
        DISABLE_TRAINING = 1
        ENABLE_SAMPLING = 2
        ENABLE_TRAINING = 0

    abstract property epoch
    abstract property num_trajectories
    abstract property sampling_enabled
    abstract property training_enabled

class catalyst.rl.core.environment.EnvironmentSpec(visualize=False, mode='train', sampler_id=None)
    Bases: abc.ABC

    abstract property action_space
    property discrete_actions
    property history_len
    abstract property observation_space
    property reward_space
    abstract property state_space

class catalyst.rl.core.exploration.ExplorationStrategy(power=1.0)
    Bases: object

    Base class for various exploration strategies. In the discrete case a strategy must implement get_action(q_values); in the continuous case it must implement get_action(action).

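The contract is small enough to show with a toy strategy. The sketch below is illustrative only and does not touch Catalyst's internals; it simply follows the discrete-case get_action(q_values) convention described above (class and variable names are made up for the example).

    import numpy as np

    class ToyGreedyStrategy:
        """Follows the discrete-case contract: get_action(q_values) -> action id."""

        def __init__(self, power: float = 1.0):
            # mirrors the `power` argument of ExplorationStrategy; unused here
            self.power = power

        def get_action(self, q_values: np.ndarray) -> int:
            # q_values: 1D array with one Q(s, a) estimate per discrete action
            return int(np.argmax(q_values))

    strategy = ToyGreedyStrategy()
    action = strategy.get_action(np.array([0.1, 0.7, 0.2]))  # -> 1
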
class catalyst.rl.core.exploration.ExplorationHandler(*exploration_params, env: catalyst.rl.core.environment.EnvironmentSpec)
    Bases: object

class catalyst.rl.core.policy_handler.PolicyHandler(env: catalyst.rl.core.environment.EnvironmentSpec, agent: Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec], device)
    Bases: object

class catalyst.rl.core.sampler.Sampler(agent: Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec], env: catalyst.rl.core.environment.EnvironmentSpec, db_server: catalyst.rl.core.db.DBSpec = None, exploration_handler: catalyst.rl.core.exploration.ExplorationHandler = None, logdir: str = None, id: int = 0, mode: str = 'infer', deterministic: bool = None, weights_sync_period: int = 1, weights_sync_mode: str = None, sampler_seed: int = 42, trajectory_seeds: List = None, trajectory_limit: int = None, force_store: bool = False, gc_period: int = 10, monitoring_params: Dict = None, **kwargs)
    Bases: object

class catalyst.rl.core.sampler.ValidSampler(agent: Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec], env: catalyst.rl.core.environment.EnvironmentSpec, db_server: catalyst.rl.core.db.DBSpec = None, exploration_handler: catalyst.rl.core.exploration.ExplorationHandler = None, logdir: str = None, id: int = 0, mode: str = 'infer', deterministic: bool = None, weights_sync_period: int = 1, weights_sync_mode: str = None, sampler_seed: int = 42, trajectory_seeds: List = None, trajectory_limit: int = None, force_store: bool = False, gc_period: int = 10, monitoring_params: Dict = None, **kwargs)

class catalyst.rl.core.trainer.TrainerSpec(algorithm: catalyst.rl.core.algorithm.AlgorithmSpec, env_spec: catalyst.rl.core.environment.EnvironmentSpec, db_server: catalyst.rl.core.db.DBSpec, logdir: str, num_workers: int = 1, batch_size: int = 64, min_num_transitions: int = 10000, online_update_period: int = 1, weights_sync_period: int = 1, save_period: int = 10, gc_period: int = 10, seed: int = 42, epoch_limit: int = None, monitoring_params: Dict = None, **kwargs)
    Bases: object

class catalyst.rl.core.trajectory_sampler.TrajectorySampler(env: catalyst.rl.core.environment.EnvironmentSpec, agent: Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec], device, deterministic: bool = False, initial_capacity: int = 1000, sampling_flag: multiprocessing.context.BaseContext.Value = None)
    Bases: object

Agent

class catalyst.rl.agent.actor.ActorSpec
    Bases: abc.ABC, torch.nn.modules.module.Module

    abstract property policy_type

class catalyst.rl.agent.actor.Actor(state_net: catalyst.rl.agent.network.StateNet, head_net: catalyst.rl.agent.head.PolicyHead)
    Bases: catalyst.rl.core.agent.ActorSpec

    Actor that learns the agent's policy.

    classmethod get_from_params(state_net_params: Dict, policy_head_params: Dict, env_spec: catalyst.rl.core.environment.EnvironmentSpec)
    property policy_type

class catalyst.rl.agent.critic.CriticSpec
    Bases: abc.ABC, torch.nn.modules.module.Module

    abstract property distribution
    abstract property num_atoms
    abstract property num_outputs
    abstract property values_range

class catalyst.rl.agent.critic.StateCritic(state_net: catalyst.rl.agent.network.StateNet, head_net: catalyst.rl.agent.head.ValueHead)
    Bases: catalyst.rl.core.agent.CriticSpec

    Critic that learns state value functions, like V(s).

    property distribution
    classmethod get_from_params(state_net_params: Dict, value_head_params: Dict, env_spec: catalyst.rl.core.environment.EnvironmentSpec)
    property hyperbolic_constant
    property num_atoms
    property num_heads
    property num_outputs
    property values_range

class catalyst.rl.agent.critic.ActionCritic(state_net: catalyst.rl.agent.network.StateNet, head_net: catalyst.rl.agent.head.ValueHead)
    Bases: catalyst.rl.agent.critic.StateCritic

    Critic that learns state-action value functions, like Q(s, a), from the state alone (one output per discrete action).

class catalyst.rl.agent.critic.StateActionCritic(state_action_net: catalyst.rl.agent.network.StateActionNet, head_net: catalyst.rl.agent.head.ValueHead)
    Bases: catalyst.rl.core.agent.CriticSpec

    Critic that learns state-action value functions, like Q(s, a).

    property distribution
    classmethod get_from_params(state_action_net_params: Dict, value_head_params: Dict, env_spec: catalyst.rl.core.environment.EnvironmentSpec)
    property hyperbolic_constant
    property num_atoms
    property num_heads
    property num_outputs
    property values_range

class catalyst.rl.agent.head.ValueHead(in_features: int, out_features: int, bias: bool = True, num_atoms: int = 1, use_state_value_head: bool = False, distribution: str = None, values_range: tuple = None, num_heads: int = 1, hyperbolic_constant: float = 1.0)
    Bases: torch.nn.modules.module.Module

class catalyst.rl.agent.head.PolicyHead(in_features: int, out_features: int, policy_type: str = None, out_activation: torch.nn.modules.module.Module = None)
    Bases: torch.nn.modules.module.Module

class catalyst.rl.agent.network.StateNet(main_net: torch.nn.modules.module.Module, observation_net: torch.nn.modules.module.Module = None, aggregation_net: torch.nn.modules.module.Module = None)
    Bases: torch.nn.modules.module.Module

    __init__(main_net: torch.nn.modules.module.Module, observation_net: torch.nn.modules.module.Module = None, aggregation_net: torch.nn.modules.module.Module = None)

        Abstract network that takes a tensor T of shape [bs; history_len; …] and outputs a representation tensor R of shape [bs; representation_size]:

            input_T [bs; history_len; in_features]
            -> observation_net (aka observation_encoder) ->
            observations_representations [bs; history_len; obs_features]
            -> aggregation_net (flatten in the simplified case) ->
            aggregated_representation [bs; hid_features]
            -> main_net ->
            output_T [bs; representation_size]

        Parameters
            main_net –
            observation_net –
            aggregation_net –

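A minimal sketch of the data flow described in the docstring above, wiring three plain torch modules by hand rather than instantiating StateNet itself; all shapes and module choices here are assumptions made for the example.

    import torch
    import torch.nn as nn

    bs, history_len, in_features = 32, 4, 8
    obs_features, representation_size = 16, 32
    hid_features = history_len * obs_features  # what flattening produces

    observation_net = nn.Linear(in_features, obs_features)   # per-observation encoder
    aggregation_net = nn.Flatten(start_dim=1)                 # "flatten in the simplified case"
    main_net = nn.Linear(hid_features, representation_size)

    x = torch.randn(bs, history_len, in_features)   # input_T
    obs_repr = observation_net(x)                    # [bs, history_len, obs_features]
    aggregated = aggregation_net(obs_repr)           # [bs, hid_features]
    output = main_net(aggregated)                    # [bs, representation_size]
    assert output.shape == (bs, representation_size)
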
class catalyst.rl.agent.network.StateActionNet(main_net: torch.nn.modules.module.Module, observation_net: torch.nn.modules.module.Module = None, action_net: torch.nn.modules.module.Module = None, aggregation_net: torch.nn.modules.module.Module = None)
    Bases: torch.nn.modules.module.Module

class catalyst.rl.agent.policy.SquashingGaussPolicy(squashing_fn=<class 'torch.nn.modules.activation.Tanh'>)
    Bases: torch.nn.modules.module.Module

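The name and the squashing_fn=Tanh default suggest a Gaussian policy whose samples are squashed into a bounded range. The sketch below shows the standard tanh-squashed Gaussian sampling with the change-of-variables log-probability correction; it is a generic illustration of the technique, not Catalyst's forward pass, and the shapes are assumptions.

    import torch

    def sample_squashed_gauss(mean: torch.Tensor, log_std: torch.Tensor):
        """Reparameterized sample from N(mean, std), squashed through tanh."""
        normal = torch.distributions.Normal(mean, log_std.exp())
        z = normal.rsample()            # pre-squash sample (differentiable)
        action = torch.tanh(z)          # squashed into (-1, 1)
        # log pi(a|s) = log N(z) - sum_i log(1 - tanh(z_i)^2)
        log_prob = normal.log_prob(z) - torch.log(1.0 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

    mean = torch.zeros(32, 6)                 # batch of 32, 6-dim actions (assumed)
    log_std = torch.full((32, 6), -0.5)
    action, log_prob = sample_squashed_gauss(mean, log_std)
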
DB

class catalyst.rl.db.mongo.MongoDB(host: str = '127.0.0.1', port: int = 12000, prefix: str = None, sync_epoch: bool = False, reconnect_timeout: int = 3)
    Bases: catalyst.rl.core.db.DBSpec

    property epoch
    property num_trajectories
    property sampling_enabled
    property training_enabled

Environments

Exploration

class catalyst.rl.exploration.boltzman.Boltzmann(temp_init, temp_final, annealing_steps, temp_min=0.01)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For discrete environments only. Selects the soft-maximum action (softmax_a [Q(s, a) / t]). The temperature parameter t usually decreases during the course of training. Importantly, the effective range of t depends on the magnitudes of the environment rewards.

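A framework-independent sketch of soft (Boltzmann) action selection; the linear schedule below is an assumption about how temp_init, temp_final and annealing_steps are usually combined, not a transcription of Catalyst's code.

    import numpy as np

    def boltzmann_action(q_values: np.ndarray, t: float) -> int:
        """Sample an action from softmax_a [Q(s, a) / t]."""
        logits = q_values / t
        logits -= logits.max()                     # numerical stability
        probs = np.exp(logits) / np.exp(logits).sum()
        return int(np.random.choice(len(q_values), p=probs))

    def temperature(step, temp_init=1.0, temp_final=0.1,
                    annealing_steps=10000, temp_min=0.01):
        """Linear annealing from temp_init to temp_final, never below temp_min."""
        frac = min(step / annealing_steps, 1.0)
        return max(temp_init + frac * (temp_final - temp_init), temp_min)

    action = boltzmann_action(np.array([1.0, 2.0, 0.5]), t=temperature(step=500))
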
class catalyst.rl.exploration.gauss.NoExploration(power=1.0)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For continuous environments only. Returns the action produced by the actor network without changes.

class catalyst.rl.exploration.gauss.GaussNoise(sigma)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For continuous environments only. Adds spherical Gaussian noise to the action produced by the actor.

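For reference, spherical Gaussian action noise is essentially a one-liner; the clipping to [-1, 1] is an assumption about the environment's action bounds.

    import numpy as np

    def gauss_noise_action(action: np.ndarray, sigma: float) -> np.ndarray:
        """Add isotropic Gaussian noise to a deterministic action."""
        return np.clip(action + sigma * np.random.randn(*action.shape), -1.0, 1.0)
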
class catalyst.rl.exploration.gauss.OrnsteinUhlenbeckProcess(sigma, theta, dt=0.01)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For continuous environments only. Adds temporally correlated Gaussian noise generated with an Ornstein-Uhlenbeck process. Paper: https://arxiv.org/abs/1509.02971

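A sketch of the discrete-time Ornstein-Uhlenbeck recursion, x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, I), with mu = 0 as in the DDPG paper; the defaults and the clipping range are assumptions.

    import numpy as np

    class OUNoise:
        """Temporally correlated noise for continuous actions."""

        def __init__(self, action_dim, sigma=0.2, theta=0.15, dt=0.01, mu=0.0):
            self.sigma, self.theta, self.dt, self.mu = sigma, theta, dt, mu
            self.state = np.full(action_dim, mu, dtype=np.float64)

        def reset(self):
            # call at the start of every episode
            self.state[:] = self.mu

        def sample(self):
            dx = (self.theta * (self.mu - self.state) * self.dt
                  + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.state.shape))
            self.state = self.state + dx
            return self.state

    noise = OUNoise(action_dim=6)
    actor_action = np.zeros(6)  # stand-in for the actor network's output
    noisy_action = np.clip(actor_action + noise.sample(), -1.0, 1.0)
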
class catalyst.rl.exploration.greedy.Greedy(power=1.0)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For discrete environments only. Selects the greedy action (argmax_a Q(s, a)).

class catalyst.rl.exploration.greedy.EpsilonGreedy(eps_init, eps_final, annealing_steps, eps_min=0.01)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For discrete environments only. Selects a random action with probability eps and the greedy action (argmax_a Q(s, a)) with probability 1 - eps. The random-action probability eps usually decreases from 1 to 0.01–0.05 during the course of training.

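An epsilon-greedy selector in a few lines; the linear schedule is an assumption about how eps_init, eps_final and annealing_steps are usually combined.

    import numpy as np

    def epsilon_greedy_action(q_values: np.ndarray, eps: float) -> int:
        """Random action with probability eps, argmax_a Q(s, a) otherwise."""
        if np.random.rand() < eps:
            return int(np.random.randint(len(q_values)))
        return int(np.argmax(q_values))

    def epsilon(step, eps_init=1.0, eps_final=0.05, annealing_steps=100000):
        frac = min(step / annealing_steps, 1.0)
        return eps_init + frac * (eps_final - eps_init)

    action = epsilon_greedy_action(np.array([1.0, 2.0, 0.5]), eps=epsilon(step=500))
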
class catalyst.rl.exploration.param_noise.ParameterSpaceNoise(target_sigma, tolerance=0.001, max_steps=1000)
    Bases: catalyst.rl.core.exploration.ExplorationStrategy

    For continuous environments only. At the beginning of each episode, perturbs the weights of the actor network, forcing it to produce more diverse actions. Paper: https://arxiv.org/abs/1706.01905

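The core mechanic, stripped of the adaptive search for the noise scale that target_sigma, tolerance and max_steps control, is simply to act for one episode with a perturbed copy of the actor. A generic sketch (the toy actor here is an assumption):

    import copy
    import torch
    import torch.nn as nn

    def perturb_actor(actor: nn.Module, sigma: float) -> nn.Module:
        """Return a copy of `actor` with N(0, sigma^2) noise added to every parameter."""
        perturbed = copy.deepcopy(actor)
        with torch.no_grad():
            for param in perturbed.parameters():
                param.add_(sigma * torch.randn_like(param))
        return perturbed

    actor = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2), nn.Tanh())
    exploration_actor = perturb_actor(actor, sigma=0.05)  # act with this for one episode
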
Off-policy

class catalyst.rl.offpolicy.trainer.OffpolicyTrainer(algorithm: catalyst.rl.core.algorithm.AlgorithmSpec, env_spec: catalyst.rl.core.environment.EnvironmentSpec, db_server: catalyst.rl.core.db.DBSpec, logdir: str, num_workers: int = 1, batch_size: int = 64, min_num_transitions: int = 10000, online_update_period: int = 1, weights_sync_period: int = 1, save_period: int = 10, gc_period: int = 10, seed: int = 42, epoch_limit: int = None, monitoring_params: Dict = None, **kwargs)

Discrete

class catalyst.rl.offpolicy.algorithms.critic.OffpolicyCritic(critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, critic_loss_params: Dict = None, critic_optimizer_params: Dict = None, critic_scheduler_params: Dict = None, critic_grad_clip_params: Dict = None, critic_tau: float = 1.0, **kwargs)
    Bases: catalyst.rl.core.algorithm.AlgorithmSpec

    property gamma
    property n_step
    classmethod prepare_for_sampler(env_spec: catalyst.rl.core.environment.EnvironmentSpec, config: Dict) → Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec]

class catalyst.rl.offpolicy.algorithms.dqn.DQN(critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, critic_loss_params: Dict = None, critic_optimizer_params: Dict = None, critic_scheduler_params: Dict = None, critic_grad_clip_params: Dict = None, critic_tau: float = 1.0, **kwargs)
    Bases: catalyst.rl.offpolicy.algorithms.critic.OffpolicyCritic

    Swiss Army knife DQN algorithm.

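Whatever extensions the "Swiss Army knife" variant adds, the quantity a DQN-style critic regresses onto is the n-step TD target y = sum_{k<n} gamma^k r_{t+k} + gamma^n max_a Q_target(s_{t+n}, a). A generic sketch of that target, assuming a critic that maps a batch of states to one Q-value per discrete action (names and shapes are assumptions, not Catalyst's code):

    import torch

    @torch.no_grad()
    def n_step_dqn_target(rewards, next_states, done, target_critic, gamma, n_step):
        """rewards: [bs, n_step]; next_states: [bs, state_dim]; done: [bs] in {0, 1}."""
        discounts = gamma ** torch.arange(n_step, dtype=torch.float32)   # [n_step]
        n_step_return = (rewards * discounts).sum(dim=1)                 # [bs]
        bootstrap = target_critic(next_states).max(dim=1).values         # max_a Q_target(s', a)
        return n_step_return + (gamma ** n_step) * (1.0 - done) * bootstrap

    target_critic = torch.nn.Linear(4, 3)   # toy critic: 4-dim state -> 3 action values
    y = n_step_dqn_target(torch.rand(32, 2), torch.rand(32, 4), torch.zeros(32),
                          target_critic, gamma=0.99, n_step=2)
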
Continuous

class catalyst.rl.offpolicy.algorithms.actor_critic.OffpolicyActorCritic(actor: catalyst.rl.core.agent.ActorSpec, critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, critic_loss_params: Dict = None, actor_optimizer_params: Dict = None, critic_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, critic_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, critic_grad_clip_params: Dict = None, actor_tau: float = 1.0, critic_tau: float = 1.0, action_boundaries: tuple = None, **kwargs)
    Bases: catalyst.rl.core.algorithm.AlgorithmSpec

    property gamma
    property n_step
    classmethod prepare_for_sampler(env_spec: catalyst.rl.core.environment.EnvironmentSpec, config: Dict) → Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec]

class catalyst.rl.offpolicy.algorithms.ddpg.DDPG(actor: catalyst.rl.core.agent.ActorSpec, critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, critic_loss_params: Dict = None, actor_optimizer_params: Dict = None, critic_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, critic_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, critic_grad_clip_params: Dict = None, actor_tau: float = 1.0, critic_tau: float = 1.0, action_boundaries: tuple = None, **kwargs)
    Bases: catalyst.rl.offpolicy.algorithms.actor_critic.OffpolicyActorCritic

    Swiss Army knife DDPG algorithm.

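The two updates DDPG alternates between, written generically: the critic regresses onto y = r + gamma^n Q_target(s', mu_target(s')) and the actor ascends Q(s, mu(s)). The function below is a sketch under assumed shapes (rewards already aggregated into n-step returns of shape [bs, 1], critic(s, a) returning [bs, 1]); it is not Catalyst's trainer code.

    import torch
    import torch.nn.functional as F

    def ddpg_losses(batch, actor, critic, target_actor, target_critic, gamma, n_step):
        """batch: dict with 'state', 'action', 'reward', 'next_state', 'done' tensors."""
        s, a, r, s2, done = (batch[k] for k in
                             ("state", "action", "reward", "next_state", "done"))

        # critic target: r + gamma^n * Q_target(s', mu_target(s')), detached from the graph
        with torch.no_grad():
            y = r + (gamma ** n_step) * (1.0 - done) * target_critic(s2, target_actor(s2))
        critic_loss = F.mse_loss(critic(s, a), y)

        # actor: maximize Q(s, mu(s))  <=>  minimize -Q(s, mu(s))
        actor_loss = -critic(s, actor(s)).mean()
        return actor_loss, critic_loss
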
class catalyst.rl.offpolicy.algorithms.sac.SAC(actor: catalyst.rl.core.agent.ActorSpec, critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, critic_loss_params: Dict = None, actor_optimizer_params: Dict = None, critic_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, critic_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, critic_grad_clip_params: Dict = None, actor_tau: float = 1.0, critic_tau: float = 1.0, action_boundaries: tuple = None, **kwargs)
    Bases: catalyst.rl.offpolicy.algorithms.actor_critic.OffpolicyActorCritic

class catalyst.rl.offpolicy.algorithms.td3.TD3(actor: catalyst.rl.core.agent.ActorSpec, critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, critic_loss_params: Dict = None, actor_optimizer_params: Dict = None, critic_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, critic_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, critic_grad_clip_params: Dict = None, actor_tau: float = 1.0, critic_tau: float = 1.0, action_boundaries: tuple = None, **kwargs)
    Bases: catalyst.rl.offpolicy.algorithms.actor_critic.OffpolicyActorCritic

    Swiss Army knife TD3 algorithm.

On-policy

class catalyst.rl.onpolicy.trainer.OnpolicyTrainer(algorithm: catalyst.rl.core.algorithm.AlgorithmSpec, env_spec: catalyst.rl.core.environment.EnvironmentSpec, db_server: catalyst.rl.core.db.DBSpec, logdir: str, num_workers: int = 1, batch_size: int = 64, min_num_transitions: int = 10000, online_update_period: int = 1, weights_sync_period: int = 1, save_period: int = 10, gc_period: int = 10, seed: int = 42, epoch_limit: int = None, monitoring_params: Dict = None, **kwargs)

class catalyst.rl.onpolicy.algorithms.actor.OnpolicyActor(actor: catalyst.rl.core.agent.ActorSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, actor_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, **kwargs)
    Bases: catalyst.rl.core.algorithm.AlgorithmSpec

    property gamma
    property n_step
    classmethod prepare_for_sampler(env_spec: catalyst.rl.core.environment.EnvironmentSpec, config: Dict) → Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec]

class catalyst.rl.onpolicy.algorithms.actor_critic.OnpolicyActorCritic(actor: catalyst.rl.core.agent.ActorSpec, critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, critic_loss_params: Dict = None, actor_optimizer_params: Dict = None, critic_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, critic_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, critic_grad_clip_params: Dict = None, **kwargs)
    Bases: catalyst.rl.core.algorithm.AlgorithmSpec

    property gamma
    property n_step
    classmethod prepare_for_sampler(env_spec: catalyst.rl.core.environment.EnvironmentSpec, config: Dict) → Union[catalyst.rl.core.agent.ActorSpec, catalyst.rl.core.agent.CriticSpec]

class catalyst.rl.onpolicy.algorithms.ppo.PPO(actor: catalyst.rl.core.agent.ActorSpec, critic: catalyst.rl.core.agent.CriticSpec, gamma: float, n_step: int, actor_loss_params: Dict = None, critic_loss_params: Dict = None, actor_optimizer_params: Dict = None, critic_optimizer_params: Dict = None, actor_scheduler_params: Dict = None, critic_scheduler_params: Dict = None, actor_grad_clip_params: Dict = None, critic_grad_clip_params: Dict = None, **kwargs)
    Bases: catalyst.rl.onpolicy.algorithms.actor_critic.OnpolicyActorCritic