
unilab.algos.torch.him_ppo.algorithm

Classes

class unilab.algos.torch.him_ppo.algorithm.HIMPPO

Bases: object

__init__(actor_critic, num_learning_epochs=1, num_mini_batches=1, clip_param=0.2, gamma=0.998, lam=0.95, value_loss_coef=1.0, entropy_coef=0.0, learning_rate=0.001, max_grad_norm=1.0, use_clipped_value_loss=True, schedule='fixed', desired_kl=0.01, device='cpu', **kwargs)

Parameters:

num_learning_epochs (int)
num_mini_batches (int)
clip_param (float)
gamma (float)
lam (float)
value_loss_coef (float)
entropy_coef (float)
learning_rate (float)
max_grad_norm (float)
use_clipped_value_loss (bool)
schedule (str)
desired_kl (float | None)
device (str)
kwargs (Any)
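
The constructor documents `schedule` and `desired_kl` without saying how they interact. PPO implementations in the rsl_rl family commonly treat `desired_kl` as a target for the mean KL divergence between the old and new policy and, when the schedule is not `'fixed'`, rescale the learning rate after each mini-batch. The sketch below assumes that convention: the schedule name `'adaptive'`, the step factor 1.5, and the bounds `1e-5` and `1e-2` come from rsl_rl and are not confirmed for `HIMPPO`. `diag_gaussian_kl` and `adapt_learning_rate` are hypothetical helpers over plain Python floats.

```python
import math
from typing import Optional, Sequence


def diag_gaussian_kl(
    mu_old: Sequence[float],
    sigma_old: Sequence[float],
    mu_new: Sequence[float],
    sigma_new: Sequence[float],
) -> float:
    """KL(old || new) between two diagonal Gaussians, summed over action dimensions."""
    return sum(
        math.log(sn / so) + (so**2 + (mo - mn) ** 2) / (2.0 * sn**2) - 0.5
        for mo, so, mn, sn in zip(mu_old, sigma_old, mu_new, sigma_new)
    )


def adapt_learning_rate(
    lr: float,
    kl: float,
    desired_kl: Optional[float],
    schedule: str,
    factor: float = 1.5,   # assumed rsl_rl-style step factor
    lr_min: float = 1e-5,  # assumed lower bound
    lr_max: float = 1e-2,  # assumed upper bound
) -> float:
    """Hypothetical KL-adaptive learning-rate rule; any other schedule leaves lr unchanged."""
    if schedule != "adaptive" or desired_kl is None:
        return lr
    if kl > 2.0 * desired_kl:
        # The policy moved too far in one step: shrink the learning rate.
        return max(lr_min, lr / factor)
    if 0.0 < kl < 0.5 * desired_kl:
        # The policy barely moved: grow the learning rate.
        return min(lr_max, lr * factor)
    return lr
```

With the defaults `learning_rate=0.001` and `desired_kl=0.01`, a measured KL of 0.05 would lower the rate to about 6.7e-4 under this rule, while `schedule='fixed'` keeps it at 0.001.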

actor_critic: HIMActorCritic

storage: HIMRolloutStorage | None

init_storage(num_envs, num_transitions_per_env, actor_obs_shape, critic_obs_shape, action_shape)

Parameters:

num_envs (int)
num_transitions_per_env (int)

Return type:
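
`init_storage` is documented only by its argument names. Assuming `HIMRolloutStorage` uses the time-major layout of rsl_rl's `RolloutStorage` (an assumption, not confirmed here), each buffer has leading dimensions `(num_transitions_per_env, num_envs)`, and the `num_envs * num_transitions_per_env` samples of one rollout would be split into `num_mini_batches` mini-batches during `update()`. The hypothetical helpers below only compute those shapes and sizes; the buffer names are illustrative.

```python
from typing import Dict, Sequence, Tuple


def rollout_buffer_shapes(
    num_envs: int,
    num_transitions_per_env: int,
    actor_obs_shape: Sequence[int],
    critic_obs_shape: Sequence[int],
    action_shape: Sequence[int],
) -> Dict[str, Tuple[int, ...]]:
    """Shapes of a hypothetical time-major rollout buffer: (time, env, *feature)."""
    lead = (num_transitions_per_env, num_envs)
    return {
        "observations": lead + tuple(actor_obs_shape),
        "critic_observations": lead + tuple(critic_obs_shape),
        "actions": lead + tuple(action_shape),
        # Per-step scalars keep a trailing singleton dimension.
        "rewards": lead + (1,),
        "dones": lead + (1,),
        "values": lead + (1,),
        "actions_log_prob": lead + (1,),
        "returns": lead + (1,),
        "advantages": lead + (1,),
    }


def mini_batch_size(num_envs: int, num_transitions_per_env: int, num_mini_batches: int) -> int:
    """Samples per mini-batch when one rollout is split evenly (any remainder is dropped)."""
    return (num_envs * num_transitions_per_env) // num_mini_batches
```

For example, 4096 environments with 24 transitions each and `num_mini_batches=4` give 98,304 samples per rollout and 24,576 per mini-batch.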

test_mode()

Return type: None

train_mode()

Return type: None

act(obs, critic_obs)

Parameters:

obs (Tensor)
critic_obs (Tensor)

Return type:
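
The `act` entry lists only its inputs. A typical PPO `act` step, assumed here rather than taken from `HIMPPO`, samples an action from the policy's diagonal Gaussian given `obs`, evaluates the critic on `critic_obs`, and records the action, its log-probability, and the value for the later update. The hypothetical helpers below sketch only the sampling and log-probability part, over plain lists.

```python
import math
import random
from typing import List, Optional, Sequence


def sample_diag_gaussian(
    mu: Sequence[float], sigma: Sequence[float], rng: Optional[random.Random] = None
) -> List[float]:
    """Draw one action a ~ N(mu, diag(sigma^2))."""
    if rng is None:
        rng = random.Random()
    return [rng.gauss(m, s) for m, s in zip(mu, sigma)]


def diag_gaussian_log_prob(
    action: Sequence[float], mu: Sequence[float], sigma: Sequence[float]
) -> float:
    """log N(action; mu, diag(sigma^2)), summed over action dimensions."""
    return sum(
        -((a - m) ** 2) / (2.0 * s**2) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for a, m, s in zip(action, mu, sigma)
    )
```

The log-probability recorded at this step is what a clipped-surrogate update later uses as the "old" log-probability when forming the probability ratio.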

process_env_step(next_obs, rewards, dones, extras)

Parameters:

next_obs (TensorDict | Tensor)
rewards (Tensor)
dones (Tensor)
extras (dict[str, Tensor | TensorDict])

Return type:
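
`process_env_step` receives `rewards`, `dones`, and an `extras` dict. rsl_rl-derived PPO code commonly uses such extras to bootstrap episodes that ended only because of a time limit, adding `gamma * V(s)` to the reward of those steps (rsl_rl reads the flags from the key `"time_outs"`). Whether `HIMPPO` does this is not stated above; `bootstrap_time_outs` is a hypothetical helper sketching the rule over plain lists.

```python
from typing import List, Sequence


def bootstrap_time_outs(
    rewards: Sequence[float],
    values: Sequence[float],
    time_outs: Sequence[bool],
    gamma: float,
) -> List[float]:
    """Add gamma * V(s) to the reward of environments truncated by a time limit.

    Steps that terminated for any other reason get no bootstrap term.
    """
    return [r + gamma * v * float(t) for r, v, t in zip(rewards, values, time_outs)]
```

The correction matters because the usual GAE recursion zeroes the bootstrap term on every step flagged in `dones`; folding `gamma * V(s)` into the reward restores it for time-limit truncations only.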

compute_returns(last_critic_obs)

Parameters:

last_critic_obs (Tensor)

Return type: None
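
The constructor's `gamma` and `lam` suggest that `compute_returns` uses generalized advantage estimation (GAE), with `last_critic_obs` supplying the bootstrap value after the final stored step. That is an inference from the parameter names, not something the signature confirms. The standard recursion is `delta_t = r_t + gamma * (1 - d_t) * V_{t+1} - V_t` and `A_t = delta_t + gamma * lam * (1 - d_t) * A_{t+1}`, with returns `R_t = A_t + V_t`. A minimal single-environment sketch follows; `compute_gae` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def compute_gae(
    rewards: Sequence[float],
    values: Sequence[float],
    dones: Sequence[bool],
    last_value: float,
    gamma: float = 0.998,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (returns, advantages) for one environment's trajectory via GAE."""
    num_steps = len(rewards)
    advantages = [0.0] * num_steps
    gae = 0.0
    for t in reversed(range(num_steps)):
        # Bootstrap from the critic's value of the state after the last stored step.
        next_value = last_value if t == num_steps - 1 else values[t + 1]
        not_done = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = [a + v for a, v in zip(advantages, values)]
    return returns, advantages
```

The defaults mirror the constructor defaults (`gamma=0.998`, `lam=0.95`); a batched version would run the same loop over a `(num_transitions_per_env, num_envs)` tensor.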

update()

Return type: tuple[float, float, float, float]
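
`update` returns four floats whose meaning the signature does not state. Assuming it follows standard PPO, as `clip_param`, `use_clipped_value_loss`, `value_loss_coef`, `entropy_coef`, and `max_grad_norm` suggest, its core is the clipped surrogate objective plus an optionally clipped value loss, minimised over `num_learning_epochs` passes of `num_mini_batches` mini-batches with gradients rescaled to a global norm of at most `max_grad_norm`. The per-sample sketch below uses hypothetical helper names and does not reproduce any HIM-specific estimator losses.

```python
import math
from typing import Sequence, Tuple


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(x, hi))


def ppo_sample_losses(
    log_prob_new: float,
    log_prob_old: float,
    advantage: float,
    value_new: float,
    value_old: float,
    target_return: float,
    clip_param: float = 0.2,
    use_clipped_value_loss: bool = True,
) -> Tuple[float, float]:
    """Per-sample PPO surrogate and value losses (both to be minimised)."""
    ratio = math.exp(log_prob_new - log_prob_old)
    surrogate = -advantage * ratio
    surrogate_clipped = -advantage * clamp(ratio, 1.0 - clip_param, 1.0 + clip_param)
    # Pessimistic bound: keep the larger (worse) of the two losses.
    surrogate_loss = max(surrogate, surrogate_clipped)

    if use_clipped_value_loss:
        # Limit how far the new value prediction may move from the one stored at rollout time.
        value_clipped = value_old + clamp(value_new - value_old, -clip_param, clip_param)
        value_loss = max((value_new - target_return) ** 2, (value_clipped - target_return) ** 2)
    else:
        value_loss = (target_return - value_new) ** 2
    return surrogate_loss, value_loss


def total_loss(
    surrogate_loss: float,
    value_loss: float,
    entropy: float,
    value_loss_coef: float = 1.0,
    entropy_coef: float = 0.0,
) -> float:
    """Combined objective; a positive entropy_coef rewards higher policy entropy."""
    return surrogate_loss + value_loss_coef * value_loss - entropy_coef * entropy


def grad_clip_scale(grad_norms: Sequence[float], max_grad_norm: float = 1.0) -> float:
    """Factor applied to every gradient so the global L2 norm is at most max_grad_norm."""
    total_norm = math.sqrt(sum(g * g for g in grad_norms))
    return min(1.0, max_grad_norm / (total_norm + 1e-6))
```

Averaging these losses over each mini-batch and then over all epochs would give the kind of mean-loss statistics an `update` method commonly returns, but which four values `HIMPPO.update` reports is not specified here.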