unilab.algos.torch.him_ppo.algorithm

Classes

class unilab.algos.torch.him_ppo.algorithm.HIMPPO

Bases: object

__init__(actor_critic, num_learning_epochs=1, num_mini_batches=1, clip_param=0.2, gamma=0.998, lam=0.95, value_loss_coef=1.0, entropy_coef=0.0, learning_rate=0.001, max_grad_norm=1.0, use_clipped_value_loss=True, schedule='fixed', desired_kl=0.01, device='cpu', **kwargs)

Attributes:

actor_critic: HIMActorCritic

storage: HIMRolloutStorage | None
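The constructor exposes `schedule` and `desired_kl`. In rsl_rl's PPO, which HIM-style PPO code is commonly built on, `schedule='adaptive'` rescales the learning rate after each mini-batch based on the mean KL divergence between the old and new policies, while `'fixed'` keeps `learning_rate` constant. Whether HIMPPO applies exactly this rule is an assumption. The sketch below reproduces the rsl_rl-style rule (factor 1.5, bounds 1e-5 and 1e-2) in a hypothetical helper `adapt_learning_rate`.

```python
def adapt_learning_rate(learning_rate: float, kl_mean: float,
                        desired_kl: float = 0.01, schedule: str = "fixed") -> float:
    """Return the next learning rate (hypothetical helper, rsl_rl-style adaptive rule)."""
    if schedule != "adaptive":
        # 'fixed' (the constructor default) leaves the learning rate unchanged
        return learning_rate
    if kl_mean > desired_kl * 2.0:
        # Policy moved too far in one step: shrink the step size, floored at 1e-5
        return max(1e-5, learning_rate / 1.5)
    if 0.0 < kl_mean < desired_kl / 2.0:
        # Policy barely moved: grow the step size, capped at 1e-2
        return min(1e-2, learning_rate * 1.5)
    # KL within [desired_kl / 2, 2 * desired_kl] (or zero): keep the current rate
    return learning_rate


# Usage: with desired_kl=0.01, a measured KL of 0.05 shrinks the rate by 1.5x
lr = adapt_learning_rate(0.001, kl_mean=0.05, desired_kl=0.01, schedule="adaptive")
```

The band of a factor of two around `desired_kl` keeps the rate from oscillating when the KL hovers near the target.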
init_storage(num_envs, num_transitions_per_env, actor_obs_shape, critic_obs_shape, action_shape)
Parameters:
  • num_envs (int)

  • num_transitions_per_env (int)

  • actor_obs_shape

  • critic_obs_shape

  • action_shape

Return type:

None
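init_storage receives the number of environments, the rollout length and the actor observation, critic observation and action shapes. PPO rollout buffers such as rsl_rl's RolloutStorage preallocate time-major tensors of shape (num_transitions_per_env, num_envs, *shape). The field names and layout of HIMRolloutStorage are assumptions here. The sketch below only computes such shapes with a hypothetical helper `rollout_buffer_shapes` and does not allocate tensors.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]


def rollout_buffer_shapes(num_envs: int, num_transitions_per_env: int,
                          actor_obs_shape: Shape, critic_obs_shape: Shape,
                          action_shape: Shape) -> Dict[str, Shape]:
    """Hypothetical helper: buffer shapes for a time-major PPO rollout storage."""
    # First axis indexes the step within the rollout, second axis the environment
    lead = (num_transitions_per_env, num_envs)
    return {
        "observations": lead + tuple(actor_obs_shape),
        "critic_observations": lead + tuple(critic_obs_shape),
        "actions": lead + tuple(action_shape),
        # Per-step scalars keep a trailing singleton axis
        "rewards": lead + (1,),
        "dones": lead + (1,),
        "values": lead + (1,),
        "actions_log_prob": lead + (1,),
    }


# Illustrative sizes only, not values taken from unilab
shapes = rollout_buffer_shapes(4096, 24, (45,), (48,), (12,))
```

A time-major layout makes the backward pass over steps in return computation a simple loop over the first axis.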

test_mode()
Return type:

None

train_mode()
Return type:

None

act(obs, critic_obs)
Parameters:
  • obs

  • critic_obs

Return type:

Tensor

process_env_step(next_obs, rewards, dones, extras)
Parameters:
  • next_obs

  • rewards

  • dones

  • extras

Return type:

None
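On-policy runners usually drive methods like these in a fixed order: act() once per environment step, process_env_step() with that step's results, then compute_returns() with the critic observation after the final step, and update() once per rollout. The sketch below shows this ordering with a hypothetical stand-in class `StubAlgo` and a hypothetical `run_iteration` loop. The runner structure, including calling train_mode() before collection, is an assumption and not unilab's documented API. No real environment is stepped.

```python
from typing import List, Tuple


class StubAlgo:
    """Hypothetical stand-in that records the order in which HIMPPO-like methods are called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def train_mode(self) -> None:
        self.calls.append("train_mode")

    def act(self, obs, critic_obs):
        self.calls.append("act")
        return [0.0]  # placeholder action

    def process_env_step(self, next_obs, rewards, dones, extras) -> None:
        self.calls.append("process_env_step")

    def compute_returns(self, last_critic_obs) -> None:
        self.calls.append("compute_returns")

    def update(self) -> Tuple[float, float, float, float]:
        self.calls.append("update")
        return (0.0, 0.0, 0.0, 0.0)


def run_iteration(algo, num_steps_per_env: int) -> Tuple[float, float, float, float]:
    """One collect-then-update iteration of an on-policy runner (assumed structure)."""
    obs, critic_obs = [0.0], [0.0]
    algo.train_mode()
    for _ in range(num_steps_per_env):
        actions = algo.act(obs, critic_obs)
        # A real runner would step the environment with `actions`; here results are faked
        next_obs, rewards, dones, extras = obs, [1.0], [False], {}
        algo.process_env_step(next_obs, rewards, dones, extras)
        obs, critic_obs = next_obs, next_obs
    # Bootstrap the returns from the critic observation after the last step
    algo.compute_returns(critic_obs)
    return algo.update()
```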

compute_returns(last_critic_obs)
Parameters:

last_critic_obs (Tensor)

Return type:

None
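The constructor's `gamma=0.998` and `lam=0.95` are the usual discount and GAE(λ) parameters, and compute_returns takes the critic observation after the last step, presumably to evaluate the bootstrap value. Assuming HIMPPO uses standard generalized advantage estimation as rsl_rl does, the recursion for one environment looks like the sketch below. The helper `compute_gae_returns` is hypothetical and works on plain lists instead of tensors.

```python
from typing import List, Sequence


def compute_gae_returns(rewards: Sequence[float], values: Sequence[float],
                        dones: Sequence[bool], last_value: float,
                        gamma: float = 0.998, lam: float = 0.95) -> List[float]:
    """Single-environment GAE(lambda) returns; advantages are returns minus values."""
    num_steps = len(rewards)
    returns = [0.0] * num_steps
    advantage = 0.0
    for t in reversed(range(num_steps)):
        # Value of the following state: bootstrap from last_value after the final step
        next_value = last_value if t == num_steps - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD error r_t + gamma * V(s_{t+1}) - V(s_t), cut off at episode ends
        delta = rewards[t] + not_done * gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors
        advantage = delta + not_done * gamma * lam * advantage
        returns[t] = advantage + values[t]
    return returns
```

With `gamma=lam=1.0` and zero values, the returns reduce to undiscounted reward-to-go, which makes the recursion easy to check by hand.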

update()
Return type:

tuple[float, float, float, float]
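update() returns four floats. What each one reports is not documented here. In standard PPO, as implemented in rsl_rl, each mini-batch minimizes a clipped surrogate loss controlled by `clip_param`, plus `value_loss_coef` times a value loss (clipped the same way when `use_clipped_value_loss=True`), minus `entropy_coef` times the policy entropy, with gradients clipped to `max_grad_norm`. Whether HIMPPO.update adds further terms is an assumption left open. The scalar sketch below, using a hypothetical helper `ppo_losses`, computes only the surrogate and value terms.

```python
import math
from typing import Sequence, Tuple


def ppo_losses(log_probs: Sequence[float], old_log_probs: Sequence[float],
               advantages: Sequence[float], values: Sequence[float],
               old_values: Sequence[float], returns: Sequence[float],
               clip_param: float = 0.2,
               use_clipped_value_loss: bool = True) -> Tuple[float, float]:
    """Hypothetical helper: (surrogate_loss, value_loss) averaged over a mini-batch."""
    n = len(log_probs)
    surrogate = 0.0
    value = 0.0
    for i in range(n):
        ratio = math.exp(log_probs[i] - old_log_probs[i])
        clipped_ratio = min(max(ratio, 1.0 - clip_param), 1.0 + clip_param)
        # Pessimistic bound: no extra credit for pushing the ratio past the clip range
        surrogate += max(-advantages[i] * ratio, -advantages[i] * clipped_ratio)
        unclipped_sq = (values[i] - returns[i]) ** 2
        if use_clipped_value_loss:
            # Limit how far the new value estimate may move from the old one
            v_clipped = old_values[i] + min(max(values[i] - old_values[i], -clip_param), clip_param)
            value += max(unclipped_sq, (v_clipped - returns[i]) ** 2)
        else:
            value += unclipped_sq
    return surrogate / n, value / n
```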