unilab.algos.torch.hora.ppo.HoraPPO
- class unilab.algos.torch.hora.ppo.HoraPPO
Bases: FinalObservationAwarePPO
PPO variant that constructs a shared HORA actor-critic backbone.
- Parameters:
  - actor (HoraActorModel)
  - critic (HoraCriticModel)
  - storage (RolloutStorage)
  - num_learning_epochs (int)
  - num_mini_batches (int)
  - clip_param (float)
  - gamma (float)
  - lam (float)
  - value_loss_coef (float)
  - entropy_coef (float)
  - learning_rate (float)
  - max_grad_norm (float)
  - optimizer (str)
  - use_clipped_value_loss (bool)
  - schedule (str)
  - desired_kl (float)
  - normalize_advantage_per_mini_batch (bool)
  - device (str)
  - enable_compile (bool)
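The names clip_param, gamma, and lam match the usual PPO hyperparameters, although this page does not define them. The following is a minimal pure-Python sketch of what those quantities conventionally control, assuming clip_param is the clipping range of the probability ratio and gamma and lam are the discount factor and the GAE parameter. The functions `clipped_surrogate` and `gae` are hypothetical illustrations, not part of unilab, and HoraPPO may compute these quantities differently.

```python
from typing import List, Sequence


def clipped_surrogate(ratio: float, advantage: float, clip_param: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate loss (a value to minimize).

    ratio is pi_new(a|s) / pi_old(a|s). Hypothetical helper for illustration only.
    """
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    # Take the pessimistic (smaller) objective, then negate it to obtain a loss.
    return -min(ratio * advantage, clipped_ratio * advantage)


def gae(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized advantage estimation over one trajectory (illustrative)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping stops at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With the default clip_param of 0.2, a ratio of 1.5 on a positive advantage is treated as 1.2, so the policy gains nothing from moving further in that direction within a single update.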
Methods
__init__(actor, critic, storage[, ...])
construct_algorithm(obs, env, cfg, device)
process_env_step(obs, rewards, dones, extras)
update()
Attributes
- __init__(actor, critic, storage, num_learning_epochs=5, num_mini_batches=4, clip_param=0.2, gamma=0.99, lam=0.95, value_loss_coef=1.0, entropy_coef=0.01, learning_rate=0.001, max_grad_norm=1.0, optimizer='adam', use_clipped_value_loss=True, schedule='adaptive', desired_kl=0.01, normalize_advantage_per_mini_batch=False, device='cpu', rnd_cfg=None, symmetry_cfg=None, multi_gpu_cfg=None, enable_compile=False)
- Parameters:
  - actor (HoraActorModel)
  - critic (HoraCriticModel)
  - storage (RolloutStorage)
  - num_learning_epochs (int)
  - num_mini_batches (int)
  - clip_param (float)
  - gamma (float)
  - lam (float)
  - value_loss_coef (float)
  - entropy_coef (float)
  - learning_rate (float)
  - max_grad_norm (float)
  - optimizer (str)
  - use_clipped_value_loss (bool)
  - schedule (str)
  - desired_kl (float)
  - normalize_advantage_per_mini_batch (bool)
  - device (str)
  - enable_compile (bool)
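The keyword defaults in the `__init__` signature can be collected in one place so that experiment configurations override only what they change. The sketch below mirrors the scalar defaults listed in the signature. The class name `HoraPPOHyperParams` is a hypothetical helper rather than part of unilab, and the optional `rnd_cfg`, `symmetry_cfg`, and `multi_gpu_cfg` arguments are deliberately left out because this page does not describe them.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class HoraPPOHyperParams:
    """Hypothetical container mirroring the documented __init__ defaults."""

    num_learning_epochs: int = 5
    num_mini_batches: int = 4
    clip_param: float = 0.2
    gamma: float = 0.99
    lam: float = 0.95
    value_loss_coef: float = 1.0
    entropy_coef: float = 0.01
    learning_rate: float = 0.001
    max_grad_norm: float = 1.0
    optimizer: str = "adam"
    use_clipped_value_loss: bool = True
    schedule: str = "adaptive"
    desired_kl: float = 0.01
    normalize_advantage_per_mini_batch: bool = False
    device: str = "cpu"
    enable_compile: bool = False


# Override only the fields that differ from the documented defaults.
defaults = HoraPPOHyperParams()
tuned = replace(defaults, learning_rate=3e-4, num_mini_batches=8)
kwargs = asdict(tuned)

# Assumed usage, given already-constructed actor, critic, and storage objects:
#   algo = HoraPPO(actor, critic, storage, **kwargs)
```

Because the field names match the keyword names in the signature, the resulting dictionary can be unpacked directly into the constructor call, assuming the signature shown above is current.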
- __call__(*args, **kwargs)
Call self as a function.