unilab.algos.torch.hora.ppo.HoraPPO

class unilab.algos.torch.hora.ppo.HoraPPO

Bases: FinalObservationAwarePPO

PPO variant that constructs a shared HORA actor-critic backbone.

Methods

__init__(actor, critic, storage[, ...])

construct_algorithm(obs, env, cfg, device)

process_env_step(obs, rewards, dones, extras)

update()

Attributes

learning_rate

__init__(actor, critic, storage, num_learning_epochs=5, num_mini_batches=4, clip_param=0.2, gamma=0.99, lam=0.95, value_loss_coef=1.0, entropy_coef=0.01, learning_rate=0.001, max_grad_norm=1.0, optimizer='adam', use_clipped_value_loss=True, schedule='adaptive', desired_kl=0.01, normalize_advantage_per_mini_batch=False, device='cpu', rnd_cfg=None, symmetry_cfg=None, multi_gpu_cfg=None, enable_compile=False)

update()
Return type:

dict[str, float]
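
Because update() returns a plain dict[str, float], the values from several training iterations can be combined without any library-specific types. The sketch below averages such dictionaries key by key. The helper name average_metrics and the metric names "value_loss" and "surrogate_loss" are hypothetical; the actual keys returned by update() are not listed on this page.

```python
from collections import defaultdict


def average_metrics(history):
    """Average a list of dict[str, float] metric dicts key by key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for metrics in history:
        for name, value in metrics.items():
            sums[name] += float(value)
            counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}


# Hypothetical metric names standing in for whatever update() reports.
history = [
    {"value_loss": 0.5, "surrogate_loss": -0.02},
    {"value_loss": 0.3, "surrogate_loss": -0.04},
]
averaged = average_metrics(history)
```

In a training loop, each call to update() would append one dictionary to history.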

learning_rate: float
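
The constructor defaults schedule='adaptive' and desired_kl=0.01 suggest that learning_rate is adjusted during training based on the measured KL divergence between the old and new policies. The following is a minimal sketch of one common adaptive rule found in PPO implementations. It is an assumption that HoraPPO behaves this way: the function name adapt_learning_rate, the scaling factor 1.5, and the bounds 1e-5 and 1e-2 are illustrative and do not come from this page.

```python
def adapt_learning_rate(learning_rate, kl, desired_kl=0.01,
                        factor=1.5, lr_min=1e-5, lr_max=1e-2):
    """Return a new learning rate given the measured mean KL divergence.

    Assumed rule (not confirmed for HoraPPO):
    - KL well above the target: shrink the step size.
    - KL well below the target (but positive): grow the step size.
    - Otherwise: leave the learning rate unchanged.
    """
    if kl > 2.0 * desired_kl:
        return max(lr_min, learning_rate / factor)
    if 0.0 < kl < 0.5 * desired_kl:
        return min(lr_max, learning_rate * factor)
    return learning_rate


lr_after_large_kl = adapt_learning_rate(1e-3, kl=0.05)   # policy moved too far
lr_after_small_kl = adapt_learning_rate(1e-3, kl=0.001)  # policy barely moved
lr_unchanged = adapt_learning_rate(1e-3, kl=0.01)        # KL on target
```

Under this rule the learning rate stays within the assumed bounds no matter how many consecutive adjustments are applied.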
static construct_algorithm(obs, env, cfg, device)
Parameters:
  • obs (TensorDict)

  • env (VecEnv)

  • cfg (dict)

  • device (str)

Return type:

PPO

process_env_step(obs, rewards, dones, extras)
Return type:

None