unilab.algos.torch.rsl_rl_ppo

Classes

FinalObservationAwarePPO

PPO variant that bootstraps time limits from env final_observation.

class unilab.algos.torch.rsl_rl_ppo.FinalObservationAwarePPO

Bases: PPO

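PPO variant that handles time-limit truncation by bootstrapping from the environment's final_observation, rather than treating the truncated step as a true terminal state.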

Parameters:
learning_rate: float
__init__(*args, enable_compile=False, **kwargs)
Parameters:
update()
Return type:

dict[str, float]

process_env_step(obs, rewards, dones, extras)
Parameters:
Return type:

None
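
Example

The idea behind time-limit bootstrapping can be sketched in plain Python. This is a minimal illustration of the general technique, not the code of FinalObservationAwarePPO: the function name bootstrap_time_limit_rewards, the arguments time_outs and final_observations, the value_fn callable, and the toy critic are all hypothetical and chosen only for this sketch. It assumes that, for each environment truncated by a time limit, the discounted value estimate of the final observation is added to that step's reward.

```python
from typing import Callable, Optional, Sequence


def bootstrap_time_limit_rewards(
    rewards: Sequence[float],
    time_outs: Sequence[bool],
    final_observations: Sequence[Optional[Sequence[float]]],
    value_fn: Callable[[Sequence[float]], float],
    gamma: float = 0.99,
) -> list[float]:
    """Add gamma * V(final_obs) to the reward of each env that hit its time limit.

    All names here are hypothetical; this is not the library's signature.
    """
    adjusted = []
    for reward, timed_out, final_obs in zip(rewards, time_outs, final_observations):
        if timed_out and final_obs is not None:
            # A time-limit truncation is not a true terminal state, so the
            # return should still include the estimated value of the state
            # in which the episode was cut off.
            reward = reward + gamma * value_fn(final_obs)
        adjusted.append(reward)
    return adjusted


def toy_value(obs: Sequence[float]) -> float:
    """Hypothetical critic: the value is the sum of the observation features."""
    return float(sum(obs))


rewards = [1.0, 1.0, 1.0]
time_outs = [False, True, False]           # only env 1 was truncated
final_observations = [None, [2.0, 3.0], None]

adjusted = bootstrap_time_limit_rewards(
    rewards, time_outs, final_observations, toy_value, gamma=0.5
)
print(adjusted)  # [1.0, 3.5, 1.0]
```

Only the truncated environment's reward changes (1.0 + 0.5 * 5.0 = 3.5). Environments that ended in a genuine terminal state, or did not end at all, are left untouched.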