unilab.algos.torch.rsl_rl_ppo.FinalObservationAwarePPO
- class unilab.algos.torch.rsl_rl_ppo.FinalObservationAwarePPO
Bases: PPO
PPO variant that bootstraps time limits from env final_observation.
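The idea of bootstrapping at a time limit can be illustrated with a minimal sketch. This is not the class's actual implementation: the function name, its arguments, and the plain-list data layout are all hypothetical, and how the class reads final_observation from the environment's extras is not documented here. The sketch only shows the usual correction, in which an episode cut off by a time limit (rather than a true terminal state) has the discounted value estimate of its last observation added to the final reward.

```python
def bootstrap_time_limit_rewards(rewards, time_outs, final_values, gamma=0.99):
    """Add gamma * V(final_obs) to the reward of every env that hit a time limit.

    All arguments are hypothetical per-environment lists of equal length:
    rewards      -- rewards returned by the step
    time_outs    -- True where the episode was truncated by a time limit
    final_values -- critic estimates for the last observation before reset
    """
    return [
        r + gamma * v if timed_out else r
        for r, timed_out, v in zip(rewards, time_outs, final_values)
    ]


# Three parallel environments; only the second one timed out.
rewards = [1.0, 0.5, 2.0]
time_outs = [False, True, False]
final_values = [10.0, 4.0, -3.0]

adjusted = bootstrap_time_limit_rewards(rewards, time_outs, final_values, gamma=0.5)
print(adjusted)  # [1.0, 2.5, 2.0]
```

Only the timed-out environment is changed (0.5 + 0.5 * 4.0 = 2.5). Evaluating the critic on the final observation, rather than on the observation returned after an automatic reset, is presumably why the environment's final_observation is needed.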
Methods
__init__(*args[, enable_compile])
process_env_step(obs, rewards, dones, extras)
update()
Attributes
- learning_rate: float
- __call__(*args, **kwargs)
Call self as a function.