
`unilab.algos.torch`

`unilab.algos.torch.common`
`unilab.algos.torch.appo`
`unilab.algos.torch.fast_sac`
`unilab.algos.torch.fast_td3`
`unilab.algos.torch.flash_sac`	FlashSAC algorithm package.
`unilab.algos.torch.him_ppo`
`unilab.algos.torch.hora`
`unilab.algos.torch.offpolicy`	Off-policy RL unified infrastructure.

Standalone PPO entrypoints

class unilab.algos.torch.rsl_rl_ppo.FinalObservationAwarePPO

Bases: PPO

PPO variant that bootstraps value targets at time-limit truncations from the environment's `final_observation`.
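In general, time-limit bootstrapping treats a truncated episode as non-terminal. The reward for the truncating step is augmented with the discounted value of the state the agent actually reached. With auto-resetting vectorized environments, the observation returned by the step already belongs to the next episode, so the pre-reset observation has to come from something like `final_observation`. This page does not document how `FinalObservationAwarePPO` implements this. The following is only a minimal pure-Python sketch of the general technique. The helper `bootstrap_time_limits` is hypothetical, and `final_values` is assumed to hold critic estimates V(final_observation) for each environment.

```python
from typing import Sequence


def bootstrap_time_limits(
    rewards: Sequence[float],
    time_outs: Sequence[bool],
    final_values: Sequence[float],
    gamma: float,
) -> list[float]:
    """Hypothetical helper: add gamma * V(final_observation) to truncated steps.

    final_values[i] is assumed to be the critic's estimate for env i's last
    observation before its auto-reset (an assumption, not the library's API).
    """
    if not (len(rewards) == len(time_outs) == len(final_values)):
        raise ValueError("rewards, time_outs and final_values must have equal length")
    return [
        # A time-limit truncation is not a true terminal state, so the return
        # keeps the discounted value of the state that was actually reached.
        r + gamma * v if timed_out else r
        for r, timed_out, v in zip(rewards, time_outs, final_values)
    ]


# Three envs: env 0 keeps running, envs 1 and 2 hit their time limit this step.
rewards = [1.0, 1.0, 0.5]
time_outs = [False, True, True]
final_values = [10.0, 2.0, 4.0]
print(bootstrap_time_limits(rewards, time_outs, final_values, gamma=0.99))
```

In a real PPO rollout, the same arithmetic would run on batched tensors, and the adjusted rewards would then feed into advantage estimation as usual.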

Parameters:

args (Any)
enable_compile (bool)
kwargs (Any)

learning_rate: float

__init__(*args, enable_compile=False, **kwargs)

Parameters:

args (Any)
enable_compile (bool)
kwargs (Any)

update()

Return type: dict[str, float]

process_env_step(obs, rewards, dones, extras)

Parameters:

obs (TensorDict)
rewards (Tensor)
dones (Tensor)
extras (dict[str, Tensor | TensorDict])

Return type:

Runtime resolution helpers for RSL-RL PPO script assembly.

class unilab.algos.torch.rsl_rl_runtime.RslRlPPORuntime

Bases: object

Resolved PPO runtime consumed by the generic RSL-RL entrypoint.

Parameters: wrapper_cls (type[RslRlVecEnvWrapper])

wrapper_cls: type[RslRlVecEnvWrapper]

__init__(wrapper_cls)

Parameters: wrapper_cls (type[RslRlVecEnvWrapper])

unilab.algos.torch.rsl_rl_runtime.resolve_rsl_rl_ppo_runtime(rl_cfg, *, default_wrapper_cls)

Resolve the PPO runtime bundle from owner config.

Parameters:

rl_cfg (dict[str, Any])
default_wrapper_cls (type[RslRlVecEnvWrapper])

Return type:

RslRlPPORuntime
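This page documents the resolver only by its signature. The following hedged sketch illustrates one plausible pattern: read an optional override from the config, fall back to the caller-supplied default, and bundle the result in a small immutable record. It assumes a hypothetical `wrapper_cls` config key. The stand-in classes `BaseVecEnvWrapper` and `CustomWrapper` take the place of the real `RslRlVecEnvWrapper` hierarchy. `PPORuntimeSketch` and `resolve_runtime_sketch` are likewise illustrative names, not the library's API.

```python
from dataclasses import dataclass
from typing import Any


class BaseVecEnvWrapper:
    """Hypothetical stand-in for the real wrapper base class."""


class CustomWrapper(BaseVecEnvWrapper):
    """Hypothetical project-specific wrapper."""


@dataclass(frozen=True)
class PPORuntimeSketch:
    # Mirrors the single documented field of RslRlPPORuntime.
    wrapper_cls: type[BaseVecEnvWrapper]


def resolve_runtime_sketch(
    rl_cfg: dict[str, Any], *, default_wrapper_cls: type[BaseVecEnvWrapper]
) -> PPORuntimeSketch:
    # "wrapper_cls" is an assumed config key; the real lookup is not documented here.
    wrapper_cls = rl_cfg.get("wrapper_cls", default_wrapper_cls)
    # Fail early if the config names something that is not a wrapper class.
    if not (isinstance(wrapper_cls, type) and issubclass(wrapper_cls, BaseVecEnvWrapper)):
        raise TypeError(f"wrapper_cls must subclass BaseVecEnvWrapper, got {wrapper_cls!r}")
    return PPORuntimeSketch(wrapper_cls=wrapper_cls)


# Without an override, the default wrapper is used.
runtime = resolve_runtime_sketch({}, default_wrapper_cls=BaseVecEnvWrapper)
print(runtime.wrapper_cls.__name__)  # BaseVecEnvWrapper

# With an override in the config, the configured wrapper wins.
runtime = resolve_runtime_sketch(
    {"wrapper_cls": CustomWrapper}, default_wrapper_cls=BaseVecEnvWrapper
)
print(runtime.wrapper_cls.__name__)  # CustomWrapper
```

Keeping the resolved choices in a small record lets a generic entrypoint consume them without knowing how each config spells them.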