unilab.algos.torch.hora.rsl_rl

HORA-owned RSL-RL wrapper helpers for teacher-policy runtime.

Functions

`resolve_hora_ppo_runtime`(rl_cfg)	Resolve HORA PPO entrypoints from an explicit runtime marker.
`resolve_hora_ppo_wrapper_cls`(rl_cfg)	Return the HORA-specific PPO wrapper class when the config selects it.

Classes

`HoraRslRlPPORuntime`	Resolved HORA PPO runtime consumed by the generic RSL-RL script.
`HoraRslRlVecEnvWrapper`	RSL-RL adapter that preserves HORA teacher-policy observation payloads.

class unilab.algos.torch.hora.rsl_rl.HoraRslRlPPORuntime

Resolved HORA PPO runtime consumed by the generic RSL-RL script.

__init__(wrapper_cls)

unilab.algos.torch.hora.rsl_rl.resolve_hora_ppo_runtime(rl_cfg)

Resolve HORA PPO entrypoints from an explicit runtime marker.

unilab.algos.torch.hora.rsl_rl.resolve_hora_ppo_wrapper_cls(rl_cfg)

Return the HORA-specific PPO wrapper class when the config selects it.

Parameters: rl_cfg (dict[str, Any]) – Resolved algorithm config dictionary from Hydra composition.
Return type: type[RslRlVecEnvWrapper] | None
Returns: `HoraRslRlVecEnvWrapper` when the owner config selects HORA PPO, otherwise `None`.
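
The resolve-or-fall-back pattern described above might be used by a caller roughly as follows. This is a minimal sketch, not the unilab implementation: both classes are local stand-ins, `resolve_wrapper_cls_sketch` and `build_wrapper` are hypothetical helpers, and the config key `"runtime"` with value `"hora_ppo"` is invented for illustration, since the actual runtime marker is not documented here.

```python
from __future__ import annotations

from typing import Any, Optional


class RslRlVecEnvWrapper:
    """Local stand-in for the generic RSL-RL wrapper (not the real class)."""

    def __init__(self, env: Any) -> None:
        self.env = env


class HoraRslRlVecEnvWrapper(RslRlVecEnvWrapper):
    """Local stand-in for the HORA-specific wrapper (not the real class)."""


def resolve_wrapper_cls_sketch(
    rl_cfg: dict[str, Any],
) -> Optional[type[RslRlVecEnvWrapper]]:
    # ASSUMPTION: the "runtime" key and "hora_ppo" value are hypothetical;
    # the real marker used by unilab is not shown in this reference.
    if rl_cfg.get("runtime") == "hora_ppo":
        return HoraRslRlVecEnvWrapper
    return None


def build_wrapper(env: Any, rl_cfg: dict[str, Any]) -> RslRlVecEnvWrapper:
    # Use the HORA wrapper when the config selects it, else the generic one.
    wrapper_cls = resolve_wrapper_cls_sketch(rl_cfg) or RslRlVecEnvWrapper
    return wrapper_cls(env)
```

Returning `None` rather than a default class leaves the fallback decision with the generic script, which matches the documented `type[RslRlVecEnvWrapper] | None` return type.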

class unilab.algos.torch.hora.rsl_rl.HoraRslRlVecEnvWrapper

RSL-RL adapter that preserves HORA teacher-policy observation payloads.

Parameters:

step(actions)

Step the wrapped env while keeping HORA bootstrap payloads intact.

Parameters: actions (Tensor | ndarray) – Torch or numpy action batch with shape (num_envs, action_dim).
Return type: tuple[TensorDict, Tensor, Tensor, dict]
Returns: Tuple (obs_td, rewards, dones, infos) matching the RSL-RL VecEnv contract while preserving HORA privileged observations.
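
The shape of this contract can be illustrated with a dependency-free sketch. Everything below is hypothetical: plain dicts and lists stand in for `TensorDict` and `Tensor`, `FakeEnv` and `PreservingWrapperSketch` are invented names, the observation keys `"policy"` and `"privileged"` are assumptions, and the wrapped env is assumed to return a five-element tuple with separate termination and truncation flags.

```python
from __future__ import annotations

from typing import Any


class FakeEnv:
    """Hypothetical wrapped env with two parallel environments."""

    num_envs = 2

    def step(self, actions: list[list[float]]):
        # ASSUMPTION: observations arrive as named groups, one of which
        # carries privileged (teacher-only) inputs.
        obs = {"policy": [[0.0], [0.0]], "privileged": [[1.0], [1.0]]}
        rewards = [0.5, 0.5]
        terminated = [False, True]
        truncated = [False, False]
        return obs, rewards, terminated, truncated, {"time_outs": truncated}


class PreservingWrapperSketch:
    """Returns the 4-tuple contract without dropping observation groups."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def step(self, actions: list[list[float]]):
        if len(actions) != self.env.num_envs:
            raise ValueError("expected one action row per environment")
        obs, rewards, terminated, truncated, infos = self.env.step(actions)
        # Merge termination and truncation into a single done flag per env.
        dones = [t or u for t, u in zip(terminated, truncated)]
        # Copy every observation group through, privileged ones included.
        return dict(obs), rewards, dones, infos
```

The point of the sketch is the last line of `step`: the observation container is passed through whole rather than reduced to the policy group, so a teacher policy can still read its privileged inputs.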

reset()

Reset the wrapped env and preserve HORA privileged reset payloads.

Parameters: None.
Return type: tuple[TensorDict, dict[str, Any]]
Returns: Tuple (obs_td, info) where obs_td retains HORA privileged inputs.

get_observations()

Return the current HORA-aware observation TensorDict.

Parameters: None.
Return type: TensorDict
Returns: TensorDict containing the current observation batch with HORA extras.
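
One plausible way for a wrapper to serve the "current" observation is to cache the latest batch produced by a reset and return it on request. The sketch below assumes that design; it is not taken from the unilab source. `CountingEnv` and `CachingWrapperSketch` are hypothetical names, and plain dicts stand in for `TensorDict`.

```python
from __future__ import annotations

from typing import Any


class CountingEnv:
    """Hypothetical env that records how many times it was reset."""

    def __init__(self) -> None:
        self.resets = 0

    def reset(self):
        self.resets += 1
        obs = {"policy": [[0.0]], "privileged": [[float(self.resets)]]}
        return obs, {}


class CachingWrapperSketch:
    """Caches the latest observation so get_observations() needs no env call."""

    def __init__(self, env: Any) -> None:
        self.env = env
        self._last_obs: dict[str, Any] | None = None

    def reset(self):
        obs, info = self.env.reset()
        # Keep every observation group, including privileged inputs.
        self._last_obs = dict(obs)
        return self._last_obs, info

    def get_observations(self) -> dict[str, Any]:
        # ASSUMPTION: reset lazily if nothing has been observed yet.
        if self._last_obs is None:
            self.reset()
        return self._last_obs
```

With this layout, repeated calls to `get_observations()` return the same cached batch and do not advance or reset the underlying env.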