unilab.algos.torch.hora.rsl_rl

HORA-owned RSL-RL wrapper helpers for teacher-policy runtime.

Functions

resolve_hora_ppo_runtime(rl_cfg)

Resolve HORA PPO entrypoints from an explicit runtime marker.

resolve_hora_ppo_wrapper_cls(rl_cfg)

Return the HORA-specific PPO wrapper class when the config selects it.

Classes

HoraRslRlPPORuntime

Resolved HORA PPO runtime consumed by the generic RSL-RL script.

HoraRslRlVecEnvWrapper

RSL-RL adapter that preserves HORA teacher-policy observation payloads.

class unilab.algos.torch.hora.rsl_rl.HoraRslRlPPORuntime[source]

Bases: object

Resolved HORA PPO runtime consumed by the generic RSL-RL script.

Parameters:

wrapper_cls (type[RslRlVecEnvWrapper])

wrapper_cls: type[RslRlVecEnvWrapper]
__init__(wrapper_cls)
Parameters:

wrapper_cls (type[RslRlVecEnvWrapper])

unilab.algos.torch.hora.rsl_rl.resolve_hora_ppo_runtime(rl_cfg)[source]

Resolve HORA PPO entrypoints from an explicit runtime marker.

Parameters:

rl_cfg (dict[str, Any]) – Resolved algorithm config dictionary from Hydra composition.

Return type:

HoraRslRlPPORuntime | None
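
Because the return type is optional, a caller has to handle the None case. The following is a minimal sketch of that pattern, not the unilab implementation: every class and function in it is a local stand-in, and the marker key "runtime" with value "hora_ppo" is invented for illustration, since the actual marker is not documented here.

```python
from dataclasses import dataclass
from typing import Any, Optional


class GenericVecEnvWrapper:
    """Hypothetical stand-in for the generic RslRlVecEnvWrapper."""


class HoraVecEnvWrapper(GenericVecEnvWrapper):
    """Hypothetical stand-in for HoraRslRlVecEnvWrapper."""


@dataclass
class PPORuntime:
    """Stand-in mirroring HoraRslRlPPORuntime: a single wrapper_cls field."""

    wrapper_cls: type


def resolve_runtime(rl_cfg: dict[str, Any]) -> Optional[PPORuntime]:
    """Stand-in resolver: return a runtime only when the marker is present.

    ASSUMPTION: the key "runtime" and the value "hora_ppo" are made up.
    """
    if rl_cfg.get("runtime") == "hora_ppo":
        return PPORuntime(wrapper_cls=HoraVecEnvWrapper)
    return None


def select_wrapper_cls(rl_cfg: dict[str, Any]) -> type:
    """Fall back to the generic wrapper when no HORA runtime is resolved."""
    runtime = resolve_runtime(rl_cfg)
    if runtime is None:
        return GenericVecEnvWrapper
    return runtime.wrapper_cls
```

In this sketch, a config without the marker falls through to the generic wrapper, so a script can call the resolver unconditionally.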

unilab.algos.torch.hora.rsl_rl.resolve_hora_ppo_wrapper_cls(rl_cfg)[source]

Return the HORA-specific PPO wrapper class when the config selects it.

Parameters:

rl_cfg (dict[str, Any]) – Resolved algorithm config dictionary from Hydra composition.

Return type:

type[RslRlVecEnvWrapper] | None

Returns:

HoraRslRlVecEnvWrapper when the owner config selects HORA PPO, otherwise None.

class unilab.algos.torch.hora.rsl_rl.HoraRslRlVecEnvWrapper[source]

Bases: RslRlVecEnvWrapper

RSL-RL adapter that preserves HORA teacher-policy observation payloads.

Parameters:
  • env (Any)
  • device (str)
  • policy_obs_mode (str)

step(actions)[source]

Step the wrapped env while keeping HORA bootstrap payloads intact.

Parameters:

actions (Tensor | ndarray) – Torch or numpy action batch with shape (num_envs, action_dim).

Return type:

tuple[TensorDict, Tensor, Tensor, dict]

Returns:

Tuple (obs_td, rewards, dones, infos) matching the RSL-RL VecEnv contract while preserving HORA privileged observations.
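
The four-element return value can be consumed in an ordinary rollout loop. The sketch below uses a pure-Python stub in place of the real wrapper, so plain lists and dicts stand in for Tensor and TensorDict. The observation keys "policy" and "priv_info" are assumptions made for illustration, not names taken from unilab.

```python
from typing import Any


class StubHoraEnv:
    """Hypothetical stand-in for the wrapper; not the unilab implementation."""

    def __init__(self, num_envs: int, episode_len: int) -> None:
        self.num_envs = num_envs
        self.episode_len = episode_len
        self._t = 0

    def step(
        self, actions: list[list[float]]
    ) -> tuple[dict[str, Any], list[float], list[bool], dict]:
        # One action row per environment, as in (num_envs, action_dim).
        assert len(actions) == self.num_envs
        self._t += 1
        done = self._t % self.episode_len == 0
        obs = {
            # Invented key names: regular and privileged observation groups.
            "policy": [[float(self._t)] for _ in range(self.num_envs)],
            "priv_info": [[0.5] for _ in range(self.num_envs)],
        }
        rewards = [1.0] * self.num_envs
        dones = [done] * self.num_envs
        return obs, rewards, dones, {}


def rollout(env: StubHoraEnv, num_steps: int) -> tuple[list[float], int]:
    """Sum rewards per env and count finished episodes across all envs."""
    totals = [0.0] * env.num_envs
    episodes = 0
    for _ in range(num_steps):
        actions = [[0.0] for _ in range(env.num_envs)]
        obs, rewards, dones, infos = env.step(actions)
        # The privileged group is expected to survive every step.
        assert "priv_info" in obs
        totals = [t + r for t, r in zip(totals, rewards)]
        episodes += sum(dones)
    return totals, episodes
```

With the real wrapper, the same unpacking applies, but obs would be a TensorDict and rewards and dones would be tensors.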

reset()[source]

Reset the wrapped env and preserve HORA privileged reset payloads.

Parameters:

None.

Return type:

tuple[TensorDict, dict[str, Any]]

Returns:

Tuple (obs_td, info) where obs_td retains HORA privileged inputs.

get_observations()[source]

Return the current HORA-aware observation TensorDict.

Parameters:

None.

Return type:

TensorDict

Returns:

TensorDict containing the current observation batch with HORA extras.
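
One way a caller might use get_observations() is to verify, before building a policy, that the expected observation groups are present. The sketch below assumes a stub wrapper that stores its latest observations; the key names "policy" and "priv_info" and the helper missing_obs_keys are hypothetical and do not come from unilab.

```python
from typing import Any, Mapping, Sequence


class StubWrapper:
    """Hypothetical stand-in that stores the most recent observations."""

    def __init__(self) -> None:
        self._obs: dict[str, Any] = {}

    def reset(self) -> tuple[dict[str, Any], dict[str, Any]]:
        # Invented key names for the regular and privileged groups.
        self._obs = {"policy": [[0.0, 0.0]], "priv_info": [[1.0]]}
        return self._obs, {}

    def get_observations(self) -> dict[str, Any]:
        # Return the stored batch without advancing the environment.
        return self._obs


def missing_obs_keys(obs: Mapping[str, Any], required: Sequence[str]) -> list[str]:
    """List the required observation groups that obs does not contain."""
    return [key for key in required if key not in obs]
```

An empty result means every required group is available; a non-empty one names the groups to investigate before training starts.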