unilab.algos.torch.hora.observations
HORA-owned observation helpers for teacher-policy runtime code.
Functions
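- build_hora_actor_tensordict(actor_obs, *, priv_info, device, batch_size) – Build the minimal HORA actor TensorDict for APPO play/inference.
- build_hora_obs_tensordict(obs, *, info, device, batch_size, policy_obs) – Build the HORA PPO/APPO observation TensorDict for teacher-policy runtime.
- extract_hora_proprio_hist(info) – Return HORA proprio-history payload from env info when available.
- split_hora_obs_with_priv_info(obs, info=None) – Split HORA env outputs into actor obs, critic obs, and privileged info.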
- unilab.algos.torch.hora.observations.split_hora_obs_with_priv_info(obs, info=None)
Split HORA env outputs into actor obs, critic obs, and privileged info.
- Returns:
Tuple (actor_obs, critic_obs, priv_info). priv_info falls back to the extra tail of critic_obs when no explicit critic_info is provided.
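The fallback rule can be illustrated with a small NumPy sketch. It is not the UniLab implementation: the function name split_obs_with_priv_info_sketch and the keys "policy", "critic" and "critic_info" are hypothetical, and it assumes critic_obs is the actor obs with privileged features appended, so the "extra tail" is every column past the actor width.

```python
import numpy as np


def split_obs_with_priv_info_sketch(obs, info=None):
    """Hypothetical sketch of the documented priv_info fallback (not UniLab code).

    Assumes obs["policy"] is the actor obs, obs["critic"] is the critic obs whose
    leading columns repeat the actor obs, and info may carry "critic_info".
    """
    actor_obs = np.asarray(obs["policy"])
    # Without a separate critic entry, reuse the actor obs (assumption).
    critic_obs = np.asarray(obs.get("critic", actor_obs))
    critic_info = None if info is None else info.get("critic_info")
    if critic_info is not None:
        # An explicit privileged payload takes precedence.
        priv_info = np.asarray(critic_info)
    else:
        # Fallback: the extra tail of critic_obs beyond the actor width.
        priv_info = critic_obs[..., actor_obs.shape[-1]:]
    return actor_obs, critic_obs, priv_info


obs = {"policy": np.zeros((2, 3)), "critic": np.arange(10.0).reshape(2, 5)}
actor, critic, priv = split_obs_with_priv_info_sketch(obs)
print(priv.tolist())  # [[3.0, 4.0], [8.0, 9.0]]
```

Slicing on the last axis keeps the batch dimension intact, so the fallback works for any number of vectorized environments; when the critic obs carries no extra columns, the sketch returns an empty (batch, 0) array.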
- unilab.algos.torch.hora.observations.extract_hora_proprio_hist(info)
Return HORA proprio-history payload from env info when available.
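"When available" suggests the helper returns nothing rather than raising when the payload is absent. A minimal sketch of that contract follows; the name extract_proprio_hist_sketch and the lookup key "proprio_hist" are assumptions (the key is borrowed from the TensorDict entry documented for build_hora_obs_tensordict), and the real function may look elsewhere in info.

```python
from typing import Any, Mapping, Optional


def extract_proprio_hist_sketch(info: Optional[Mapping[str, Any]]) -> Optional[Any]:
    """Hypothetical sketch: return info["proprio_hist"] if present, else None.

    The key name "proprio_hist" is an assumption, not a documented contract.
    """
    if not info:
        # No info dict (or an empty one) means no history payload.
        return None
    return info.get("proprio_hist")


print(extract_proprio_hist_sketch(None))                     # None
print(extract_proprio_hist_sketch({"proprio_hist": [0.1]}))  # [0.1]
```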
- unilab.algos.torch.hora.observations.build_hora_obs_tensordict(obs, *, info, device, batch_size, policy_obs)
Build the HORA PPO/APPO observation TensorDict for teacher-policy runtime.
- Parameters:
  - obs (dict[str, ndarray]) – Environment observation dict following the UniLab env contract.
  - info (dict[str, Any] | None) – Optional env info dict containing HORA privileged payloads.
  - device (str) – Torch device string used for the returned tensors.
  - batch_size (int) – Number of vectorized environments represented by this batch.
  - policy_obs (ndarray) – Policy observation array already resolved by the caller.
- Return type:
TensorDict
- Returns:
TensorDict with generic keys plus HORA-specific priv_info and optional proprio_hist entries when the environment provides them.
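The conditional key assembly can be sketched with a plain dict of NumPy arrays standing in for the TensorDict. Everything here is an assumption for illustration: the name build_obs_dict_sketch, the generic key "policy", and reading the HORA payloads from info["priv_info"] and info["proprio_hist"]. The device argument is omitted because no torch tensors are created.

```python
import numpy as np


def _batched(name, value, batch_size):
    # Every entry is assumed to be batched on axis 0, like a TensorDict batch_size.
    arr = np.asarray(value, dtype=np.float32)
    if arr.ndim == 0 or arr.shape[0] != batch_size:
        raise ValueError(f"{name}: expected leading dim {batch_size}, got shape {arr.shape}")
    return arr


def build_obs_dict_sketch(*, info, batch_size, policy_obs):
    """Plain-dict stand-in for a TensorDict; all key names are hypothetical."""
    out = {"policy": _batched("policy", policy_obs, batch_size)}  # generic key
    for key in ("priv_info", "proprio_hist"):
        # Only add HORA-specific entries the environment actually provided.
        if info and key in info:
            out[key] = _batched(key, info[key], batch_size)
    return out


td = build_obs_dict_sketch(
    info={"priv_info": np.zeros((4, 9))}, batch_size=4, policy_obs=np.zeros((4, 8))
)
print(sorted(td))  # ['policy', 'priv_info']
```

Validating the leading dimension of each entry mirrors what a real TensorDict enforces for its batch_size, which is why batch_size is passed explicitly instead of being inferred.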
- unilab.algos.torch.hora.observations.build_hora_actor_tensordict(actor_obs, *, priv_info, device, batch_size)
Build the minimal HORA actor TensorDict for APPO play/inference.
- Return type:
TensorDict
- Returns:
TensorDict containing the grouped HORA actor inputs required for teacher-policy inference.
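"Grouped" actor inputs can be pictured as a nested mapping that keeps the actor obs and privileged info together under one group. The sketch below uses nested dicts of NumPy arrays as a stand-in; the name build_actor_inputs_sketch, the group key "actor", and the leaf keys "obs" and "priv_info" are hypothetical, since the real TensorDict layout is not documented on this page. The device argument is again omitted.

```python
import numpy as np


def build_actor_inputs_sketch(actor_obs, *, priv_info, batch_size):
    """Hypothetical nested-dict stand-in for the grouped actor TensorDict.

    The group name "actor" and the leaf keys "obs"/"priv_info" are assumptions.
    """
    group = {}
    for name, value in (("obs", actor_obs), ("priv_info", priv_info)):
        arr = np.asarray(value, dtype=np.float32)
        # Both inputs must share the vectorized-env batch dimension.
        if arr.ndim == 0 or arr.shape[0] != batch_size:
            raise ValueError(f"{name}: expected leading dim {batch_size}, got shape {arr.shape}")
        group[name] = arr
    return {"actor": group}


inputs = build_actor_inputs_sketch(np.zeros((2, 96)), priv_info=np.zeros((2, 9)), batch_size=2)
print({k: v.shape for k, v in inputs["actor"].items()})  # {'obs': (2, 96), 'priv_info': (2, 9)}
```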