unilab.algos.torch.hora.observations

HORA-owned observation helpers for teacher-policy runtime code.

Functions

`build_hora_actor_tensordict`(actor_obs, *, ...)	Build the minimal HORA actor TensorDict for APPO play/inference.
`build_hora_obs_tensordict`(obs, *, info, ...)	Build the HORA PPO/APPO observation TensorDict for teacher-policy runtime.
`extract_hora_proprio_hist`(info)	Return HORA proprio-history payload from env info when available.
`split_hora_obs_with_priv_info`(obs[, info])	Split HORA env outputs into actor obs, critic obs, and privileged info.

unilab.algos.torch.hora.observations.split_hora_obs_with_priv_info(obs, info=None)

Split HORA env outputs into actor obs, critic obs, and privileged info.

Parameters:

obs (dict[str, ndarray]) – Environment observation dict following the UniLab env contract.
info (dict[str, Any] | None) – Optional env info dict. When present, info["critic_info"] is the preferred source of HORA privileged info.

Return type:

tuple[ndarray, ndarray | None, ndarray | None]

Returns:

Tuple (actor_obs, critic_obs, priv_info). priv_info falls back to the extra tail of critic_obs when no explicit critic_info is provided.
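
The lookup order described above can be sketched with NumPy alone. This is an illustrative stand-in, not the UniLab implementation: the obs keys "actor" and "critic" are hypothetical, and the sketch assumes critic_obs is laid out as the actor columns followed by the privileged columns, so that the "extra tail" is everything past the actor width.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def split_obs_sketch(
    obs: dict[str, np.ndarray],
    info: dict[str, Any] | None = None,
) -> tuple[np.ndarray, np.ndarray | None, np.ndarray | None]:
    """Illustrative stand-in for the documented split; not the library code."""
    # Hypothetical key names: the real UniLab env contract may use others.
    actor_obs = obs["actor"]
    critic_obs = obs.get("critic")

    # Preferred source: explicit privileged info carried in the env info dict.
    if info is not None and info.get("critic_info") is not None:
        return actor_obs, critic_obs, np.asarray(info["critic_info"])

    # Fallback (assumed layout): columns of critic_obs past the actor width.
    actor_dim = actor_obs.shape[-1]
    if critic_obs is not None and critic_obs.shape[-1] > actor_dim:
        return actor_obs, critic_obs, critic_obs[..., actor_dim:]

    # Nothing to derive privileged info from.
    return actor_obs, critic_obs, None


# Four environments, 3 actor features, 2 extra critic-only features.
actor = np.zeros((4, 3), dtype=np.float32)
critic = np.concatenate([actor, np.ones((4, 2), dtype=np.float32)], axis=-1)

_, _, priv_tail = split_obs_sketch({"actor": actor, "critic": critic})
_, _, priv_explicit = split_obs_sketch(
    {"actor": actor, "critic": critic},
    {"critic_info": np.full((4, 5), 2.0, dtype=np.float32)},
)
```

In this sketch an explicit critic_info always wins over the tail, which mirrors the "preferred source" wording in the parameter description.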

unilab.algos.torch.hora.observations.extract_hora_proprio_hist(info)

Return HORA proprio-history payload from env info when available.

Parameters:

info (dict[str, Any] | None) – Optional env info dict produced by the HORA environment.

Return type:

ndarray | None

Returns:

Proprio-history array when present, otherwise None.
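
A minimal sketch of the "array when present, otherwise None" behaviour follows. The info key "proprio_hist" is an assumption borrowed from the TensorDict key named elsewhere on this page; the key the HORA environment actually writes may differ.

```python
from __future__ import annotations

from typing import Any

import numpy as np


def extract_proprio_hist_sketch(info: dict[str, Any] | None) -> np.ndarray | None:
    """Illustrative stand-in; not the library code."""
    # A missing or empty info dict carries no history.
    if not info:
        return None
    # "proprio_hist" is an assumed key name, used here for illustration only.
    hist = info.get("proprio_hist")
    return None if hist is None else np.asarray(hist)


# Two environments, 5 history steps, 3 proprioceptive features per step.
hist = extract_proprio_hist_sketch({"proprio_hist": np.zeros((2, 5, 3))})
missing = extract_proprio_hist_sketch(None)
```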

unilab.algos.torch.hora.observations.build_hora_obs_tensordict(obs, *, info, device, batch_size, policy_obs)

Build the HORA PPO/APPO observation TensorDict for teacher-policy runtime.

Parameters:

obs (dict[str, ndarray]) – Environment observation dict following the UniLab env contract.
info (dict[str, Any] | None) – Optional env info dict containing HORA privileged payloads.
device (str) – Torch device string used for the returned tensors.
batch_size (int) – Number of vectorized environments represented by this batch.
policy_obs (ndarray) – Policy observation array already resolved by the caller.

Return type:

TensorDict

Returns:

TensorDict with generic keys plus HORA-specific priv_info and optional proprio_hist when the environment provides them.
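
The conditional key set can be illustrated with a plain dict of NumPy arrays standing in for the TensorDict. This is only a sketch of which entries appear: the generic key "policy" is hypothetical, the helper name is invented for this example, and treating priv_info as present only when supplied is an assumption, since the description above does not say whether it is always included.

```python
from __future__ import annotations

import numpy as np


def collect_hora_fields_sketch(
    policy_obs: np.ndarray,
    priv_info: np.ndarray | None,
    proprio_hist: np.ndarray | None,
) -> dict[str, np.ndarray]:
    """Plain-dict stand-in for the returned TensorDict; not the library code."""
    # "policy" is a hypothetical generic key used for illustration.
    fields: dict[str, np.ndarray] = {"policy": policy_obs}
    # HORA-specific entries are added only when a payload was supplied.
    if priv_info is not None:
        fields["priv_info"] = priv_info
    if proprio_hist is not None:
        fields["proprio_hist"] = proprio_hist
    return fields


policy = np.zeros((4, 3), dtype=np.float32)
priv = np.ones((4, 2), dtype=np.float32)

without_hist = collect_hora_fields_sketch(policy, priv, None)
with_hist = collect_hora_fields_sketch(policy, priv, np.zeros((4, 5, 3)))
```

Downstream code that reads proprio_hist should therefore check for the key before indexing it.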

unilab.algos.torch.hora.observations.build_hora_actor_tensordict(actor_obs, *, priv_info, device, batch_size)

Build the minimal HORA actor TensorDict for APPO play/inference.

Parameters:

actor_obs (ndarray) – Actor observation array with shape (batch, obs_dim).
priv_info (ndarray) – Privileged-info array with shape (batch, priv_dim).
device (str) – Torch device string used for the returned tensors.
batch_size (int) – Number of vectorized environments represented by this batch.

Return type:

TensorDict

Returns:

TensorDict containing grouped HORA actor inputs required by teacher-policy inference.
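
Because both arrays are documented as (batch, ...) and batch_size is the number of vectorized environments, a caller may want to verify the shapes before building the TensorDict. The helper below is a hypothetical caller-side check, not part of UniLab; whether the library performs similar validation itself is not stated here.

```python
from __future__ import annotations

import numpy as np


def check_actor_inputs(
    actor_obs: np.ndarray,
    priv_info: np.ndarray,
    batch_size: int,
) -> None:
    """Hypothetical pre-call check against the documented input shapes."""
    for name, arr in (("actor_obs", actor_obs), ("priv_info", priv_info)):
        # Documented shapes are (batch, obs_dim) and (batch, priv_dim).
        if arr.ndim != 2:
            raise ValueError(f"{name} must be 2-D, got shape {arr.shape}")
        # The leading dimension must match the number of environments.
        if arr.shape[0] != batch_size:
            raise ValueError(
                f"{name} has batch {arr.shape[0]}, expected {batch_size}"
            )


# Consistent inputs pass silently.
check_actor_inputs(np.zeros((4, 3)), np.zeros((4, 2)), batch_size=4)

# A mismatched leading dimension is reported before any tensor is built.
try:
    check_actor_inputs(np.zeros((3, 3)), np.zeros((4, 2)), batch_size=4)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```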