unilab.algos.torch.hora.observations

HORA-owned observation helpers for teacher-policy runtime code.

Functions

build_hora_actor_tensordict(actor_obs, *, ...)

Build the minimal HORA actor TensorDict for APPO play/inference.

build_hora_obs_tensordict(obs, *, info, ...)

Build the HORA PPO/APPO observation TensorDict for teacher-policy runtime.

extract_hora_proprio_hist(info)

Return HORA proprio-history payload from env info when available.

split_hora_obs_with_priv_info(obs[, info])

Split HORA env outputs into actor obs, critic obs, and privileged info.

unilab.algos.torch.hora.observations.split_hora_obs_with_priv_info(obs, info=None)

Split HORA env outputs into actor obs, critic obs, and privileged info.

Parameters:
  • obs (dict[str, ndarray]) – Environment observation dict following the UniLab env contract.

  • info (dict[str, Any] | None) – Optional env info dict. When present, info["critic_info"] is the preferred source of HORA privileged info.

Return type:

tuple[ndarray, ndarray | None, ndarray | None]

Returns:

Tuple (actor_obs, critic_obs, priv_info). When info does not provide critic_info, priv_info falls back to the extra tail of critic_obs. As the return type indicates, critic_obs and priv_info may each be None.
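The source-selection order described above can be sketched with plain NumPy. This is an illustrative stand-in, not the UniLab implementation: the function name split_obs_sketch and the obs keys "actor" and "critic" are assumptions, and "extra tail" is read here as the trailing columns of critic_obs beyond the actor width, which the documentation does not confirm.

```python
import numpy as np


def split_obs_sketch(obs, info=None):
    """Hypothetical stand-in for split_hora_obs_with_priv_info."""
    actor_obs = np.asarray(obs["actor"])  # assumed key name
    critic_obs = obs.get("critic")        # assumed key name; may be absent
    if critic_obs is not None:
        critic_obs = np.asarray(critic_obs)

    priv_info = None
    if info is not None and info.get("critic_info") is not None:
        # Preferred source: explicit privileged info supplied by the env.
        priv_info = np.asarray(info["critic_info"])
    elif critic_obs is not None:
        # Assumed fallback: columns of critic_obs past the actor width.
        extra = critic_obs.shape[-1] - actor_obs.shape[-1]
        if extra > 0:
            priv_info = critic_obs[..., -extra:]
    return actor_obs, critic_obs, priv_info


obs = {"actor": np.zeros((4, 3)), "critic": np.arange(20.0).reshape(4, 5)}
actor, critic, priv = split_obs_sketch(obs)
print(actor.shape, critic.shape, priv.shape)
```

Under these assumptions, passing an info dict that contains critic_info bypasses the slicing path entirely, which matches the "preferred source" wording in the parameter description.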

unilab.algos.torch.hora.observations.extract_hora_proprio_hist(info)

Return HORA proprio-history payload from env info when available.

Parameters:

info (dict[str, Any] | None) – Optional env info dict produced by the HORA environment.

Return type:

ndarray | None

Returns:

Proprio-history array when present, otherwise None.
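The None-tolerant lookup described here might look like the following minimal sketch. The helper name extract_proprio_hist_sketch and the info key "proprio_hist" are assumptions made for illustration; the actual key used by the HORA environment is not stated in this page.

```python
import numpy as np


def extract_proprio_hist_sketch(info):
    """Hypothetical stand-in for extract_hora_proprio_hist."""
    if info is None:
        # No info dict at all: nothing to extract.
        return None
    hist = info.get("proprio_hist")  # assumed key name
    return None if hist is None else np.asarray(hist)


print(extract_proprio_hist_sketch(None))
print(extract_proprio_hist_sketch({"proprio_hist": np.zeros((4, 30, 32))}).shape)
```

Callers can then branch on a single `is None` check instead of guarding both the info dict and the key.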

unilab.algos.torch.hora.observations.build_hora_obs_tensordict(obs, *, info, device, batch_size, policy_obs)

Build the HORA PPO/APPO observation TensorDict for teacher-policy runtime.

Parameters:
  • obs (dict[str, ndarray]) – Environment observation dict following the UniLab env contract.

  • info (dict[str, Any] | None) – Optional env info dict containing HORA privileged payloads.

  • device (str) – Torch device string used for the returned tensors.

  • batch_size (int) – Number of vectorized environments represented by this batch.

  • policy_obs (ndarray) – Policy observation array already resolved by the caller.

Return type:

TensorDict

Returns:

TensorDict with generic keys plus HORA-specific priv_info and optional proprio_hist, included when the environment provides them.
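To illustrate how optional entries can be left out rather than filled with placeholders, here is a rough sketch that uses a plain dict of NumPy arrays as a stand-in for the TensorDict. The helper name collect_hora_payloads and the "policy" key are hypothetical; only priv_info and proprio_hist are named in the description above, and the real function also handles device placement and batch size, which this sketch omits.

```python
import numpy as np


def collect_hora_payloads(policy_obs, priv_info=None, proprio_hist=None):
    """Hypothetical helper: gather only the payloads that exist."""
    payload = {"policy": np.asarray(policy_obs)}  # assumed generic key
    if priv_info is not None:
        payload["priv_info"] = np.asarray(priv_info)
    if proprio_hist is not None:
        payload["proprio_hist"] = np.asarray(proprio_hist)
    return payload


full = collect_hora_payloads(np.zeros((4, 96)), np.zeros((4, 9)), np.zeros((4, 30, 32)))
bare = collect_hora_payloads(np.zeros((4, 96)))
print(sorted(full), sorted(bare))
```

Omitting absent keys lets downstream code test for `"proprio_hist" in payload` instead of checking for sentinel values.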

unilab.algos.torch.hora.observations.build_hora_actor_tensordict(actor_obs, *, priv_info, device, batch_size)

Build the minimal HORA actor TensorDict for APPO play/inference.

Parameters:
  • actor_obs (ndarray) – Actor observation array with shape (batch, obs_dim).

  • priv_info (ndarray) – Privileged-info array with shape (batch, priv_dim).

  • device (str) – Torch device string used for the returned tensors.

  • batch_size (int) – Number of vectorized environments represented by this batch.

Return type:

TensorDict

Returns:

TensorDict containing grouped HORA actor inputs required by teacher-policy inference.
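Because actor_obs and priv_info are documented as (batch, obs_dim) and (batch, priv_dim) arrays whose leading dimension should agree with batch_size, a caller may want to validate shapes before building the TensorDict. The helper below is a hypothetical pre-call check, not part of UniLab, and whether the library performs similar validation itself is not stated here.

```python
import numpy as np


def check_actor_batch(actor_obs, priv_info, batch_size):
    """Hypothetical pre-call check for the documented input shapes."""
    actor_obs = np.asarray(actor_obs)
    priv_info = np.asarray(priv_info)
    for name, arr in (("actor_obs", actor_obs), ("priv_info", priv_info)):
        if arr.ndim != 2:
            raise ValueError(f"{name} must be 2-D (batch, dim), got shape {arr.shape}")
        if arr.shape[0] != batch_size:
            raise ValueError(
                f"{name} has {arr.shape[0]} rows but batch_size is {batch_size}"
            )
    return actor_obs, priv_info


actor_obs, priv_info = check_actor_batch(np.zeros((8, 96)), np.zeros((8, 9)), batch_size=8)
print(actor_obs.shape, priv_info.shape)
```

Failing early with a named array in the message tends to be easier to debug than a shape error raised deep inside policy inference.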