unilab.algos.torch.hora.appo_worker

HORA-owned APPO rollout worker.

Functions

compute_hora_timeout_bootstrap_correction(...)

Compute timeout bootstrap values for grouped HORA observations.

hora_appo_collector_fn(stop_event, env_name, ...)

Collect grouped HORA APPO rollouts into the shared IPC ring buffer.

unilab.algos.torch.hora.appo_worker.compute_hora_timeout_bootstrap_correction(critic, collector_device, gamma, timeout_mask, final_obs, final_critic=None, final_priv_info=None)

Compute timeout bootstrap values for grouped HORA observations.

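Parameters:
  • critic

  • collector_device

  • gamma

  • timeout_mask

  • final_obs

  • final_critic (defaults to None)

  • final_priv_info (defaults to None)

Return type: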

ndarray
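
The page does not show this function's body, so the following is only a minimal sketch of the general time-limit bootstrap idea: environments that ended because of a timeout (not a true terminal state) receive an extra gamma * V(final_obs) term, and all others receive zero. The names timeout_bootstrap_sketch and toy_value_fn are hypothetical, a plain NumPy callable stands in for the critic network, and the grouped HORA inputs (final_critic, final_priv_info) are left out.

```python
import numpy as np


def timeout_bootstrap_sketch(value_fn, gamma, timeout_mask, final_obs):
    """Return gamma * V(final_obs) for timed-out envs and 0 elsewhere.

    Hypothetical helper for illustration; not the library's implementation.
    """
    timeout_mask = np.asarray(timeout_mask, dtype=bool)
    correction = np.zeros(timeout_mask.shape, dtype=np.float32)
    if timeout_mask.any():
        # Evaluate the value function only on the envs that timed out.
        values = np.asarray(value_fn(final_obs[timeout_mask]), dtype=np.float32)
        correction[timeout_mask] = gamma * values.reshape(-1)
    return correction


def toy_value_fn(obs):
    # Assumed stand-in for a critic: the value is the sum of the observation.
    return obs.sum(axis=-1)


final_obs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
timeout_mask = np.array([False, True, False])
correction = timeout_bootstrap_sketch(toy_value_fn, 0.5, timeout_mask, final_obs)
```

In this sketch only the second environment timed out, so only its entry is non-zero. A caller would typically add such a correction to the reward of the truncated step before computing returns.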

unilab.algos.torch.hora.appo_worker.hora_appo_collector_fn(stop_event, env_name, rl_cfg, num_envs, steps_per_env, shm_rollout_ring_buffer_name, sync_primitives, obs_dim, action_dim, critic_dim, actor_weight_sync_name, actor_weight_param_shapes, critic_weight_sync_name, critic_weight_param_shapes, metrics_queue, collector_device='cpu', sim_backend='mujoco', env_cfg_override=None, priv_info_dim=0, seed=None)

Collect grouped HORA APPO rollouts into the shared IPC ring buffer.

Parameters:
  • stop_event (Any)

  • env_name (str)

  • rl_cfg (dict)

  • num_envs (int)

  • steps_per_env (int)

  • shm_rollout_ring_buffer_name (Dict[str, str])

  • sync_primitives (tuple)

  • obs_dim (int)

  • action_dim (int)

  • critic_dim (int)

  • actor_weight_sync_name (str)

  • actor_weight_param_shapes (dict)

  • critic_weight_sync_name (str)

  • critic_weight_param_shapes (dict)

  • metrics_queue (Any)

  • collector_device (str)

  • sim_backend (str)

  • env_cfg_override (dict | None)

  • priv_info_dim (int)

  • seed (int | None)
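
The parameter list suggests a worker that runs until stop_event is set, writes rollouts into a ring buffer, and reports through metrics_queue. As a rough illustration of that control flow only, the sketch below uses an in-process NumPy array in place of the shared-memory ring buffer and random data in place of environment steps. The function collector_loop_sketch, its max_rollouts argument, and the metric keys are all hypothetical and are not part of the documented API.

```python
import queue
import threading

import numpy as np


def collector_loop_sketch(stop_event, ring, metrics_queue, num_envs,
                          steps_per_env, obs_dim, max_rollouts=None, seed=None):
    """Write fake rollouts into `ring` until stopped; return the count written.

    Hypothetical stand-in: the real worker steps environments and uses IPC.
    """
    rng = np.random.default_rng(seed)
    num_slots = ring.shape[0]
    written = 0
    while not stop_event.is_set():
        if max_rollouts is not None and written >= max_rollouts:
            break
        # Placeholder for stepping the environments for one rollout.
        rollout = rng.standard_normal(
            (steps_per_env, num_envs, obs_dim)).astype(np.float32)
        # Ring-buffer write: the slot index wraps around, overwriting old data.
        ring[written % num_slots] = rollout
        metrics_queue.put({"rollout_index": written,
                           "mean_obs": float(rollout.mean())})
        written += 1
    return written


stop_event = threading.Event()
metrics_queue = queue.Queue()
ring = np.zeros((2, 4, 3, 5), dtype=np.float32)  # (slots, steps, envs, obs_dim)

num_written = collector_loop_sketch(
    stop_event, ring, metrics_queue,
    num_envs=3, steps_per_env=4, obs_dim=5, max_rollouts=3, seed=0)

# Once the event is set, a later call exits before writing anything.
stop_event.set()
num_after_stop = collector_loop_sketch(
    stop_event, ring, metrics_queue,
    num_envs=3, steps_per_env=4, obs_dim=5, seed=0)
```

With two slots and three rollouts, the third rollout overwrites slot 0, which is the wrap-around behaviour a ring buffer is meant to provide. In a multi-process setup the event and queue would come from multiprocessing instead of threading and queue.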