unilab.ipc
IPC primitives for multi-process RL training.
Bases: object
Synchronize actor weights between learner and collector.
- class unilab.ipc.RolloutRingBuffer
Bases: object
N-slot shared-memory ring buffer for raw rollout payloads.
- __init__(num_envs, num_steps, obs_dim, action_dim, *, critic_dim=0, num_slots=4, create=True, shm_name_prefix=None)
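The constructor arguments above determine how much shared memory the ring needs. The following is a rough sizing sketch only. The per-transition field list (observation, action, reward, done flag, optional critic observation), the float32 element size, the helper name estimate_rollout_ring_bytes, and the value num_steps=24 are all assumptions made for illustration; the actual payload layout is not described here.

```python
def estimate_rollout_ring_bytes(num_envs, num_steps, obs_dim, action_dim,
                                critic_dim=0, num_slots=4, bytes_per_elem=4):
    """Rough byte count for an N-slot rollout ring (hypothetical layout)."""
    # Assumed fields per transition (not confirmed by the documentation):
    # obs, action, one reward scalar, one done scalar, optional critic obs.
    elems_per_transition = obs_dim + action_dim + 2 + critic_dim
    # One slot holds a full rollout: num_envs * num_steps transitions.
    bytes_per_slot = num_envs * num_steps * elems_per_transition * bytes_per_elem
    return bytes_per_slot * num_slots


# Example values: 4096 envs, 24 steps per rollout (assumed), float32 elements.
total_bytes = estimate_rollout_ring_bytes(4096, 24, obs_dim=48, action_dim=12)
total_mib = total_bytes / (1024 * 1024)
print(f"{total_mib:.1f} MiB across 4 slots")
```

Because the total scales linearly with num_slots, doubling the slot count doubles the allocation under these assumptions.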
- class unilab.ipc.AsyncRunner
Bases: ABC
Base class for async RL algorithms.
Manages:
- Shared memory allocation/cleanup
- Collector process lifecycle
- Error propagation from collector subprocess
- Training loop skeleton
- __init__(env_name, env_cfg_overrides, rl_cfg, *, device=None, collector_device=None, sim_backend='mujoco', num_envs=4096)
Bases: object
Synchronize observation normalization statistics between learner and collector.
Uses a queue to pass (mean, std) tuples from learner to collector.
Put new stats, clearing old ones first.
Get the latest stats; returns None if there are no new stats.
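The put/get behavior described above (clear old entries on put, return None when nothing new is available) can be sketched as follows. This is a minimal single-process illustration, not the library's implementation: the class name LatestStatsMailbox is hypothetical, and queue.Queue is used so the example is deterministic. A cross-process version would need a multiprocessing queue instead.

```python
import queue


class LatestStatsMailbox:
    """Hypothetical stand-in for a queue-based (mean, std) channel."""

    def __init__(self):
        # Assumption: a thread-safe in-process queue keeps the sketch simple.
        self._q = queue.Queue()

    def put(self, mean, std):
        # Drain anything the consumer has not read yet, then publish,
        # so the queue never holds more than the newest entry.
        while True:
            try:
                self._q.get_nowait()
            except queue.Empty:
                break
        self._q.put((mean, std))

    def get(self):
        # Non-blocking read: None means nothing new since the last get().
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None


mailbox = LatestStatsMailbox()
mailbox.put([0.0], [1.0])
mailbox.put([0.5], [2.0])  # replaces the unread first entry
latest = mailbox.get()     # the second entry
again = mailbox.get()      # nothing new, so None
```

Draining before each put keeps the collector from working through a backlog of stale statistics when the learner publishes faster than the collector polls.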
- class unilab.ipc.ReplayBuffer
Bases: SharedBufferBase
Shared replay buffer backed by authoritative packed CPU storage.
Device transfer is owned by replay pipeline transfer backends. The fallback sample() path copies a sampled packed batch to self.device and keeps no per-device replay cache.
- __init__(capacity, obs_dim, action_dim, device, defer_gpu=False, critic_dim=0, packed_cpu_storage=False)
- add(obs, actions, rewards, next_obs, dones, truncated, terminal_mask=None, terminal_next_obs=None, critic=None, next_critic=None, terminal_next_critic=None)
Add batch (called by collector).
dones follows the UniLab env lifecycle contract: done = terminated | truncated. Learners must pair it with truncated when computing bootstrap masks.
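One way a learner might derive a bootstrap mask from this contract is sketched below. The helper names bootstrap_mask and td_targets are hypothetical, plain Python lists stand in for tensors, and the sketch assumes that a step flagged as truncated should still bootstrap even if termination occurred on the same step.

```python
def bootstrap_mask(dones, truncated):
    """Return 1.0 where the next-state value should be bootstrapped, else 0.0."""
    mask = []
    for done, trunc in zip(dones, truncated):
        # done = terminated | truncated, so a true termination is
        # recovered as "done and not truncated" (assumption: truncation wins).
        terminated = bool(done) and not bool(trunc)
        mask.append(0.0 if terminated else 1.0)
    return mask


def td_targets(rewards, next_values, dones, truncated, gamma=0.99):
    """One-step TD targets that stop bootstrapping only on true termination."""
    mask = bootstrap_mask(dones, truncated)
    return [r + gamma * m * v for r, m, v in zip(rewards, mask, next_values)]


# Three transitions: ongoing, terminated, truncated (time limit).
dones = [0, 1, 1]
truncated = [0, 0, 1]
mask = bootstrap_mask(dones, truncated)
targets = td_targets([1.0, 1.0, 1.0], [2.0, 2.0, 2.0], dones, truncated, gamma=0.5)
```

Using dones alone as the mask would zero the bootstrap term on time-limit truncations as well, which is why the two flags are paired.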
Modules
- Base async runner for multi-process RL training.
- Cross-process error propagation for collector subprocesses.
- Memory budget estimation for async RL training buffers.
- Packed shared-memory replay buffer for off-policy RL.
- Replay pipeline abstraction.
- Shared rollout IPC ring buffer for APPO / async PPO.
- Base class for device-adaptive shared memory buffers.
- Shared observation normalization statistics for multi-process training.
- Shared weight synchronization for actor networks.