`unilab.ipc`: Shared-Memory Runtime

`unilab.ipc` is the bridge between the CPU simulation workers and the GPU learner. Every submodule here is a building block of the async runner that powers APPO, FastSAC, FastTD3, and FlashSAC.

| Submodule | Role |
| --- | --- |
| `async_runner` | The high-level orchestration loop |
| `shared_buffer` | NumPy-backed shared-memory ring/buffer |
| `rollout_ring_buffer` | Rollout window used by on-policy collectors |
| `replay_buffer` | Off-policy replay backed by shared memory |
| `replay_pipelines.*` | Host-to-device staging (CPU-pinned double buffer, native h2d) |
| `shared_obs_stats` | Running mean/std shared across workers |
| `weight_sync` | Push learner weights back to workers |
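To make the idea of a NumPy-backed shared-memory buffer concrete, here is a minimal sketch built only on the standard library's `multiprocessing.shared_memory` and NumPy. It is not the `shared_buffer` implementation: the helper names `create_shared_array` and `attach_shared_array` are hypothetical, and the "worker" side attaches from the same process purely to keep the example short.

```python
import numpy as np
from multiprocessing import shared_memory


def create_shared_array(shape, dtype=np.float32):
    """Hypothetical helper: allocate a shared block and view it as an ndarray."""
    nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=nbytes)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    arr.fill(0)
    return shm, arr


def attach_shared_array(name, shape, dtype=np.float32):
    """Hypothetical helper: attach to an existing block by name (no copy)."""
    shm = shared_memory.SharedMemory(name=name)
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    return shm, arr


# Owner side: allocate a 4 x 3 buffer and write one row.
owner_shm, owner_view = create_shared_array((4, 3))
owner_view[1] = [1.0, 2.0, 3.0]

# Worker side: normally another process that receives the block name.
worker_shm, worker_view = attach_shared_array(owner_shm.name, (4, 3))
row_seen_by_worker = worker_view[1].copy()

# Drop the ndarray views first: close() fails while the buffer is still exported.
del owner_view, worker_view
worker_shm.close()
owner_shm.close()
owner_shm.unlink()  # only the owner removes the block
```

Both views address the same memory, so a write on one side is visible on the other without serialization. Any synchronization between writer and reader (locks, sequence counters) is deliberately left out of this sketch.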

unilab.ipc

IPC primitives for multi-process RL training.

Async runner

Base async runner for multi-process RL training.

class unilab.ipc.async_runner.AsyncRunner

Bases: ABC

Base class for async RL algorithms.

Manages:

- Shared memory allocation/cleanup
- Collector process lifecycle
- Error propagation from collector subprocess
- Training loop skeleton
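One common way to propagate errors from a collector subprocess is to forward the formatted traceback through a queue and re-raise it in the learner. The sketch below illustrates that pattern under stated assumptions: `run_collector` and `raise_if_collector_failed` are hypothetical names, not part of `AsyncRunner`, and the demonstration runs in-process with `queue.Queue`. With a real subprocess, a `multiprocessing.Queue` could be passed instead, since its `get_nowait()` also raises `queue.Empty`.

```python
import queue
import traceback


def run_collector(collect_fn, error_queue):
    """Hypothetical process target: forward any exception as traceback text."""
    try:
        collect_fn()
    except Exception:
        error_queue.put(traceback.format_exc())


def raise_if_collector_failed(error_queue):
    """Hypothetical learner-side check: re-raise a forwarded collector failure."""
    try:
        tb_text = error_queue.get_nowait()
    except queue.Empty:
        return  # nothing reported, keep training
    raise RuntimeError("collector process failed:\n" + tb_text)


def failing_collect():
    # Stand-in for a rollout loop that crashes.
    raise ValueError("simulated env crash")


# In-process demonstration of the hand-off.
errors = queue.Queue()
run_collector(failing_collect, errors)
try:
    raise_if_collector_failed(errors)
    propagated = None
except RuntimeError as exc:
    propagated = str(exc)
```

Polling the queue once per training iteration keeps the learner from waiting forever on data that a dead collector will never produce.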

Parameters:

env_name (str)
env_cfg_overrides (dict)
rl_cfg (dict)
device (str | None)
collector_device (str | None)
sim_backend (str)
num_envs (int)

__init__(env_name, env_cfg_overrides, rl_cfg, *, device=None, collector_device=None, sim_backend='mujoco', num_envs=4096)

abstract learn(max_iterations, save_interval=50, log_dir='logs')

Parameters:

max_iterations (int)
save_interval (int)
log_dir (str)

Return type:

None

close()

Return type:

None
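The signatures above suggest a subclass-and-override pattern: implement `learn`, and call `close` when training ends. The sketch below shows that pattern with a stand-in base class so it runs without `unilab` installed. `RunnerBase` and `CountingRunner` are hypothetical; only the argument names and defaults are copied from the signatures documented here. Recording a checkpoint at every multiple of `save_interval` is an assumption about what that parameter means, not documented behavior.

```python
from abc import ABC, abstractmethod


class RunnerBase(ABC):
    """Stand-in mirroring the documented signatures; NOT unilab's AsyncRunner."""

    def __init__(self, env_name, env_cfg_overrides, rl_cfg, *, device=None,
                 collector_device=None, sim_backend="mujoco", num_envs=4096):
        self.env_name = env_name
        self.env_cfg_overrides = env_cfg_overrides
        self.rl_cfg = rl_cfg
        self.device = device
        self.collector_device = collector_device
        self.sim_backend = sim_backend
        self.num_envs = num_envs
        self.closed = False

    @abstractmethod
    def learn(self, max_iterations, save_interval=50, log_dir="logs"):
        """Subclasses provide the training loop."""

    def close(self):
        # A real runner would release shared memory and stop collectors here.
        self.closed = True


class CountingRunner(RunnerBase):
    """Toy subclass: records the iterations at which it would checkpoint."""

    def learn(self, max_iterations, save_interval=50, log_dir="logs"):
        # Assumption: checkpoint at every multiple of save_interval.
        self.checkpoints = [
            i for i in range(1, max_iterations + 1) if i % save_interval == 0
        ]


# "HypotheticalEnv-v0" is a placeholder environment name.
runner = CountingRunner("HypotheticalEnv-v0", {}, {})
try:
    runner.learn(max_iterations=120)
finally:
    runner.close()  # always release resources, even if learn() raises
```

Wrapping `learn` in `try`/`finally` matters more in a multi-process setting than in ordinary code, because shared-memory blocks and child processes can outlive a learner that exits on an exception.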
