Architecture Overview

UniLab is a contract-driven robot learning infrastructure repository. The core rule is to fix behavior in the layer that owns the contract.

Runtime Model

The async paths run a pipeline from CPU simulation to an accelerator-backed learner:

CPU physics backend -> collector / IPC -> learner
MuJoCo or Motrix       shared memory      torch or mlx

PPO and MLX PPO are synchronous single-process paths. APPO and off-policy algorithms use the async runner, shared buffers, and weight synchronization primitives under `src/unilab/ipc/` and `src/unilab/algos/`.
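The collector-to-learner handoff over shared memory can be sketched with the Python standard library alone. This is a minimal illustration of the idea, not UniLab's buffer implementation: `collector_write`, `learner_read`, and `demo` are hypothetical names, and both sides run in one process here to keep the example short.

```python
import struct
from multiprocessing import shared_memory

FLOAT_SIZE = 8  # bytes per float64 slot


def collector_write(name, values):
    """Collector side (hypothetical helper): attach by name and write floats."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        for i, value in enumerate(values):
            struct.pack_into("d", shm.buf, i * FLOAT_SIZE, value)
    finally:
        shm.close()


def learner_read(name, count):
    """Learner side (hypothetical helper): attach by name and read floats back."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return [
            struct.unpack_from("d", shm.buf, i * FLOAT_SIZE)[0]
            for i in range(count)
        ]
    finally:
        shm.close()


def demo():
    """Create one shared block, pass data through it, then release it."""
    rewards = [0.5, 1.0, -0.25]
    shm = shared_memory.SharedMemory(create=True, size=FLOAT_SIZE * len(rewards))
    try:
        collector_write(shm.name, rewards)
        return learner_read(shm.name, len(rewards))
    finally:
        shm.close()
        shm.unlink()
```

In a real async path the two helpers would run in separate processes and need a synchronization primitive around the buffer; the sketch omits that on purpose.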

Layer Boundaries

| Layer | Paths | Owns |
| --- | --- | --- |
| Backend | `src/unilab/base/backend/` | `SimBackend`, physics state, optional capabilities |
| Env | `src/unilab/envs/`, `src/unilab/base/np_env.py` | MDP semantics, observation, reward, reset |
| Config and registry | `conf/`, `src/unilab/base/registry.py`, `src/unilab/structured_configs.py` | Schema, owner YAMLs, env/backend registration |
| Algorithms and IPC | `src/unilab/algos/`, `src/unilab/ipc/` | Learners, runners, buffers, weight sync |
| Scripts | `scripts/`, `src/unilab/cli.py` | Thin assembly and CLI routing |
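The split between the Backend and Env rows can be illustrated with a small sketch. Only the name `SimBackend` comes from the table; the `reset`/`step` methods, the `capabilities` attribute, `ToyBackend`, and `ReachEnv` are assumptions made for illustration and are not taken from the repository.

```python
from abc import ABC, abstractmethod


class SimBackend(ABC):
    """Assumed shape of the backend contract (method names are hypothetical)."""

    # Optional capabilities a backend may advertise; names are illustrative.
    capabilities = frozenset()

    @abstractmethod
    def reset(self):
        """Reset physics state and return it."""

    @abstractmethod
    def step(self, action):
        """Advance physics by one step and return the new state."""


class ToyBackend(SimBackend):
    """Stand-in physics: one position value integrated by the action."""

    capabilities = frozenset({"set_state"})

    def __init__(self):
        self._pos = 0.0

    def reset(self):
        self._pos = 0.0
        return [self._pos]

    def step(self, action):
        self._pos += action[0]
        return [self._pos]


class ReachEnv:
    """Env layer: owns observation, reward, and reset semantics, not physics."""

    def __init__(self, backend, target=1.0):
        self.backend = backend
        self.target = target

    def reset(self):
        return self.backend.reset()

    def step(self, action):
        obs = self.backend.step(action)
        reward = -abs(self.target - obs[0])  # MDP semantics live in the env
        return obs, reward
```

Because `ReachEnv` only talks to the abstract interface, swapping the backend does not require touching reward or observation code.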

Design Rules

Keep backend differences in backend implementations, env adapters, and owner YAMLs.
Use `uv run train --algo <algo> --task <task> --sim <backend>` or `uv run eval ...` to select the public algorithm/task/backend route. These flags compose the matching owner YAML; `training.sim_backend` is an identity field.
Prefer config over branching. The escalation order for any extension is config schema -> registry -> env/backend adapter layer -> script branch, with a script branch only as a last resort.
Do not parse XML or assets in hot paths such as step, reset, or interval domain randomization.
If shared env code needs a backend operation, add it to `SimBackend` before using it.
Make evidence-graded claims. Use grades such as Registered, Configured, Tested, Benchmarked, or Recommended; do not claim stable support without evidence in the repo.
Lift reusable primitives. Shared logic belongs in `src/unilab/base/` or `src/unilab/utils/`, not copy-pasted across workflows.
Validate at the closest boundary to the risk: config tests for Hydra changes, env tests for observation/reset changes, IPC tests for runner changes.
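The "prefer config over branching" rule can be sketched as a registry lookup driven by a config value, so the calling code never branches on backend names. Everything here is hypothetical (`register_backend`, `TrainConfig`, `make_backend`, and the stub classes); it shows the pattern, not the contents of `src/unilab/base/registry.py`.

```python
from dataclasses import dataclass

_BACKENDS = {}  # name -> backend class; a stand-in for a real registry


def register_backend(name):
    """Class decorator that records a backend under a config-visible name."""
    def decorator(cls):
        _BACKENDS[name] = cls
        return cls
    return decorator


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical config schema carrying the selected route."""
    algo: str
    task: str
    sim_backend: str


@register_backend("mujoco")
class MujocoStub:
    """Placeholder only; no real MuJoCo calls are made."""


@register_backend("motrix")
class MotrixStub:
    """Placeholder only; no real Motrix calls are made."""


def make_backend(cfg):
    """Resolve the backend from config instead of an if/elif chain."""
    try:
        return _BACKENDS[cfg.sim_backend]()
    except KeyError:
        known = ", ".join(sorted(_BACKENDS))
        raise ValueError(
            f"unknown backend {cfg.sim_backend!r}; registered: {known}"
        ) from None
```

Adding a backend then means registering one class and pointing a config at it, with no edit to the script that calls `make_backend`.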

Validation

Validate near the risk. A top-level smoke run supplements, but does not replace, validation at the boundary a change actually touched.

| Change type | Minimum validation |
| --- | --- |
| Docs only | `uv run pytest tests/scripts/test_check_docs.py -q`, plus manually verify every support claim against the repo |
| Hydra / task / reward config | `make test` (`tests/config/`, `tests/scripts/`) |
| Env contract / observation | `make test` (`tests/base/test_np_env.py` and env tests) plus a 1-iteration smoke run |
| Runner / IPC | `make test`; add `make test-slow` when needed |
| Backend path | the matching backend smoke run, plus a slow test when needed |
| Training entrypoint | the relevant tests plus a 1-iteration smoke run |

Use `make test` for the fast path and `make test-all` (`make check` plus `make test-cov`) before opening a PR.
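As a rough aid, the validation table can be turned into a lookup from changed paths to minimum checks. The helper below is a hypothetical sketch: the mapping from path prefixes to change types is an assumption that combines the layer table with the validation table, and it is not a tool that exists in the repository.

```python
# (path prefix, minimum validation) pairs; the prefix-to-row mapping is assumed.
RULES = [
    ("conf/", "make test"),
    ("src/unilab/base/backend/",
     "matching backend smoke run, plus a slow test when needed"),
    ("src/unilab/base/np_env.py", "make test plus a 1-iteration smoke run"),
    ("src/unilab/envs/", "make test plus a 1-iteration smoke run"),
    ("src/unilab/ipc/", "make test; add make test-slow when needed"),
    ("scripts/", "relevant tests plus a 1-iteration smoke run"),
]


def minimum_validation(changed_paths):
    """Return the de-duplicated checks implied by a list of changed paths."""
    needed = []
    for path in changed_paths:
        for prefix, check in RULES:
            if path.startswith(prefix) and check not in needed:
                needed.append(check)
    return needed
```

A path that matches no prefix yields no check, which is a cue to decide by hand which contract the change touches.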

Review Checklist

Which contract did this change touch?
Should this problem be solved at a lower layer?
Is backend or task behavior expressed through config, or hidden by a script special-case?
Is the support claim backed by registry, config, test, or benchmark evidence?
Is validation done at the closest boundary to the risk?

High-Signal Files

scripts/train_rsl_rl.py
scripts/train_mlx_ppo.py
scripts/train_appo.py
scripts/train_offpolicy.py
src/unilab/base/np_env.py
src/unilab/base/backend/base.py
src/unilab/base/registry.py
src/unilab/ipc/async_runner.py
src/unilab/training/run.py