Architecture Overview
UniLab is a contract-driven robot learning infrastructure repository. The core rule is to fix behavior in the owner layer, that is, the layer that owns the affected contract.
Runtime Model
The async paths run a pipeline from CPU simulation to an accelerator-side learner:
CPU physics backend  ->  collector / IPC  ->  learner
MuJoCo or Motrix         shared memory        torch or mlx
PPO and MLX PPO are synchronous single-process paths. APPO and off-policy
algorithms use the async runner, shared buffers, and weight synchronization
primitives under src/unilab/ipc/ and src/unilab/algos/.
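The async shape described above can be sketched in a few lines. This is a minimal illustration only, not the UniLab implementation: it assumes threads and a `queue.Queue` as stand-ins for separate processes and shared-memory buffers, and every name in it (`collector`, `learner`, `run_pipeline`, the `weights` dict) is hypothetical.

```python
import queue
import threading


def collector(rollouts, weights, lock, n_steps):
    """Stand-in for the CPU physics side: produce transitions for the learner."""
    for step in range(n_steps):
        with lock:
            version = weights["version"]  # read the most recently published weights
        rollouts.put({"step": step, "policy_version": version})
    rollouts.put(None)  # sentinel: collection is finished


def learner(rollouts, weights, lock, sync_every):
    """Stand-in for the accelerator side: consume data, publish new weights."""
    consumed = 0
    while True:
        item = rollouts.get()
        if item is None:
            break
        consumed += 1
        if consumed % sync_every == 0:
            with lock:
                weights["version"] += 1  # hypothetical weight-sync step
    return consumed


def run_pipeline(n_steps=100, sync_every=10):
    # A bounded queue plays the role of the shared buffer (assumption).
    rollouts = queue.Queue(maxsize=16)
    weights = {"version": 0}
    lock = threading.Lock()
    worker = threading.Thread(
        target=collector, args=(rollouts, weights, lock, n_steps)
    )
    worker.start()
    consumed = learner(rollouts, weights, lock, sync_every)
    worker.join()
    return consumed, weights["version"]


if __name__ == "__main__":
    print(run_pipeline())
```

The bounded queue applies backpressure to the collector when the learner falls behind, which is the same role a fixed-size shared buffer would play between processes.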
Layer Boundaries
| Layer | Paths | Owns |
|---|---|---|
| Backend | | |
| Env | | MDP semantics, observation, reward, reset |
| Config and registry | | Schema, owner YAMLs, env/backend registration |
| Algorithms and IPC | | Learners, runners, buffers, weight sync |
| Scripts | | Thin assembly and CLI routing |
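One way to picture the backend/env split is the sketch below. It is an assumption-laden illustration: the `SimBackend` methods shown here are hypothetical and may not match the real interface, and `FakeBackend` and `ReachEnv` are invented for the example. The point is only that the env layer owns observation and reward while talking to the simulator through the backend contract.

```python
from abc import ABC, abstractmethod


class SimBackend(ABC):
    """Hypothetical backend contract; the real SimBackend may look different."""

    @abstractmethod
    def reset(self):
        """Reset the simulation and return the raw state."""

    @abstractmethod
    def step(self, action):
        """Advance the simulation and return the raw state."""


class FakeBackend(SimBackend):
    """Toy 1-D point used only for illustration (not a real physics engine)."""

    def __init__(self):
        self.pos = 0.0

    def reset(self):
        self.pos = 0.0
        return [self.pos]

    def step(self, action):
        self.pos += action[0]
        return [self.pos]


class ReachEnv:
    """Env layer: owns observation, reward, and reset semantics."""

    def __init__(self, backend, target=1.0):
        self.backend = backend  # only the SimBackend contract is used
        self.target = target

    def reset(self):
        return self._obs(self.backend.reset())

    def step(self, action):
        state = self.backend.step(action)
        reward = -abs(self.target - state[0])  # reward is defined here, not in the backend
        return self._obs(state), reward

    def _obs(self, state):
        # Observation = position plus signed distance to the target.
        return [state[0], self.target - state[0]]


if __name__ == "__main__":
    env = ReachEnv(FakeBackend())
    print(env.reset())
    print(env.step([0.25]))
```

Because `ReachEnv` never touches backend internals, swapping in a different `SimBackend` subclass would not change the MDP semantics.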
Design Rules
Keep backend differences in backend implementations, env adapters, and owner YAMLs.
Use uv run train --algo <algo> --task <task> --sim <backend> or uv run eval ... to select the public algorithm/task/backend route. These flags compose the matching owner YAML; training.sim_backend is an identity field.
Prefer config over branching. The escalation order for any extension is config schema -> registry -> env/backend adapter layer -> and only as a last resort a script branch.
Do not parse XML or assets in hot paths such as step, reset, or interval domain randomization.
If shared env code needs a backend operation, add it to SimBackend before using it.
Make evidence-graded claims. Use grades such as Registered, Configured, Tested, Benchmarked, or Recommended; do not claim stable support without evidence in the repo.
Lift reusable primitives. Shared logic belongs in src/unilab/base/ or src/unilab/utils/, not copy-pasted across workflows.
Validate at the closest boundary to the risk: config tests for Hydra changes, env tests for observation/reset changes, IPC tests for runner changes.
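The "prefer config over branching" rule can be illustrated with a small registry. This is a sketch under stated assumptions, not the code in src/unilab/base/registry.py: `BACKENDS`, `register_backend`, `make_backend`, the stub classes, and the config keys are all hypothetical. The stubs also do their one-time setup in `__init__`, in the spirit of keeping parsing out of hot paths.

```python
BACKENDS = {}  # hypothetical registry: backend name -> class


def register_backend(name):
    """Class decorator that records a backend under a public name."""

    def decorator(cls):
        if name in BACKENDS:
            raise ValueError(f"backend {name!r} is already registered")
        BACKENDS[name] = cls
        return cls

    return decorator


@register_backend("mujoco")
class MujocoStub:
    """Placeholder class; a real backend would load its assets once here."""

    def __init__(self, timestep=0.002):
        self.timestep = timestep


@register_backend("motrix")
class MotrixStub:
    """Placeholder class for a second backend."""

    def __init__(self, timestep=0.005):
        self.timestep = timestep


def make_backend(config):
    """Resolve the backend from config instead of an if/else in a script."""
    name = config["sim_backend"]  # hypothetical config key
    try:
        cls = BACKENDS[name]
    except KeyError:
        known = ", ".join(sorted(BACKENDS))
        raise KeyError(f"unknown backend {name!r}; registered: {known}") from None
    return cls(**config.get("backend_kwargs", {}))


if __name__ == "__main__":
    backend = make_backend({"sim_backend": "motrix"})
    print(type(backend).__name__, backend.timestep)
```

Adding a third backend in this style means registering one more class and writing one more config file; no script gains a new branch.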
Validation
Validate near the risk. A top-level smoke run supplements, but does not replace, validation at the boundary a change actually touched.
| Change type | Minimum validation |
|---|---|
| Docs only | |
| Hydra / task / reward config | |
| Env contract / observation | |
| Runner / IPC | |
| Backend path | the matching backend smoke run, plus a slow test when needed |
| Training entrypoint | the relevant tests plus a 1-iteration smoke run |
Use make test for the fast path and make test-all (make check plus
make test-cov) before opening a PR.
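As an illustration of validating at the env boundary, a contract check might look like the sketch below. Everything here is hypothetical (`ToyEnv`, `check_env_contract`, and the `(obs, reward, done)` step signature are assumptions for the example, not the repository's test helpers); the idea is that an observation or reset change gets a targeted check rather than relying only on a top-level smoke run.

```python
import math


class ToyEnv:
    """Hypothetical env with a fixed observation size, for illustration only."""

    obs_dim = 3

    def reset(self):
        return [0.0] * self.obs_dim

    def step(self, action):
        obs = [float(a) for a in action]
        reward = -sum(a * a for a in action)
        done = False
        return obs, reward, done


def check_env_contract(env, obs_dim, action):
    """Return a list of contract violations; an empty list means the env passed."""
    problems = []
    obs = env.reset()
    if len(obs) != obs_dim:
        problems.append(f"reset returned {len(obs)} values, expected {obs_dim}")
    obs, reward, done = env.step(action)
    if len(obs) != obs_dim:
        problems.append(f"step returned {len(obs)} values, expected {obs_dim}")
    if not math.isfinite(reward):
        problems.append(f"reward is not finite: {reward!r}")
    if not isinstance(done, bool):
        problems.append(f"done should be a bool, got {type(done).__name__}")
    return problems


def test_toy_env_contract():
    # Written as a plain function so a test runner such as pytest could collect it.
    assert check_env_contract(ToyEnv(), 3, [0.1, 0.2, 0.3]) == []


if __name__ == "__main__":
    test_toy_env_contract()
    print("env contract holds")
```

Returning a list of violations instead of failing on the first one makes the report more useful when a change breaks several parts of the contract at once.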
Review Checklist
Which contract did this change touch?
Should this problem be solved at a lower layer?
Is backend or task behavior expressed through config, or hidden by a script special-case?
Is the support claim backed by registry, config, test, or benchmark evidence?
Is validation done at the closest boundary to the risk?
High-Signal Files
scripts/train_rsl_rl.py
scripts/train_mlx_ppo.py
scripts/train_appo.py
scripts/train_offpolicy.py
src/unilab/base/np_env.py
src/unilab/base/backend/base.py
src/unilab/base/registry.py
src/unilab/ipc/async_runner.py
src/unilab/training/run.py