Extending UniLab: New Algorithm
Algorithm work must preserve the env, config, and runner contracts. Start with Env Contract, Task Owner Config Contract, and Runner Lifecycle.
Choose The Integration Path
- Synchronous on-policy examples: `scripts/train_rsl_rl.py` and `scripts/train_mlx_ppo.py`.
- Async on-policy example: `scripts/train_appo.py` with `APPORunner`.
- Off-policy examples: `scripts/train_offpolicy.py` with SAC, TD3, and FlashSAC configs under `conf/offpolicy/`.
Implementation Checklist
- Put reusable learner or runner code under `src/unilab/algos/`.
- Add Hydra config under the owning config root. A new off-policy variant should add `conf/offpolicy/algo/<algo>.yaml` and matching `conf/offpolicy/task/<algo>/<task>/<backend>.yaml` owner YAMLs.
- If a new top-level training script is required, keep it as assembly: compose Hydra, call `ensure_registries()`, construct the env through the registry path, then hand control to the runner or trainer.
- Keep third-party adapter naming at adapter boundaries. Do not change the internal `obs` plus optional `critic` env contract to match a library.
- For async algorithms, reuse `AsyncRunner`, `ReplayBuffer` or `RolloutRingBuffer`, and `SharedWeightSync` instead of creating a new IPC lifecycle.
- For off-policy algorithms, keep the CLI `--algo <algo>` selection aligned with the owner YAML path `conf/offpolicy/task/<algo>/<task>/<backend>.yaml`; `assert_offpolicy_task_choice_matches_algo` enforces this guard.
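The alignment rule between `--algo <algo>` and the owner YAML path can be illustrated with a small standalone check. This is only a sketch: `owner_yaml_path` and `check_task_choice_matches_algo` are hypothetical names invented here, not UniLab's `assert_offpolicy_task_choice_matches_algo`, whose signature and behavior are not shown in this page. The task and backend values are placeholders.

```python
from pathlib import PurePosixPath


def owner_yaml_path(algo: str, task: str, backend: str) -> str:
    """Build an owner YAML path using the layout named in the checklist."""
    return f"conf/offpolicy/task/{algo}/{task}/{backend}.yaml"


def check_task_choice_matches_algo(algo: str, owner_yaml: str) -> None:
    """Raise ValueError if the owner YAML does not sit under the selected algo.

    Hypothetical stand-in for illustration; not UniLab's actual guard.
    """
    parts = PurePosixPath(owner_yaml).parts
    # Expected layout: conf/offpolicy/task/<algo>/<task>/<backend>.yaml
    if len(parts) != 6 or parts[:3] != ("conf", "offpolicy", "task"):
        raise ValueError(f"unexpected owner YAML layout: {owner_yaml}")
    if parts[3] != algo:
        raise ValueError(
            f"--algo {algo} does not match owner YAML algo {parts[3]!r}"
        )


if __name__ == "__main__":
    path = owner_yaml_path("sac", "my_task", "my_backend")  # placeholder names
    check_task_choice_matches_algo("sac", path)  # aligned: no error
    try:
        check_task_choice_matches_algo("td3", path)  # misaligned selection
    except ValueError as err:
        print(f"rejected: {err}")
```

Failing fast on a mismatch keeps a run from silently training one algorithm with another algorithm's task-owner settings.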
Validation Near Risk
- Algorithm unit tests under `tests/algos/`
- IPC tests under `tests/ipc/` for async paths
- Script/config tests: `tests/scripts/test_train_script_configs.py`, `tests/scripts/test_train_scripts.py`
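As an example of a test placed near the risk, the sketch below pins the adapter-boundary rule from the checklist: the library-facing names live only in the adapter, and the internal `obs` plus optional `critic` entries are left alone. It assumes observations are passed as a dict keyed by `obs` and `critic`, which this page does not confirm. `to_library_obs` and the `policy`/`privileged` key names are hypothetical and stand in for whatever a third-party library expects.

```python
from typing import Any, Dict


def to_library_obs(env_obs: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the internal contract to a hypothetical library's key names."""
    if "obs" not in env_obs:
        raise KeyError("internal contract requires an 'obs' entry")
    out = {"policy": env_obs["obs"]}
    # 'critic' is optional in the internal contract, so forward it only if present.
    if "critic" in env_obs:
        out["privileged"] = env_obs["critic"]
    return out


def test_adapter_keeps_internal_contract_untouched() -> None:
    env_obs = {"obs": [0.1, 0.2], "critic": [0.1, 0.2, 0.3]}
    adapted = to_library_obs(env_obs)
    assert adapted == {"policy": [0.1, 0.2], "privileged": [0.1, 0.2, 0.3]}
    # The env-side dict must still use the internal names after adaptation.
    assert set(env_obs) == {"obs", "critic"}


def test_adapter_handles_missing_critic() -> None:
    assert to_library_obs({"obs": [1.0]}) == {"policy": [1.0]}


if __name__ == "__main__":
    test_adapter_keeps_internal_contract_untouched()
    test_adapter_handles_missing_critic()
```

The functions follow the `test_` naming convention, so a runner such as pytest would collect them if the file were placed under `tests/algos/`.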
Evidence In Repo
- Structured config dataclasses: `src/unilab/structured_configs.py`
- Training helpers: `src/unilab/training/common.py`, `src/unilab/training/run.py`
- Existing algorithm packages: `src/unilab/algos/torch/`, `src/unilab/algos/mlx/`