Extending UniLab: New Algorithm

New algorithm work must preserve the env, config, and runner contracts. Before writing code, read Env Contract, Task Owner Config Contract, and Runner Lifecycle.

Choose The Integration Path

Synchronous on-policy examples: scripts/train_rsl_rl.py and scripts/train_mlx_ppo.py.
Async on-policy example: scripts/train_appo.py with APPORunner.
Off-policy examples: scripts/train_offpolicy.py with SAC, TD3, and FlashSAC configs under conf/offpolicy/.

Put reusable learner or runner code under src/unilab/algos/.
Add the Hydra config under the owning config root. A new off-policy variant should add conf/offpolicy/algo/<algo>.yaml and the matching owner YAMLs at conf/offpolicy/task/<algo>/<task>/<backend>.yaml.
If a new top-level training script is required, keep it as assembly: compose Hydra, call ensure_registries(), construct the env through the registry path, then hand control to the runner or trainer.
Keep third-party adapter naming at adapter boundaries. Do not change the internal env contract (obs plus optional critic) to match a library.
For async algorithms, reuse AsyncRunner, ReplayBuffer or RolloutRingBuffer, and SharedWeightSync instead of creating a new IPC lifecycle.
For off-policy algorithms, keep the CLI --algo <algo> selection aligned with the owner YAML path conf/offpolicy/task/<algo>/<task>/<backend>.yaml; assert_offpolicy_task_choice_matches_algo enforces this guard.
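The off-policy path convention and its guard can be illustrated with a minimal sketch. This is not the implementation of assert_offpolicy_task_choice_matches_algo; the helper names below are hypothetical, and the sketch assumes the task choice is spelled "<algo>/<task>/<backend>" relative to conf/offpolicy/task/. The task and backend names in the usage lines are placeholders.

```python
from pathlib import PurePosixPath

# Config root taken from the path pattern in the text above.
OFFPOLICY_TASK_ROOT = PurePosixPath("conf/offpolicy/task")


def owner_yaml_path(algo: str, task: str, backend: str) -> PurePosixPath:
    """Build conf/offpolicy/task/<algo>/<task>/<backend>.yaml."""
    return OFFPOLICY_TASK_ROOT / algo / task / f"{backend}.yaml"


def check_task_choice_matches_algo(algo: str, task_choice: str) -> None:
    """Hypothetical stand-in for the real guard: fail fast on a mismatch.

    Assumes task_choice has the form "<algo>/<task>/<backend>".
    """
    parts = PurePosixPath(task_choice).parts
    if len(parts) != 3:
        raise ValueError(
            f"expected '<algo>/<task>/<backend>', got {task_choice!r}"
        )
    if parts[0] != algo:
        raise ValueError(
            f"--algo {algo} does not match task owner {parts[0]!r} "
            f"in {task_choice!r}"
        )


# Placeholder task and backend names, for illustration only.
print(owner_yaml_path("sac", "cartpole", "mujoco"))
check_task_choice_matches_algo("sac", "sac/cartpole/mujoco")  # passes silently
```

Checking the first path component before any env or learner is constructed surfaces a mismatched --algo at startup, before a run can proceed with another algorithm's task overrides.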

Put algorithm unit tests under tests/algos/.
Put IPC tests for async paths under tests/ipc/.
Cover script and config changes in tests/scripts/test_train_script_configs.py and tests/scripts/test_train_scripts.py.
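A config test for a new off-policy variant might check that every conf/offpolicy/algo/<algo>.yaml has at least one owner YAML under conf/offpolicy/task/<algo>/. The sketch below is a hypothetical example, not taken from the existing test files: the function names are invented for illustration, and it builds a throwaway directory tree rather than reading the real repository.

```python
import tempfile
from pathlib import Path


def algos_missing_owner_yamls(offpolicy_root: Path) -> list[str]:
    """Return algos with algo/<algo>.yaml but no task/<algo>/<task>/<backend>.yaml."""
    missing = []
    for algo_yaml in sorted((offpolicy_root / "algo").glob("*.yaml")):
        algo = algo_yaml.stem
        owner_dir = offpolicy_root / "task" / algo
        # An owner YAML sits exactly two levels below task/<algo>/.
        has_owner = owner_dir.is_dir() and any(owner_dir.glob("*/*.yaml"))
        if not has_owner:
            missing.append(algo)
    return missing


def test_every_algo_has_owner_yaml() -> None:
    # Placeholder algo, task, and backend names, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "conf" / "offpolicy"
        (root / "algo").mkdir(parents=True)
        (root / "algo" / "sac.yaml").touch()
        (root / "algo" / "newalgo.yaml").touch()

        owner = root / "task" / "sac" / "cartpole" / "mujoco.yaml"
        owner.parent.mkdir(parents=True)
        owner.touch()

        # "newalgo" has an algo config but no owner YAML yet.
        assert algos_missing_owner_yamls(root) == ["newalgo"]
```

Pointing the same helper at the repository's conf/offpolicy directory would turn it into a layout check that fails when an algo config is added without its owner YAMLs.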