Runner Lifecycle
Runner code owns the training lifecycle. Scripts compose the Hydra config and start the right runner; they should not create a second collector/learner protocol.
Runtime-Specific Owners
- `scripts/train_rsl_rl.py` uses `RslRlVecEnvWrapper` and RSL-RL’s `OnPolicyRunner`.
- `scripts/train_mlx_ppo.py` uses the MLX PPO trainer path.
- `scripts/train_appo.py` uses `APPORunner`, `RolloutRingBuffer`, and `SharedWeightSync`.
- `scripts/train_offpolicy.py` uses off-policy runners with `ReplayBuffer` and `SharedWeightSync`.
- `AsyncRunner` owns collector process lifecycle and shared-resource cleanup for async runners.
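As a rough illustration of what "owning" cleanup can look like from a script's point of view, the sketch below uses a hypothetical stand-in class, `ToyAsyncRunner`, rather than the real `AsyncRunner`. Only the existence of a `close()` method comes from this page; the constructor, `train()`, the `closed` flag, and the idempotent behavior of `close()` are assumptions made for the example.

```python
from contextlib import closing


class ToyAsyncRunner:
    """Hypothetical stand-in for AsyncRunner; only close() is taken from the docs."""

    def __init__(self, num_collectors: int) -> None:
        # Strings stand in for collector processes and shared buffers.
        self.collectors = [f"collector-{i}" for i in range(num_collectors)]
        self.closed = False

    def train(self, iterations: int) -> int:
        if self.closed:
            raise RuntimeError("runner is already closed")
        # Pretend each collector contributes one rollout per iteration.
        return iterations * len(self.collectors)

    def close(self) -> None:
        # Assumed idempotent: a second call is a no-op.
        if self.closed:
            return
        self.collectors.clear()
        self.closed = True


def run_training(runner: ToyAsyncRunner, iterations: int) -> int:
    # closing() calls runner.close() on normal exit and when train() raises.
    with closing(runner):
        return runner.train(iterations)
```

The script never touches collectors or shared buffers directly; it hands control to the runner and lets `close()` release everything, which is the division of responsibility described above.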
Rules
- Do not bypass `AsyncRunner.close()` semantics for async collectors.
- Do not patch env observation or critic semantics inside runner code; preserve the `obs` plus optional `critic` contract.
- Use `src/unilab/training/run.py` for shared log-root, checkpoint, and playback resolution helpers instead of copying those rules into scripts.
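One way to picture the `obs` plus optional `critic` rule is a pass-through reader that forwards whatever the env produced without reshaping or substituting anything. The sketch below assumes the env output is a mapping keyed by `"obs"` and `"critic"`; that layout and the helper name `read_obs_contract` are hypothetical and not taken from the repository.

```python
from typing import Any, Mapping, Optional, Tuple


def read_obs_contract(batch: Mapping[str, Any]) -> Tuple[Any, Optional[Any]]:
    """Return (obs, critic) unchanged; critic is None when the env omits it.

    Assumption: the env output is a mapping with a required "obs" entry
    and an optional "critic" entry.
    """
    if "obs" not in batch:
        raise KeyError("env output must contain 'obs'")
    # No fallback or reshaping here: the runner forwards what the env produced.
    return batch["obs"], batch.get("critic")
```

A call such as `read_obs_contract({"obs": [0.1, 0.2]})` returns the observation together with `None` for the critic, leaving any decision about a missing critic input to the code that owns that contract rather than to the runner.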
Evidence In Repo
- Shared training helpers: `src/unilab/training/common.py`, `src/unilab/training/run.py`
- Async lifecycle: `src/unilab/ipc/async_runner.py`
- Runner tests: `tests/algos/test_appo_runner.py`, `tests/algos/test_offpolicy_runner.py`, `tests/ipc/test_async_runner.py`