Task Owner Config Contract
Task owner YAML is the identity of a composed task/backend/algorithm path. The contract is recorded in ADR-0003 Task Owner And Config Compose Contract.
Owner Paths
PPO and APPO owner YAMLs use conf/{ppo,appo}/task/<task>/<backend>.yaml.
MLX PPO composes from conf/ppo/config_mlx.yaml and reuses the PPO task owner YAML layout.
Off-policy owner YAMLs include the algorithm dimension: conf/offpolicy/task/<algo>/<task>/<backend>.yaml.
Other existing config roots, such as conf/ppo_him/ and conf/hora_distill/, follow the same owner-YAML identity rule for their supported tasks.
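The two layouts above can be sketched as a small path builder. This is an illustrative sketch only: the helper name owner_yaml_path and the off_policy flag are hypothetical and not taken from the repository, and the off-policy algorithm name "sac" in the usage lines is a placeholder, not a confirmed supported value.

```python
from pathlib import PurePosixPath

# Config roots whose layout is stated explicitly in this document.
ON_POLICY_ROOTS = ("ppo", "appo")


def owner_yaml_path(algo: str, task: str, backend: str, off_policy: bool = False) -> PurePosixPath:
    """Build the owner YAML path for an algo/task/backend combination.

    Hypothetical helper: it mirrors the documented layouts and does not
    check that the file exists.
    """
    if off_policy:
        # Off-policy layout carries the algorithm as an extra path segment.
        return PurePosixPath("conf", "offpolicy", "task", algo, task, f"{backend}.yaml")
    if algo not in ON_POLICY_ROOTS:
        # Other roots (e.g. ppo_him, hora_distill) are not spelled out here,
        # so this sketch refuses to guess their layout.
        raise ValueError(f"layout for config root {algo!r} is not covered by this sketch")
    return PurePosixPath("conf", algo, "task", task, f"{backend}.yaml")


# Usage: the PPO call reproduces the owner example listed under Evidence In Repo.
ppo_path = owner_yaml_path("ppo", "go2_joystick_flat", "mujoco")
# "sac" is a placeholder algorithm name used only for illustration.
offpolicy_path = owner_yaml_path("sac", "go2_joystick_flat", "mujoco", off_policy=True)
```

Keeping the backend as the file stem means the same task directory holds one owner YAML per backend, which is what lets a single path identify the full task/backend/algorithm combination.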
Required Semantics
Use public CLI flags to switch backend, for example uv run train --algo ppo --task go2_joystick_flat --sim mujoco or uv run train --algo ppo --task go2_joystick_flat --sim motrix.
For off-policy entrypoints, keep --algo <algo> aligned with the internal owner YAML path conf/offpolicy/task/<algo>/<task>/<backend>.yaml.
training.sim_backend is an identity field inside the selected owner YAML. It is not an independent backend switch.
Backend-specific reward, env, scene, and algorithm differences belong in the owner YAML, not in training scripts.
Reward config must be explicitly injected by the owner YAML when the task uses rewards.
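The identity and reward rules above lend themselves to a mechanical check. The following is a minimal sketch, assuming the owner YAML has already been loaded into a plain dict. The function name check_owner_identity and the top-level "reward" key are assumptions made for illustration; only training.sim_backend is named in this document, and the repository's real guard and tests may work differently.

```python
from pathlib import PurePosixPath


def check_owner_identity(owner_path: str, config: dict, uses_rewards: bool = True) -> list:
    """Return a list of contract violations for one loaded owner YAML.

    Hypothetical checker: `config` is the parsed YAML content as a dict.
    """
    errors = []

    # The backend is identified by the owner YAML file name (<backend>.yaml).
    expected_backend = PurePosixPath(owner_path).stem

    # training.sim_backend must restate that identity, not override it.
    declared_backend = config.get("training", {}).get("sim_backend")
    if declared_backend != expected_backend:
        errors.append(
            f"training.sim_backend is {declared_backend!r}, "
            f"but the owner path implies {expected_backend!r}"
        )

    # Assumption: reward config lives under a top-level "reward" key.
    if uses_rewards and not config.get("reward"):
        errors.append("task uses rewards but the owner YAML injects no reward config")

    return errors


# Usage with an in-memory config whose backend contradicts its path.
mismatched = {"training": {"sim_backend": "motrix"}}
problems = check_owner_identity("conf/ppo/task/go2_joystick_flat/mujoco.yaml", mismatched)
for problem in problems:
    print(problem)
```

Treating a mismatch as an error, rather than letting training.sim_backend win, reflects the rule that the field records identity and the --sim flag is the only backend switch.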
Evidence In Repo
PPO owner example: conf/ppo/task/go2_joystick_flat/mujoco.yaml
APPO config root: conf/appo/config.yaml
Off-policy config root: conf/offpolicy/config.yaml
Off-policy task/algo guard: src/unilab/training/common.py
Config tests: tests/config/test_config_system.py, tests/scripts/test_train_script_configs.py, tests/envs/locomotion/g1/test_issue175_regression.py