Multi-GPU

The current multi-GPU knob lives in the shared off-policy training config as training.num_gpus. The field is consumed by the off-policy and FlashSAC paths; PPO, MLX PPO, and APPO do not expose the same multi-GPU contract.

uv run train --algo sac --task g1_walk_flat --sim mujoco \
  training.num_gpus=2 \
  training.no_play=true

Select the task and simulator backend with --task and --sim rather than through config overrides:

uv run train --algo td3 --task g1_walk_flat --sim mujoco \
  training.num_gpus=2
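
Before passing training.num_gpus, it can help to confirm the machine has at least that many GPUs. The following is a minimal sketch, not part of the project: check_gpu_request and count_visible_gpus are hypothetical helper names, and the sketch assumes NVIDIA hardware with nvidia-smi on the PATH. Note that nvidia-smi -L lists physical GPUs and does not honor CUDA_VISIBLE_DEVICES, so the count may exceed what the training process can actually see.

```shell
# Hypothetical pre-flight helpers; not part of the train CLI.

# Print the number of GPUs reported by `nvidia-smi -L` (one "GPU N: ..." line
# per device), or 0 when nvidia-smi is not installed or reports nothing.
count_visible_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L 2>/dev/null | grep -c '^GPU '
  else
    echo 0
  fi
}

# check_gpu_request REQUESTED AVAILABLE
# Returns 0 when REQUESTED is an integer in the range 1..AVAILABLE,
# 1 when it is out of range, and 2 when either argument is not a
# non-negative integer.
check_gpu_request() {
  requested="$1"
  available="$2"
  case "$requested" in
    ''|*[!0-9]*) echo "requested GPU count must be an integer" >&2; return 2 ;;
  esac
  case "$available" in
    ''|*[!0-9]*) echo "available GPU count must be an integer" >&2; return 2 ;;
  esac
  if [ "$requested" -lt 1 ] || [ "$requested" -gt "$available" ]; then
    echo "requested $requested GPU(s), but $available available" >&2
    return 1
  fi
  return 0
}

# Example use (commented out so that sourcing this file launches nothing):
#   NUM_GPUS=2
#   check_gpu_request "$NUM_GPUS" "$(count_visible_gpus)" &&
#     uv run train --algo sac --task g1_walk_flat --sim mujoco \
#       training.num_gpus="$NUM_GPUS"
```

Keeping the range check separate from the device query lets the check be exercised on machines without GPUs by passing the available count explicitly.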

When changing multi-GPU behavior, validate the change at the off-policy runner and the IPC boundary. A top-level command that launches successfully is not sufficient evidence on its own.