PPO
PPO (Proximal Policy Optimization) is the default synchronous on-policy training path. It uses
scripts/train_rsl_rl.py, composes its configuration from conf/ppo/config.yaml, and runs the
RSL-RL adapter code in src/unilab/algos/torch/rsl_rl_ppo.py and
src/unilab/training/rsl_rl.py.
Quick Start
uv run train --algo ppo --task go2_joystick_flat --sim mujoco
uv run train --algo ppo --task go2_joystick_flat --sim motrix training.no_play=true
Common Overrides
uv run train --algo ppo --task go2_joystick_flat --sim mujoco \
algo.num_envs=2048 \
algo.max_iterations=300 \
training.no_play=true
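When the same overrides are reused across tasks or simulators, it can help to assemble the command once and inspect it before launching. The following is a minimal dry-run sketch in POSIX sh; the variable names TASK, SIM, and CMD are illustrative assumptions and not part of the project, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: build the training command and print it instead of running it.
# TASK, SIM, and CMD are illustrative variable names, not part of the project.
TASK=go2_joystick_flat
SIM=mujoco

# Positional parameters hold one argument each, so every override stays a separate word.
set -- uv run train --algo ppo --task "$TASK" --sim "$SIM" \
    algo.num_envs=2048 \
    algo.max_iterations=300 \
    training.no_play=true

CMD="$*"      # space-joined copy, used for display only
echo "$CMD"

# To actually launch training, replace the echo with: "$@"
```

Setting SIM=motrix produces the equivalent command for the other simulator shown in Quick Start.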
Use uv run eval for checkpoint playback:
uv run eval --algo ppo --task go2_joystick_flat --sim mujoco --load-run -1
Logs are grouped by algo.algo_log_name; the default in conf/ppo/config.yaml
is rsl_rl_ppo.