APPO
APPO is UniLab’s asynchronous PPO path. It uses scripts/train_appo.py,
conf/appo/config.yaml, and the runtime under src/unilab/algos/torch/appo/.
The config exposes algo.steps_per_env, training.collector_device, and
training.replay_queue_size; the algorithm config includes V-trace clipping
fields.
Quick Start
uv run train --algo appo --task go2_joystick_flat --sim mujoco
uv run train --algo appo --task g1_motion_tracking --sim motrix training.no_play=true
Common Overrides
uv run train --algo appo --task go2_joystick_flat --sim mujoco \
algo.num_envs=2048 \
algo.max_iterations=300 \
training.replay_queue_size=2
Playback and checkpoint selection use uv run eval:
uv run eval --algo appo --task go2_joystick_flat --sim mujoco --load-run -1
Runtime Model
The collector runs CPU simulation while the learner runs GPU training.
Rollouts are published into a replay queue that the learner consumes.
Because queued rollouts can come from a policy older than the one being trained, APPO applies a V-trace importance-sampling correction, so its update semantics differ from those of synchronous PPO.
The collector/learner pipeline is backed by a 4-slot ring buffer.
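The V-trace correction mentioned above can be sketched in plain Python. This is a minimal illustration of the standard V-trace target recursion, not UniLab's implementation: the function name `vtrace_targets` and the parameters `rho_clip` and `c_clip` are hypothetical stand-ins for the clipping fields in the algorithm config, and episode terminations are ignored for brevity.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_clip=1.0, c_clip=1.0):
    """Compute V-trace value targets for one trajectory (illustrative only).

    rewards, values, log_rhos: lists of equal length T.
    log_rhos[t] = log pi(a_t | x_t) - log mu(a_t | x_t), where mu is the
    (possibly stale) collector policy and pi is the learner policy.
    rho_clip and c_clip are hypothetical names for the clipping thresholds.
    """
    n = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}) while walking backwards in time
    for t in reversed(range(n)):
        ratio = math.exp(log_rhos[t])
        rho = min(rho_clip, ratio)  # clipped weight on the TD error
        c = min(c_clip, ratio)      # clipped weight on the propagated trace
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * c * acc
        targets[t] = values[t] + acc
    return targets


# On-policy data (all log ratios zero) with both clips at 1.0.
example_targets = vtrace_targets(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    bootstrap_value=0.0,
    log_rhos=[0.0, 0.0, 0.0],
    gamma=0.5,
)
```

When the collector and learner policies agree and both thresholds are 1.0, the targets reduce to ordinary n-step discounted returns. The clipping only takes effect once the importance ratios exceed the thresholds.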
Key Fields
algo.steps_per_env: rollout length per environment.
training.replay_queue_size: learner-side cache depth.
training.collector_device: collector device; defaults to following the learner.
algo.save_interval: checkpoint save interval.
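To make the role of training.replay_queue_size concrete, the sketch below models the collector and learner as two threads joined by a bounded queue. It is only an assumption-laden illustration: `run_pipeline`, the rollout dictionary layout, and the choice to block the collector when the queue is full are all hypothetical, and the real runtime may drop or overwrite rollouts instead.

```python
import queue
import threading


def run_pipeline(num_rollouts=5, replay_queue_size=2, steps_per_env=4):
    """Toy collector/learner loop; returns rollout ids in consumption order."""
    # Bounded queue: at most replay_queue_size rollouts wait for the learner.
    replay_queue = queue.Queue(maxsize=replay_queue_size)
    consumed = []

    def collector():
        for rollout_id in range(num_rollouts):
            # Stand-in for steps_per_env simulation steps per environment.
            rollout = {"id": rollout_id, "steps": list(range(steps_per_env))}
            # Assumed behaviour: block until the learner frees a slot.
            replay_queue.put(rollout)
        replay_queue.put(None)  # sentinel: no more rollouts

    def learner():
        while True:
            rollout = replay_queue.get()
            if rollout is None:
                break
            consumed.append(rollout["id"])  # stand-in for a training update

    threads = [threading.Thread(target=collector),
               threading.Thread(target=learner)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed
```

Under this assumption, a small queue depth keeps the learner close to the collector's current policy, while a larger depth lets the collector run further ahead and leaves more staleness for the off-policy correction to absorb.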
The default log root is logs/appo/<task>/, which follows from algo.algo_log_name=appo
in conf/appo/config.yaml.