Evaluation and Playback
# Latest run
uv run eval --algo ppo --task go2_joystick_flat --sim motrix --load-run -1
# Headless video export
uv run eval --algo ppo --task go2_joystick_flat --sim motrix \
--load-run -1 --render-mode record
# Off-policy playback can skip ONNX export and still record MP4
uv run eval --algo sac --task g1_walk_flat --sim mujoco --load-run -1 \
--render-mode record training.export_onnx=false
# Demo (downloads checkpoint from HF on first run)
uv run demo dance
Render modes:
- interactive: open a viewer window (default on macOS Motrix).
- record: write MP4 to runs/<run>/playback/.
- none: skip rendering, just compute metrics.
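The three render modes can be pictured as a small dispatch on the --render-mode value. The following is a minimal sketch, not the project's implementation: RenderMode and playback_dir are hypothetical names, and the only detail taken from the text above is that record mode writes under runs/<run>/playback/.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class RenderMode(str, Enum):
    """Hypothetical enum mirroring the documented --render-mode values."""

    INTERACTIVE = "interactive"  # open a viewer window
    RECORD = "record"            # write an MP4 under the run directory
    NONE = "none"                # skip rendering, compute metrics only


def playback_dir(mode: str, run_dir: str) -> Optional[Path]:
    """Return the MP4 output directory for record mode, otherwise None."""
    render_mode = RenderMode(mode)  # raises ValueError for an unknown mode
    if render_mode is RenderMode.RECORD:
        return Path(run_dir) / "playback"
    return None
```

Only record mode produces an output directory here; interactive and none return None because neither writes a video.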
training.export_onnx=false currently applies only to the off-policy playback path
(scripts/train_offpolicy.py and CLI runs with --algo sac|td3|flashsac). It skips
policy.onnx export and verification but still runs playback and video recording.
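The effect of the flag can be sketched as a gate in front of the export stages. This is an illustration under stated assumptions: playback_stages and the stage names are hypothetical, and the sketch assumes that on-policy algorithms keep exporting regardless of the flag, which the text above implies but does not spell out.

```python
from typing import List

# Algorithms named above as honouring training.export_onnx=false.
OFF_POLICY_ALGOS = frozenset({"sac", "td3", "flashsac"})


def playback_stages(algo: str, export_onnx: bool = True) -> List[str]:
    """Return the ordered stages a playback run would execute (hypothetical names)."""
    stages: List[str] = []
    honours_flag = algo in OFF_POLICY_ALGOS
    # Assumption: algorithms outside the off-policy path ignore the flag and export.
    if export_onnx or not honours_flag:
        stages += ["export_onnx", "verify_onnx"]
    # Playback and video recording run whether or not the export was skipped.
    stages += ["playback", "record_video"]
    return stages
```

For example, playback_stages("sac", export_onnx=False) drops both ONNX stages, while playback_stages("ppo", export_onnx=False) still includes them.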
MuJoCo Viewer Scripts
Use uv run eval for regular evaluation and video export. When you need a live
mujoco.viewer window for policy debugging, use the low-level
scripts/play_interactive.py script.
scripts/play_interactive.py is the general MuJoCo viewer entrypoint for PPO,
APPO, SAC, FlashSAC, and HORA distill policies. It uses --algo / --task / --sim to select the algorithm and owner config. The viewer is always
mujoco.viewer; --sim only selects which config to read.
# Use the owner config's interactive.action_mode; the global default is zero action
uv run scripts/play_interactive.py --algo ppo --task go2_joystick_flat --sim mujoco
# Random actions
uv run scripts/play_interactive.py --algo ppo --task go2_joystick_flat --sim mujoco \
interactive.action_mode=random
# Policy actions
uv run scripts/play_interactive.py --algo ppo --task go2_joystick_flat --sim mujoco \
algo.load_run=-1 interactive.action_mode=policy
uv run scripts/play_interactive.py --algo flashsac --task g1_walk_flat --sim motrix \
algo.load_run=-1 interactive.action_mode=policy
# Policy actions with keyboard control
uv run scripts/play_interactive.py --algo ppo --task go2_joystick_flat --sim mujoco \
interactive.action_mode=policy interactive.keyboard=true
Select the action source with interactive.action_mode=zero|random|policy. When
omitted, the script uses the owner config setting. The global default is zero,
and some task YAMLs override it to policy.
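The precedence described above (command-line override, then owner config, then the global default of zero) and the three action sources can be sketched as follows. All names here are hypothetical, and the [-1, 1] range for random actions is an assumption made for illustration, not something the script is documented to use.

```python
import random
from typing import Callable, List, Optional, Sequence

GLOBAL_DEFAULT_MODE = "zero"
VALID_MODES = ("zero", "random", "policy")


def resolve_action_mode(cli_override: Optional[str], owner_mode: Optional[str]) -> str:
    """Pick the action mode: CLI override first, then owner config, then global default."""
    mode = cli_override or owner_mode or GLOBAL_DEFAULT_MODE
    if mode not in VALID_MODES:
        raise ValueError(f"unknown action mode: {mode!r}")
    return mode


def select_action(
    mode: str,
    obs: Sequence[float],
    action_dim: int,
    policy: Optional[Callable[[Sequence[float]], Sequence[float]]] = None,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Produce one action vector for the given mode."""
    if mode == "zero":
        return [0.0] * action_dim
    if mode == "random":
        rng = rng or random.Random()
        # Assumed range; the real script may sample differently.
        return [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]
    if mode == "policy":
        if policy is None:
            raise ValueError("policy mode needs a loaded policy")
        return list(policy(obs))
    raise ValueError(f"unknown action mode: {mode!r}")
```

With no override and no owner setting, resolve_action_mode(None, None) falls back to zero; a task YAML that sets policy wins over that default but loses to an explicit command-line override.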
Enable keyboard control with Hydra overrides: interactive.action_mode=policy interactive.keyboard=true. When keyboard control is enabled, the script checks
that the policy obs contains the velocity command and exits if it does not.
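The keyboard precondition amounts to a fail-fast check before the viewer starts. A minimal sketch, assuming the policy observation is described by a list of term names and that the velocity command term is called velocity_command; both the function and the term name are hypothetical.

```python
import sys
from typing import Sequence


def require_velocity_command(
    obs_terms: Sequence[str],
    keyboard: bool,
    command_term: str = "velocity_command",  # hypothetical term name
) -> None:
    """Exit early when keyboard control is requested but the policy cannot see the command."""
    if keyboard and command_term not in obs_terms:
        sys.exit(
            f"interactive.keyboard=true requires '{command_term}' in the policy obs; "
            f"found terms: {list(obs_terms)}"
        )
```

Exiting before the viewer opens avoids a session in which key presses change a command that the policy never observes.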
In policy mode, locomotion velocity-command tasks whose policy obs contains
the velocity command automatically show green target-velocity and blue
current-velocity arrows. zero and random modes do not show velocity arrows.
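The arrow rule can be sketched as a pure function that returns nothing outside policy mode. This is illustrative only: Arrow, velocity_arrows, and the scale parameter are hypothetical, and the sketch assumes planar velocities drawn from the robot's base position rather than reflecting how the viewer actually renders them.

```python
from typing import List, NamedTuple, Sequence, Tuple

Vec3 = Tuple[float, float, float]


class Arrow(NamedTuple):
    color: str
    start: Vec3
    end: Vec3


def velocity_arrows(
    mode: str,
    obs_has_command: bool,
    base_pos: Vec3,
    target_vel_xy: Sequence[float],
    current_vel_xy: Sequence[float],
    scale: float = 1.0,  # assumed metres of arrow length per m/s
) -> List[Arrow]:
    """Return the green target and blue current velocity arrows, or [] when not applicable."""
    if mode != "policy" or not obs_has_command:
        return []

    def tip(vel_xy: Sequence[float]) -> Vec3:
        # Arrow tip is the base position offset by the scaled planar velocity.
        return (base_pos[0] + scale * vel_xy[0], base_pos[1] + scale * vel_xy[1], base_pos[2])

    return [
        Arrow("green", base_pos, tip(target_vel_xy)),
        Arrow("blue", base_pos, tip(current_vel_xy)),
    ]
```

The two arrows share a start point, so the gap between their tips gives a direct visual read of the tracking error.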
See unilab.visualization.playback for the underlying API.