SAC¶
SAC is selected through the shared off-policy entrypoint
scripts/train_offpolicy.py, which TD3 and FlashSAC share as well. The main
config is conf/offpolicy/config.yaml, and the SAC algorithm defaults live in
conf/offpolicy/algo/sac.yaml. The current log name is fast_sac.
Runtime Model¶
The off-policy runner decouples CPU simulation from GPU learning through shared memory: a collector subprocess fills a CPU-resident replay buffer while the learner trains on the GPU.
Quick Start¶
uv run train --algo sac --task g1_walk_flat --sim mujoco
uv run train --algo sac --task g1_walk_rough --sim motrix training.no_play=true
Key Fields¶
For the off-policy playback path (scripts/train_offpolicy.py / CLI --algo sac),
set training.export_onnx=false to skip policy.onnx export while still recording
playback video. See Evaluation and Playback.
algo.algo_log_name=fast_sacalgo.num_envs=4096algo.max_iterations=500training.use_amp=truein the shared off-policy config
The current runner path in scripts/train_offpolicy.py requires synchronized
collection; training.no_sync_collection=true is rejected by the script.
uv run train --algo sac --task g1_walk_flat --sim mujoco \
algo.num_envs=2048 \
algo.max_iterations=1000 \
training.no_play=true