FlashSAC
FlashSAC is the third algorithm on the shared off-policy entrypoint. Select it
with --algo flashsac; defaults live in
conf/offpolicy/algo/flashsac.yaml, and the implementation lives under
src/unilab/algos/torch/flash_sac/.
It shares the off-policy training script with SAC and TD3, but does not use the same default networks: the actor uses a block-based structure and the critic uses a distributional (categorical) Q variant.
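As a rough illustration of what a categorical Q critic produces, the sketch below collapses a vector of per-atom logits into a scalar Q estimate by taking the expectation over a fixed, evenly spaced support. This is a generic sketch of the technique, not the code under src/unilab/algos/torch/flash_sac/. The function name categorical_q_value and the v_min / v_max bounds are hypothetical, and the real critic may parameterize its support differently.

```python
import math


def categorical_q_value(logits, v_min, v_max):
    """Collapse categorical critic logits into a scalar Q estimate.

    Hypothetical helper for illustration only; not FlashSAC's actual API.
    """
    num_atoms = len(logits)
    if num_atoms < 2:
        raise ValueError("need at least two atoms")
    # Evenly spaced support (atoms) between v_min and v_max.
    step = (v_max - v_min) / (num_atoms - 1)
    support = [v_min + i * step for i in range(num_atoms)]
    # Numerically stable softmax over the logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expected return under the predicted distribution.
    return sum(p * z for p, z in zip(probs, support))
```

With uniform logits the estimate sits at the midpoint of the support; as one logit dominates, the estimate approaches that atom's value.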
Quick Start
uv run train --algo flashsac --task g1_walk_flat --sim mujoco
uv run train --algo flashsac --task go2_joystick_flat --sim mujoco training.no_play=true
Key Fields
For the off-policy playback path (scripts/train_offpolicy.py / CLI --algo flashsac),
set training.export_onnx=false to skip policy.onnx export while still recording
playback video. See Evaluation and Playback.
algo.algo_log_name=flash_sac
algo.num_envs=1024
algo.max_iterations=5000
algo.tau=0.01
algo.save_interval=1000
algo.algo_params.actor_num_blocks=2
algo.algo_params.critic_num_blocks=2
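To give a feel for the algo.tau=0.01 default, here is a minimal sketch of a Polyak (soft) target update, assuming algo.tau plays its usual SAC role as the target-network smoothing coefficient. The source does not confirm how FlashSAC applies it, and soft_update is a hypothetical name operating on plain lists of floats rather than on network parameters.

```python
def soft_update(target_params, online_params, tau=0.01):
    """Move target parameters a fraction tau toward the online parameters.

    Hypothetical helper; assumes tau is a Polyak averaging coefficient.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    # target <- (1 - tau) * target + tau * online, element by element.
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

Under this assumption, tau=0.01 means the target moves 1% of the way toward the online parameters on each update, so larger values track the online network more closely.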
scripts/train_offpolicy.py rejects training.num_gpus > 1 for FlashSAC, so
keep the default single-GPU path unless the implementation changes.
The log root is logs/flash_sac/<task>/.
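The single-GPU restriction noted above amounts to a simple pre-flight check. The sketch below shows one way such a guard could look; check_single_gpu and its error message are hypothetical, and the actual check in scripts/train_offpolicy.py may be written differently.

```python
def check_single_gpu(num_gpus):
    """Raise if a multi-GPU FlashSAC run is requested.

    Hypothetical guard mirroring the documented training.num_gpus limit.
    """
    if num_gpus > 1:
        raise ValueError(
            "FlashSAC supports only training.num_gpus=1 (got %d)" % num_gpus
        )
```

Failing fast like this surfaces the unsupported setting before any environments or networks are created.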