TD3
TD3 shares the off-policy training script with SAC and FlashSAC. Select it
with --algo td3; the TD3 owner YAML files live under conf/offpolicy/task/td3/.
Quick Start
uv run train --algo td3 --task g1_walk_flat --sim mujoco
Key Fields
On the off-policy playback path (scripts/train_offpolicy.py, or the CLI with
--algo td3), set training.export_onnx=false to skip the policy.onnx export while
still recording the playback video. See Evaluation and Playback.
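As a minimal sketch of the override described above, the snippet below assembles a TD3 invocation with ONNX export disabled and prints it instead of launching it. The task and backend values are simply reused from the Quick Start; the variable names (task, sim, cmd) are illustrative and not part of the project.

```shell
# Reuse the task and backend from the Quick Start (illustrative values).
task="g1_walk_flat"
sim="mujoco"

# Append the override that skips the policy.onnx export.
cmd="uv run train --algo td3 --task ${task} --sim ${sim} training.export_onnx=false"

# Print the command; run it from the repository root to start training.
echo "${cmd}"
```

Printing first makes it easy to check the overrides before committing to a long training run.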
Defaults live in conf/offpolicy/algo/td3.yaml:
algo.algo_log_name=fast_td3
algo.max_iterations=5000
algo.policy_frequency=2
Use --task and --sim to select the task and the simulation backend; do not
reuse a SAC owner with --algo td3.
uv run train --algo td3 --task g1_walk_flat --sim mujoco \
algo.num_envs=2048 \
training.no_play=true
When to Prefer TD3
A task owner has already tuned hyperparameters specifically for TD3.
You want a same-task comparison against SAC.
You want to keep the same off-policy training stack but switch to a TD3 owner.
The log root is logs/fast_td3/<task>/.
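To locate the logs for a given run, the path can be put together as sketched below. This assumes the fast_td3 directory corresponds to the algo.algo_log_name default and that <task> is the value passed to --task; both are inferences from this page rather than confirmed behavior, and the variable names are illustrative.

```shell
# Assumption: the first path component mirrors algo.algo_log_name.
algo_log_name="fast_td3"
# Assumption: <task> is the name passed to --task.
task="g1_walk_flat"

log_root="logs/${algo_log_name}/${task}/"
echo "${log_root}"
```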