
TD3

TD3 shares the off-policy training script with SAC and FlashSAC. Select it with --algo td3; the TD3 owner YAML files live under conf/offpolicy/task/td3/.

Quick Start

uv run train --algo td3 --task g1_walk_flat --sim mujoco

Key Fields

For the off-policy playback path (scripts/train_offpolicy.py / CLI --algo td3), set training.export_onnx=false to skip policy.onnx export while still recording playback video. See Evaluation and Playback.
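As a minimal sketch of that setting, the Quick Start command can be extended with training.export_onnx=false. This assumes the field is passed as a key=value override, in the same form as the other overrides on this page. The variable names base and cmd are illustrative only, and the snippet prints the command for review rather than launching training.

```shell
# Base TD3 invocation, copied from Quick Start.
base="uv run train --algo td3 --task g1_walk_flat --sim mujoco"

# Assumption: training.export_onnx=false is accepted as a key=value override.
# It skips the policy.onnx export; playback video is still recorded.
cmd="$base training.export_onnx=false"

# Print the assembled command so it can be checked before running it.
echo "$cmd"
```

Run the printed command as is to start training with the ONNX export disabled.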

Defaults live in conf/offpolicy/algo/td3.yaml:
algo.algo_log_name=fast_td3
algo.max_iterations=5000
algo.policy_frequency=2

Use --task and --sim to select the task and the simulation backend. Do not reuse a SAC owner with --algo td3.

uv run train --algo td3 --task g1_walk_flat --sim mujoco \
  algo.num_envs=2048 \
  training.no_play=true

When to Prefer TD3

A task owner has already tuned hyperparameters specifically for TD3.
You want a same-task comparison against SAC.
You want to keep the same off-policy training stack but switch to a TD3 owner.

The log root is logs/fast_td3/<task>/.
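The log path pattern can be sketched in shell as follows. The variable names task and log_dir are illustrative, and the task name is the one used in Quick Start. What the directory contains is not described on this page, so the snippet only checks whether it exists and lists it.

```shell
# Build the TD3 log directory for one task, following logs/fast_td3/<task>/.
task="g1_walk_flat"
log_dir="logs/fast_td3/${task}"

# The directory only exists after a training run has written to it.
if [ -d "$log_dir" ]; then
  ls "$log_dir"
else
  echo "no TD3 logs yet under $log_dir"
fi
```

Run it from the directory that contains logs/, since the path is relative.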