TD3

TD3 shares the off-policy training script with SAC and FlashSAC. Select it with --algo td3; its task owner YAML files live under conf/offpolicy/task/td3/.

Quick Start

uv run train --algo td3 --task g1_walk_flat --sim mujoco

Key Fields

When training through the off-policy path (scripts/train_offpolicy.py, or the CLI with --algo td3), set training.export_onnx=false to skip the policy.onnx export while still recording the playback video. See Evaluation and Playback.

  • Defaults live in conf/offpolicy/algo/td3.yaml.

  • algo.algo_log_name=fast_td3.

  • algo.max_iterations=5000.

  • algo.policy_frequency=2.
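In the standard TD3 algorithm the actor is updated less often than the critics. Assuming algo.policy_frequency carries that usual meaning here (this page does not state it explicitly), a value of 2 means one actor update for every two critic updates. The sketch below only illustrates that schedule; update_schedule is a hypothetical helper, not a function from the training script.

```python
def update_schedule(num_critic_updates: int, policy_frequency: int = 2) -> list:
    """Return the critic-update steps on which the actor is also updated.

    Hypothetical illustration of TD3-style delayed policy updates; it does
    not touch any network and is not part of the training code.
    """
    if policy_frequency < 1:
        raise ValueError("policy_frequency must be >= 1")
    actor_steps = []
    for step in range(1, num_critic_updates + 1):
        # The critics would be updated on every step (omitted here).
        # The actor is updated only on every `policy_frequency`-th step.
        if step % policy_frequency == 0:
            actor_steps.append(step)
    return actor_steps


print(update_schedule(6))  # [2, 4, 6]
```

Under this assumption, the default algo.max_iterations=5000 combined with policy_frequency=2 would give half as many actor updates as critic updates per critic step count.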

Use --task and --sim to select the task and the simulator backend. Do not reuse a SAC owner with --algo td3.

uv run train --algo td3 --task g1_walk_flat --sim mujoco \
  algo.num_envs=2048 \
  training.no_play=true
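When launching several runs from a script, the same command can be assembled programmatically. The following is a minimal sketch that only builds the argument list shown above; build_train_command is a hypothetical helper, and the key=value override format is taken from the example command rather than from any documented API.

```python
from __future__ import annotations

import shlex


def build_train_command(task: str, sim: str, overrides: dict | None = None) -> list[str]:
    """Assemble the `uv run train --algo td3 ...` argument list (hypothetical helper)."""
    cmd = ["uv", "run", "train", "--algo", "td3", "--task", task, "--sim", sim]
    for key, value in (overrides or {}).items():
        # Render Python booleans as lowercase true/false, as in the example command.
        if isinstance(value, bool):
            value = str(value).lower()
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_train_command(
    "g1_walk_flat",
    "mujoco",
    {"algo.num_envs": 2048, "training.no_play": True},
)
print(shlex.join(cmd))
# uv run train --algo td3 --task g1_walk_flat --sim mujoco algo.num_envs=2048 training.no_play=true
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root; adding "training.export_onnx": False to the overrides would append training.export_onnx=false in the same way.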

When to Prefer TD3

  • A task owner has already tuned hyperparameters specifically for TD3.

  • You want a same-task comparison against SAC.

  • You want to keep the same off-policy training stack but switch to a TD3 owner.

The log root is logs/fast_td3/<task>/.
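For scripts that post-process results, the log root can be resolved from the task name. This is a small sketch assuming the fast_td3 directory name follows algo.algo_log_name and that each run is stored as a subdirectory of the log root; neither the helper names nor the per-run layout are confirmed by this page.

```python
from pathlib import Path


def td3_log_root(task: str, base: str = "logs", log_name: str = "fast_td3") -> Path:
    """Return logs/<log_name>/<task> (hypothetical helper)."""
    return Path(base) / log_name / task


def list_runs(root: Path) -> list:
    """List subdirectories of the log root, sorted by name.

    Assumes one subdirectory per run; returns an empty list if the
    log root does not exist yet.
    """
    if not root.is_dir():
        return []
    return sorted(p for p in root.iterdir() if p.is_dir())


print(td3_log_root("g1_walk_flat").as_posix())  # logs/fast_td3/g1_walk_flat
```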