unilab.algos.torch.fast_td3.runner.FastTD3Runner

class unilab.algos.torch.fast_td3.runner.FastTD3Runner

Bases: OffPolicyRunner

FastTD3 runner using the shared OffPolicyRunner training loop.
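FastTD3 builds on TD3, and several __init__() arguments (policy_noise, noise_clip, gamma, tau, use_cdq) presumably carry their usual TD3 meaning. The sketch below illustrates those standard TD3 pieces in plain Python on scalar values: target policy smoothing, the clipped double-Q target, and Polyak (soft) target updates. It is an illustration under assumptions, not this runner's implementation. The helper names (clip, smoothed_target_action, td_target, soft_update) and the action bounds of [-1, 1] are hypothetical. Because the signature also exposes num_atoms, v_min and v_max, the real critic is likely distributional, so the runner's target probably operates on distributions rather than on the scalars used here.

```python
import random


def clip(x, low, high):
    """Clamp x into [low, high]."""
    return max(low, min(high, x))


def smoothed_target_action(target_action, policy_noise=0.1, noise_clip=0.2,
                           action_low=-1.0, action_high=1.0, rng=random):
    """Target policy smoothing: add Gaussian noise (std = policy_noise), clipped to
    [-noise_clip, noise_clip], to every action dimension, then clamp the result
    back into the assumed action bounds."""
    return [
        clip(a + clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip),
             action_low, action_high)
        for a in target_action
    ]


def td_target(reward, done, q1_next, q2_next, gamma=0.97):
    """Clipped double-Q target: bootstrap from the smaller of two critic
    estimates; terminal transitions (done=True) do not bootstrap."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def soft_update(target_params, online_params, tau=0.01):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]
```

With tau=0.01, each soft update moves the target weights only 1% of the way toward the online weights, which keeps the bootstrapped targets slowly varying. Taking the minimum of two critics counteracts the overestimation bias of a single max-style target; what the runner does when use_cdq=False is not documented on this page.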

Parameters: the constructor arguments and their default values are listed in the __init__() signature below.

Methods

  • __init__(env_name[, env_cfg_override, ...])

  • close()

  • learn([max_iterations, save_interval, ...]): Unified training loop for off-policy algorithms.

__init__(env_name, env_cfg_override=None, device=None, num_envs=4096,
         replay_buffer_n=1000, batch_size=8192, learning_starts=0,
         num_updates=4, policy_frequency=2, sync_collection=True,
         env_steps_per_sync=1, gamma=0.97, tau=0.01, actor_lr=0.0003,
         critic_lr=0.0003, actor_hidden_dim=256, critic_hidden_dim=512,
         num_atoms=101, v_min=-10.0, v_max=10.0, init_scale=0.01,
         log_std_min=-0.9, log_std_max=0.0, policy_noise=0.1, noise_clip=0.2,
         weight_decay=0.001, use_cdq=True, obs_normalization=True,
         sim_backend='mujoco', seed=None, trace_enabled=False,
         trace_output_dir=None, trace_thread_time=False,
         trace_cuda_events=True)

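The num_atoms, v_min and v_max arguments suggest a categorical (C51-style) distributional critic whose return distribution lives on a fixed grid of atoms. Assuming that is what they control, the sketch below builds the grid and projects a Bellman-shifted distribution r + gamma * z back onto it. With the defaults (101 atoms on [-10, 10]) the atoms are 0.2 apart. The function names are hypothetical, and the code is plain Python on lists rather than the runner's batched torch code.

```python
import math


def categorical_support(num_atoms=101, v_min=-10.0, v_max=10.0):
    """Evenly spaced return atoms z_0 .. z_{N-1} and their spacing delta."""
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)], delta


def project_target_distribution(next_probs, reward, done, gamma=0.97,
                                num_atoms=101, v_min=-10.0, v_max=10.0):
    """Shift each atom to r + gamma * z (no bootstrap if done), clamp it into
    [v_min, v_max], and split its probability between the two nearest atoms."""
    support, delta = categorical_support(num_atoms, v_min, v_max)
    projected = [0.0] * num_atoms
    discount = 0.0 if done else gamma
    for p, z in zip(next_probs, support):
        tz = min(max(reward + discount * z, v_min), v_max)
        b = (tz - v_min) / delta                 # fractional atom index
        b = min(max(b, 0.0), num_atoms - 1.0)    # guard against float round-off
        lo, hi = math.floor(b), math.ceil(b)
        if lo == hi:
            projected[lo] += p                   # lands exactly on an atom
        else:
            projected[lo] += p * (hi - b)        # closer atom gets more mass
            projected[hi] += p * (b - lo)
    return projected


def expected_value(probs, num_atoms=101, v_min=-10.0, v_max=10.0):
    """Scalar Q estimate: the mean of the categorical distribution."""
    support, _ = categorical_support(num_atoms, v_min, v_max)
    return sum(p * z for p, z in zip(probs, support))
```

Linear interpolation between neighbouring atoms preserves the mean of the shifted distribution whenever no clamping occurs, which is why the support bounds v_min and v_max should comfortably cover the returns the task can produce.
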
close()
Return type:

None

learn(max_iterations=1500, save_interval=50, log_dir='logs', logger_type='tensorboard')

Unified training loop for off-policy algorithms.

Parameters:
  • max_iterations (int)

  • save_interval (int)

  • log_dir (str)

  • logger_type (str)

Return type:

None
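The learn() loop lives in the shared OffPolicyRunner, and its exact schedule is not documented on this page. As a rough mental model only, the counting sketch below assumes a common off-policy structure: each iteration steps all num_envs environments once, runs num_updates critic updates once learning_starts iterations have passed, updates the actor every policy_frequency critic updates, and checkpoints every save_interval iterations. The function name off_policy_loop, the LoopStats container, the unit of learning_starts (iterations here), and the update ordering are all assumptions, not the runner's actual behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class LoopStats:
    """Counters recorded by the hypothetical loop below."""
    env_transitions: int = 0
    critic_updates: int = 0
    actor_updates: int = 0
    checkpoints: list = field(default_factory=list)


def off_policy_loop(max_iterations=1500, save_interval=50, num_envs=4096,
                    learning_starts=0, num_updates=4, policy_frequency=2):
    """Count work done by an assumed off-policy schedule; no real training."""
    stats = LoopStats()
    for it in range(max_iterations):
        # 1) step every parallel environment once; store transitions in the buffer
        stats.env_transitions += num_envs
        # 2) start learning only after learning_starts iterations (assumed unit)
        if it >= learning_starts:
            for _ in range(num_updates):
                stats.critic_updates += 1
                # 3) delayed policy update: actor every policy_frequency critic steps
                if stats.critic_updates % policy_frequency == 0:
                    stats.actor_updates += 1
        # 4) periodic checkpoint, counted in iterations
        if (it + 1) % save_interval == 0:
            stats.checkpoints.append(it + 1)
    return stats
```

Under these assumptions and the default arguments of learn() and __init__(), the model yields 6,000 critic updates, 3,000 actor updates and 30 checkpoints over 1,500 iterations.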