unilab.algos.torch.offpolicy.runner
Unified runner for off-policy RL algorithms (SAC, TD3).
Functions
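| build_reward_comparison_metrics(...) | Return the latest collector-side 100-episode mean for reward comparison. |
| compute_train_start_threshold(...) | Return the minimum replay size required before learner updates may start. |
| replay_buffer_ready_for_learning(...) | Return whether the replay buffer holds enough samples for the first learner step. |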
Classes
| OffPolicyRunner | Unified runner for SAC and TD3. |
- unilab.algos.torch.offpolicy.runner.compute_train_start_threshold(batch_size, learning_starts, num_envs)
Return the minimum replay size required before learner updates may start.
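The docstring does not give the formula. Below is a minimal sketch. It assumes two things: `learning_starts` counts vectorized environment steps, each adding `num_envs` transitions, and at least one full batch of `batch_size` transitions must be stored before sampling. `train_start_threshold_sketch` is a hypothetical name and is not part of the module.

```python
def train_start_threshold_sketch(batch_size: int, learning_starts: int, num_envs: int) -> int:
    """Hypothetical stand-in for compute_train_start_threshold.

    Assumption: the threshold is the larger of one batch and the warm-up
    transitions gathered over learning_starts vectorized env steps.
    """
    if batch_size <= 0 or num_envs <= 0 or learning_starts < 0:
        raise ValueError("batch_size and num_envs must be positive; learning_starts must be >= 0")
    # Each vectorized step adds one transition per environment (assumption).
    warmup_transitions = learning_starts * num_envs
    return max(batch_size, warmup_transitions)


# Runner defaults: batch_size=8192, learning_starts=0, num_envs=4096.
print(train_start_threshold_sketch(8192, 0, 4096))   # 8192
print(train_start_threshold_sketch(8192, 10, 4096))  # 40960
```

Under this assumption, the default `learning_starts=0` reduces the threshold to a single batch.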
- unilab.algos.torch.offpolicy.runner.replay_buffer_ready_for_learning(replay_buffer_size, *, batch_size, learning_starts, num_envs)
Return whether the replay buffer holds enough samples for the first learner step.
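Below is a minimal sketch of the readiness check. It assumes the function compares the current replay size against a threshold equal to the larger of one batch and `learning_starts * num_envs` warm-up transitions. `replay_ready_sketch` is hypothetical. The bare `*` mirrors the documented signature, so the sizing arguments must be passed by keyword.

```python
def replay_ready_sketch(
    replay_buffer_size: int, *, batch_size: int, learning_starts: int, num_envs: int
) -> bool:
    """Hypothetical stand-in for replay_buffer_ready_for_learning."""
    # Assumed threshold rule: one full batch, or the warm-up transitions
    # (learning_starts vectorized steps x num_envs), whichever is larger.
    threshold = max(batch_size, learning_starts * num_envs)
    return replay_buffer_size >= threshold


# After one step of 4096 envs the buffer holds 4096 transitions; after two, 8192.
print(replay_ready_sketch(4096, batch_size=8192, learning_starts=0, num_envs=4096))  # False
print(replay_ready_sketch(8192, batch_size=8192, learning_starts=0, num_envs=4096))  # True
```

With 4096 environments and a batch of 8192, this sketch would need two collection steps before the first learner update.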
- unilab.algos.torch.offpolicy.runner.build_reward_comparison_metrics(reward_history, smoothed_reward)
Return the latest collector-side 100-episode mean for reward comparison.
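Below is a minimal sketch of what such a metric might look like. It assumes `reward_history` is a sequence of per-episode returns in completion order, and that `smoothed_reward` serves as a fallback before any episode has finished. The function name and the dictionary keys are hypothetical illustrations, not the module's real metric names.

```python
from typing import Optional, Sequence


def reward_comparison_sketch(
    reward_history: Sequence[float], smoothed_reward: Optional[float], window: int = 100
) -> dict:
    """Hypothetical stand-in for build_reward_comparison_metrics."""
    # Assumption: reward_history holds per-episode returns, oldest first.
    recent = list(reward_history)[-window:]
    if recent:
        mean_reward = sum(recent) / len(recent)
    else:
        # Assumption: before any episode ends, fall back to the smoothed value.
        mean_reward = smoothed_reward
    # Illustrative key names only.
    return {"reward_mean_100": mean_reward, "reward_smoothed": smoothed_reward}


returns = [float(i) for i in range(150)]  # 150 finished episodes: 0.0 .. 149.0
print(reward_comparison_sketch(returns, smoothed_reward=120.0))
# {'reward_mean_100': 99.5, 'reward_smoothed': 120.0}
```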
- class unilab.algos.torch.offpolicy.runner.OffPolicyRunner
Bases: AsyncRunner
Unified runner for SAC and TD3.
- Parameters:
    - env_name (str)
    - algo_type (str)
    - num_envs (int)
    - replay_buffer_n (int)
    - batch_size (int)
    - learning_starts (int)
    - updates_per_step (int)
    - policy_frequency (int)
    - sync_collection (bool)
    - env_steps_per_sync (int)
    - actor_hidden_dim (int)
    - use_layer_norm (bool)
    - obs_normalization (bool)
    - sim_backend (str)
    - trace_enabled (bool)
    - trace_thread_time (bool)
    - trace_cuda_events (bool)
- __init__(learner, env_name, algo_type, num_envs=4096, replay_buffer_n=1024, batch_size=8192, learning_starts=0, updates_per_step=8, policy_frequency=4, sync_collection=True, env_steps_per_sync=1, device=None, actor_hidden_dim=512, use_layer_norm=True, obs_normalization=False, sim_backend='mujoco', env_cfg_override=None, actor_kwargs=None, seed=None, trace_enabled=False, trace_output_dir=None, trace_thread_time=False, trace_cuda_events=True)
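The signature pairs `updates_per_step` with `policy_frequency`. The hypothetical counting model below assumes TD3-style delayed policy updates: the critic is updated `updates_per_step` times per collection step, and the actor on every `policy_frequency`-th critic update. The runner's actual scheduling is not documented here, including how `sync_collection` and `env_steps_per_sync` interact with it. SAC may also treat `policy_frequency` differently. `schedule_sketch` and `UpdateCounts` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class UpdateCounts:
    critic_updates: int
    actor_updates: int


def schedule_sketch(updates_per_step: int = 8, policy_frequency: int = 4, num_steps: int = 1) -> UpdateCounts:
    """Count critic and actor updates under an assumed TD3-style delayed schedule."""
    critic_updates = 0
    actor_updates = 0
    for _ in range(num_steps):  # one iteration per collection step (assumption)
        for _ in range(updates_per_step):
            critic_updates += 1
            # Actor update on every policy_frequency-th critic update, counted globally.
            if critic_updates % policy_frequency == 0:
                actor_updates += 1
    return UpdateCounts(critic_updates, actor_updates)


print(schedule_sketch())              # UpdateCounts(critic_updates=8, actor_updates=2)
print(schedule_sketch(num_steps=10))  # UpdateCounts(critic_updates=80, actor_updates=20)
```

Under this model the defaults (`updates_per_step=8`, `policy_frequency=4`) give two actor updates per collection step.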