unilab.logging.offpolicy

Rich-based training logger for off-policy RL algorithms (SAC, TD3, etc.).

Classes

OffPolicyLogger

Rich logger for off-policy RL algorithms (SAC, TD3, etc.).

class unilab.logging.offpolicy.OffPolicyLogger

Bases: BaseTrainingLogger

Rich logger for off-policy RL algorithms (SAC, TD3, etc.).

__init__(algo_name='RL', max_iterations=1500, num_envs=4096, env_name='', obs_dim=0, action_dim=0, refresh_per_second=4, log_dir='', log_backend='tensorboard', wandb_project='unilab', wandb_entity=None, wandb_name='', wandb_group=None, wandb_job_type=None, wandb_tags=None, wandb_notes=None)

start(*, status='Warming up...')
Parameters:

status (str)

finish(*, title='Training Summary', extra_summary='')
Parameters:
  • title (str)

  • extra_summary (str)

log_buffer_fill(current, target)

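log_buffer_fill(current, target) appears intended for the warm-up phase, when the replay buffer is filled before learning starts. The following is a minimal sketch under that assumption. The hypothetical warm_up helper takes the progress reporter as a callback with the same (current, target) shape, so a real logger's bound method logger.log_buffer_fill could be passed in. Here a plain list captures the reports instead. collect_batch and the batch size of 4 are illustrative stand-ins.

```python
from typing import Callable, List, Tuple


def warm_up(
    buffer: list,
    target: int,
    collect_batch: Callable[[], list],
    report: Callable[[int, int], None],
) -> None:
    """Fill `buffer` until it holds at least `target` items, reporting progress.

    `report` has the (current, target) shape of OffPolicyLogger.log_buffer_fill,
    so `logger.log_buffer_fill` could be passed directly (hypothetical wiring).
    """
    report(len(buffer), target)  # show the starting fill level
    while len(buffer) < target:
        buffer.extend(collect_batch())
        # Clamp so the reported value never overshoots the target.
        report(min(len(buffer), target), target)


# Usage: a fake collector that yields 4 transitions per call, and a list
# that records what would otherwise be sent to the logger.
reports: List[Tuple[int, int]] = []
replay: list = []
warm_up(replay, target=10, collect_batch=lambda: [0] * 4,
        report=lambda cur, tgt: reports.append((cur, tgt)))
# reports == [(0, 10), (4, 10), (8, 10), (10, 10)]
```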
update_collector_timing(timing_ms)
Parameters:

timing_ms (dict[str, float])
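update_collector_timing expects a dict[str, float], and the parameter name suggests the values are milliseconds. One way to build that dict is to time each collector phase with time.perf_counter and convert seconds to milliseconds. The PhaseTimer class and the phase names "policy" and "env_step" below are hypothetical. Substitute whatever breakdown your collector actually has.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


class PhaseTimer:
    """Accumulate wall-clock time per named phase, in milliseconds."""

    def __init__(self) -> None:
        self.timing_ms: Dict[str, float] = {}

    @contextmanager
    def phase(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.timing_ms[name] = self.timing_ms.get(name, 0.0) + elapsed_ms

    def reset(self) -> Dict[str, float]:
        """Return the accumulated timings and start a fresh window."""
        out, self.timing_ms = self.timing_ms, {}
        return out


timer = PhaseTimer()
for _ in range(3):  # three fake collection steps
    with timer.phase("policy"):
        time.sleep(0.001)  # stands in for action inference
    with timer.phase("env_step"):
        time.sleep(0.002)  # stands in for stepping the simulator
timing = timer.reset()
# logger.update_collector_timing(timing)  # hypothetical call site
```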

update_done_rates(timeout_rate, terminated_rate)

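The two rates presumably summarize how vectorized environments ended their episodes. The sketch below makes two assumptions, neither of which this reference documents. First, each rate is the fraction of environments whose episode ended for that reason on a given step. Second, Gymnasium's truncated flag (time limit reached) maps to timeout_rate, and its terminated flag maps to terminated_rate.

```python
from typing import Sequence, Tuple


def done_rates(terminated: Sequence[bool], truncated: Sequence[bool]) -> Tuple[float, float]:
    """Return (timeout_rate, terminated_rate) as fractions of environments.

    The tuple order matches update_done_rates(timeout_rate, terminated_rate).
    """
    n = len(terminated)
    if n != len(truncated):
        raise ValueError("terminated and truncated must have the same length")
    if n == 0:
        return 0.0, 0.0
    timeout_rate = sum(bool(t) for t in truncated) / n
    terminated_rate = sum(bool(t) for t in terminated) / n
    return timeout_rate, terminated_rate


timeout_rate, terminated_rate = done_rates(
    terminated=[True, False, False, False],
    truncated=[False, True, True, False],
)
# timeout_rate == 0.5, terminated_rate == 0.25
# logger.update_done_rates(timeout_rate, terminated_rate)  # hypothetical call site
```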
update_buffer_utilization(utilization)
Parameters:

utilization (float)
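utilization is a single float. The sketch below assumes it is the fraction of replay-buffer capacity currently occupied, from 0.0 to 1.0. If the logger expects a percentage instead, multiply by 100. RingBuffer is a hypothetical minimal FIFO replay buffer written for illustration and is not part of unilab.

```python
from typing import Any, List, Optional


class RingBuffer:
    """Fixed-capacity FIFO buffer: once full, new items overwrite the oldest."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: List[Optional[Any]] = [None] * capacity
        self._next = 0  # index of the slot to write next
        self._size = 0  # number of valid items stored

    def add(self, item: Any) -> None:
        self._data[self._next] = item
        self._next = (self._next + 1) % self.capacity
        self._size = min(self._size + 1, self.capacity)

    def __len__(self) -> int:
        return self._size

    @property
    def utilization(self) -> float:
        """Fraction of capacity in use, in [0.0, 1.0]."""
        return self._size / self.capacity


buf = RingBuffer(capacity=8)
for step in range(6):
    buf.add(step)
# buf.utilization == 0.75
# logger.update_buffer_utilization(buf.utilization)  # hypothetical call site
```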

update_replay_queue(current_len, max_size)
Parameters:
  • current_len (int)

  • max_size (int)

update_staging_pool(current_len, max_size)
Parameters:
  • current_len (int)

  • max_size (int)
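update_replay_queue and update_staging_pool share the (current_len, max_size) shape, which suggests they display the fill level of bounded queues between collection and learning. The sketch below assumes the queues are standard-library bounded containers. This is an assumption, since unilab's own structures are not documented here. With such containers, both numbers can be read directly: queue.Queue exposes qsize() and maxsize, and collections.deque exposes len() and maxlen.

```python
import queue
from collections import deque
from typing import Deque, Tuple


def queue_fill(q: "queue.Queue") -> Tuple[int, int]:
    """(current_len, max_size) for a thread-safe queue.Queue.

    qsize() is approximate under concurrency, which is fine for a display.
    A maxsize of 0 means unbounded and is passed through as 0.
    """
    return q.qsize(), q.maxsize


def deque_fill(d: Deque) -> Tuple[int, int]:
    """(current_len, max_size) for a collections.deque; maxlen=None is reported as 0."""
    return len(d), d.maxlen or 0


replay_queue: "queue.Queue[int]" = queue.Queue(maxsize=16)
for batch_id in range(5):
    replay_queue.put(batch_id)

staging_pool: Deque[int] = deque(maxlen=4)
staging_pool.extend(range(10))  # only the last 4 items are kept

# logger.update_replay_queue(*queue_fill(replay_queue))   # hypothetical call sites
# logger.update_staging_pool(*deque_fill(staging_pool))
```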

set_collection_sync(enabled, env_steps_per_sync=0)
Parameters:
  • enabled (bool)

  • env_steps_per_sync (int)

log_collector(total_steps, buffer_size, mean_reward=0.0)
Parameters:
  • total_steps (int)

  • buffer_size (int)

  • mean_reward (float)
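log_collector reports cumulative progress from the collection side. The sketch below rests on two assumptions. First, total_steps counts individual environment transitions, so each vectorized step adds num_envs. Second, mean_reward is averaged over a window of recently finished episodes. The CollectorStats class and its window size of 100 are hypothetical.

```python
from collections import deque
from typing import Deque, Sequence


class CollectorStats:
    """Track the transition count and a rolling mean of finished-episode returns."""

    def __init__(self, num_envs: int, window: int = 100) -> None:
        self.num_envs = num_envs
        self.total_steps = 0
        self._returns: Deque[float] = deque(maxlen=window)

    def record_step(self, finished_returns: Sequence[float] = ()) -> None:
        """Call once per vectorized env step with returns of episodes that just ended."""
        self.total_steps += self.num_envs
        self._returns.extend(finished_returns)

    @property
    def mean_reward(self) -> float:
        # Report 0.0 (the log_collector default) until an episode has finished.
        if not self._returns:
            return 0.0
        return sum(self._returns) / len(self._returns)


stats = CollectorStats(num_envs=4)
stats.record_step()
stats.record_step(finished_returns=[10.0, 20.0])
stats.record_step(finished_returns=[30.0])
# stats.total_steps == 12, stats.mean_reward == 20.0
# Hypothetical call site; buffer_size would come from your replay buffer:
# logger.log_collector(stats.total_steps, buffer_size, mean_reward=stats.mean_reward)
```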

log_step(iteration, metrics=None, reward=None, reward_metrics=None, reward_components=None, train_time=0.0, wait_time=0.0, learner_incremental_h2d_time=0.0, weight_sync_time=0.0, extra_info=None)

log_status(status)
Parameters:

status (str)
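The method names suggest a lifecycle: start(), warm-up reporting through log_buffer_fill(), per-iteration log_step() calls, and finally finish(). The sketch below drives any object that has those methods. To stay runnable without unilab or a terminal, it uses a hypothetical RecordingLogger stand-in that records calls instead of rendering them. The call order, metric name, and values are illustrative assumptions, not documented behavior.

```python
from typing import Any, Callable, Dict, List, Tuple


class RecordingLogger:
    """Hypothetical stand-in for OffPolicyLogger that records calls instead of drawing."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple, Dict[str, Any]]] = []

    def __getattr__(self, name: str) -> Callable[..., None]:
        # Only reached for attributes not found normally, i.e. the logger methods.
        def record(*args: Any, **kwargs: Any) -> None:
            self.calls.append((name, args, kwargs))
        return record


def train(logger: Any, iterations: int, warmup_target: int) -> None:
    """Drive a logger through an illustrative off-policy training run."""
    logger.start(status="Warming up...")
    # Warm-up: report replay-buffer fill in two increments.
    for filled in (warmup_target // 2, warmup_target):
        logger.log_buffer_fill(filled, warmup_target)
    logger.log_status("Training")
    for it in range(iterations):
        metrics = {"critic_loss": 1.0 / (it + 1)}  # placeholder metric values
        # train_time units are not documented here; seconds are assumed.
        logger.log_step(it, metrics=metrics, train_time=0.01)
    logger.finish(title="Training Summary")


log = RecordingLogger()
train(log, iterations=2, warmup_target=1000)
print([name for name, _, _ in log.calls])
# ['start', 'log_buffer_fill', 'log_buffer_fill', 'log_status', 'log_step', 'log_step', 'finish']
```

With unilab installed, passing an OffPolicyLogger(algo_name="SAC", num_envs=4096) instead of RecordingLogger() should exercise the same calls. This reference does not state whether start() must precede the other methods.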