unilab.logging.offpolicy
Rich-based training logger for off-policy RL algorithms (SAC, TD3, etc.).
Classes
-
class unilab.logging.offpolicy.OffPolicyLogger
Bases: BaseTrainingLogger
Rich logger for off-policy RL algorithms (SAC, TD3, etc.).
-
__init__(algo_name='RL', max_iterations=1500, num_envs=4096, env_name='', obs_dim=0, action_dim=0, refresh_per_second=4, log_dir='', log_backend='tensorboard', wandb_project='unilab', wandb_entity=None, wandb_name='', wandb_group=None, wandb_job_type=None, wandb_tags=None, wandb_notes=None)
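A minimal sketch of constructing the logger with keyword arguments from the signature above. The `build_logger` helper, the example values, and passing the class in as `logger_cls` are illustrative assumptions; only the keyword names and their defaults come from the documented signature.

```python
def build_logger(logger_cls, *, algo_name, env_name, obs_dim, action_dim, log_dir):
    """Instantiate a logger class using keyword names from the documented __init__.

    `logger_cls` is expected to be OffPolicyLogger; it is passed in so this
    sketch does not depend on unilab being importable.
    """
    return logger_cls(
        algo_name=algo_name,
        max_iterations=1500,        # documented default
        num_envs=4096,              # documented default
        env_name=env_name,
        obs_dim=obs_dim,
        action_dim=action_dim,
        log_dir=log_dir,
        log_backend="tensorboard",  # documented default
    )

# Intended usage (not executed here):
# from unilab.logging.offpolicy import OffPolicyLogger
# logger = build_logger(OffPolicyLogger, algo_name="SAC", env_name="demo",
#                       obs_dim=8, action_dim=2, log_dir="runs/demo")
```

Passing every argument by keyword keeps the call readable given how many optional parameters the constructor accepts.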
-
start(*, status='Warming up...')
- Parameters:
status (str)
-
finish(*, title='Training Summary', extra_summary='')
- Parameters:
title (str)
extra_summary (str)
-
log_buffer_fill(current, target)
-
update_collector_timing(timing_ms)
- Parameters:
timing_ms (dict[str, float])
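One way to assemble the `timing_ms` dictionary is sketched below, assuming the values are wall-clock durations in milliseconds (as the parameter name suggests). The `timed` helper and the phase names `env_step` and `policy_forward` are hypothetical; the keys the logger actually expects are not documented here.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(timing_ms, key):
    """Accumulate elapsed wall-clock time for `key`, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timing_ms[key] = timing_ms.get(key, 0.0) + elapsed_ms

timing_ms = {}
with timed(timing_ms, "env_step"):        # hypothetical phase name
    sum(range(10_000))                    # stand-in for stepping the environments
with timed(timing_ms, "policy_forward"):  # hypothetical phase name
    sum(range(10_000))                    # stand-in for a policy forward pass

# Intended usage (not executed here):
# logger.update_collector_timing(timing_ms)
```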
-
update_done_rates(timeout_rate, terminated_rate)
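The sketch below shows one plausible way to derive the two arguments, assuming each rate is the fraction of environments whose episode ended by timeout or by termination in the most recent step. That interpretation, and the `done_rates` helper itself, are assumptions; the source does not state how the rates are defined.

```python
def done_rates(timeouts, terminations):
    """Return (timeout_rate, terminated_rate) as fractions of all entries."""
    if len(timeouts) != len(terminations):
        raise ValueError("timeouts and terminations must have the same length")
    n = len(timeouts)
    if n == 0:
        return 0.0, 0.0
    timeout_rate = sum(bool(t) for t in timeouts) / n
    terminated_rate = sum(bool(t) for t in terminations) / n
    return timeout_rate, terminated_rate

# Four environments: one timed out, two terminated.
timeout_rate, terminated_rate = done_rates(
    [True, False, False, False],
    [False, True, True, False],
)

# Intended usage (not executed here):
# logger.update_done_rates(timeout_rate, terminated_rate)
```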
-
update_buffer_utilization(utilization)
- Parameters:
utilization (float)
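A minimal sketch of computing `utilization`, assuming it is a fraction in [0, 1] of the replay buffer's capacity. Whether the logger expects a fraction or a percentage is not stated in the source, and the `buffer_utilization` helper is hypothetical.

```python
def buffer_utilization(current_size, capacity):
    """Return current_size / capacity, clamped to the range [0.0, 1.0]."""
    if capacity <= 0:
        return 0.0
    return min(max(current_size / capacity, 0.0), 1.0)

utilization = buffer_utilization(250_000, 1_000_000)

# Intended usage (not executed here):
# logger.update_buffer_utilization(utilization)
```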
-
update_replay_queue(current_len, max_size)
- Parameters:
current_len (int)
max_size (int)
-
update_staging_pool(current_len, max_size)
- Parameters:
current_len (int)
max_size (int)
-
set_collection_sync(enabled, env_steps_per_sync=0)
- Parameters:
enabled (bool)
env_steps_per_sync (int)
-
log_collector(total_steps, buffer_size, mean_reward=0.0)
- Parameters:
total_steps (int)
buffer_size (int)
mean_reward (float)
-
log_step(iteration, metrics=None, reward=None, reward_metrics=None, reward_components=None, train_time=0.0, wait_time=0.0, learner_incremental_h2d_time=0.0, weight_sync_time=0.0, extra_info=None)
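A learner loop might call `log_step` once per iteration, as in the sketch below. It assumes that `train_time` and `wait_time` are durations in seconds, that `metrics` is a dictionary of scalar values, and that iterations are counted from zero; none of these is confirmed by the source. `run_iterations`, `wait_for_batch`, and `train_on_batch` are hypothetical names, and `logger` is any object exposing the documented `log_step` method.

```python
import time

def run_iterations(logger, num_iterations, wait_for_batch, train_on_batch):
    """Time the wait and train phases of each iteration and report them."""
    for iteration in range(num_iterations):
        t0 = time.perf_counter()
        batch = wait_for_batch()          # blocks until replay data is ready
        t1 = time.perf_counter()
        metrics = train_on_batch(batch)   # returns e.g. {"critic_loss": 0.12}
        t2 = time.perf_counter()
        logger.log_step(
            iteration,
            metrics=metrics,
            train_time=t2 - t1,
            wait_time=t1 - t0,
        )
```

Measuring the two phases separately makes it easier to see whether the learner is limited by data collection or by the update itself.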
-
log_status(status)
- Parameters:
status (str)
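The lifecycle methods `start`, `log_status`, and `finish` can be combined as sketched below. Calling `finish` in a `finally` block, so that the summary is printed even when training raises, is a design choice of this sketch and not documented behavior. `run_with_logger` and `train_fn` are hypothetical names.

```python
def run_with_logger(logger, train_fn):
    """Wrap a training function with the logger's start/finish calls."""
    logger.start(status="Warming up...")      # keyword-only, per the signature
    try:
        logger.log_status("Training")
        train_fn(logger)
    finally:
        logger.finish(title="Training Summary")
```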