unilab.algos.torch.offpolicy.double_buffer_runner

Off-policy runner that uses a CPU-pinned double-buffer replay pipeline (B path).

Classes

DoubleBufferOffPolicyRunner

OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.

class unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner

Bases: OffPolicyRunner

OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.

The only behavioural difference from the parent class is in learn():

- ReplayBuffer is created as packed CPU shared storage.
- Sampling goes through CPUPinnedDoubleBufferReplayPipeline instead of ReplayBuffer.sample().
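The double-buffer idea behind the sampling path can be illustrated with a minimal, standard-library-only sketch. It is not the implementation of CPUPinnedDoubleBufferReplayPipeline: the class name DoubleBufferSampler, its methods, and the use of plain Python lists in place of pinned host tensors are all assumptions made for illustration. A background thread fills one staging slot while the consumer reads the other, so sampling overlaps with the learner's work.

```python
import random
import threading
from queue import Queue


class DoubleBufferSampler:
    """Hypothetical sketch of double-buffered batch prefetching.

    Plain lists stand in for pinned host buffers; no GPU transfer is modelled.
    """

    def __init__(self, storage, batch_size, seed=0):
        self._storage = storage
        self._batch_size = batch_size
        self._rng = random.Random(seed)
        # Two staging slots: one is being filled while the other is consumed.
        self._slots = [None, None]
        self._free = Queue()   # indices of slots the worker may overwrite
        self._ready = Queue()  # indices of slots holding a complete batch
        self._free.put(0)
        self._free.put(1)
        self._worker = threading.Thread(target=self._fill_loop, daemon=True)
        self._worker.start()

    def _fill_loop(self):
        # Producer: take a free slot, fill it with a sampled batch, publish it.
        while True:
            idx = self._free.get()
            if idx is None:  # sentinel pushed by close()
                return
            self._slots[idx] = self._rng.sample(self._storage, self._batch_size)
            self._ready.put(idx)

    def next_batch(self):
        # Consumer: block until a slot is ready, copy it out, then recycle it.
        idx = self._ready.get()
        batch = list(self._slots[idx])  # stands in for the host-to-device copy
        self._free.put(idx)
        return batch

    def close(self):
        # Ask the worker to exit and wait for it.
        self._free.put(None)
        self._worker.join()
```

Under these assumptions, a learner would call next_batch() once per update and close() when training ends. A slot is returned to the free queue only after its contents have been copied out, which is the property that keeps the producer from overwriting a batch that is still in use.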

Parameters:

replay_prefetch_mode (str)
verbose_metrics (bool)

LEARNER_LOG_INTERVAL = 10

__init__(*, replay_prefetch_mode='one_tick', verbose_metrics=False, **kwargs)

Parameters:

replay_prefetch_mode (str)
verbose_metrics (bool)

learn(max_iterations=1500, save_interval=50, log_dir='logs', logger_type='tensorboard')

Unified training loop for off-policy algorithms.

Parameters:

max_iterations (int)
save_interval (int)
log_dir (str)
logger_type (str)

Return type:

None