unilab.algos.torch.offpolicy.double_buffer_runner

Off-policy runner that uses the CPU-pinned double-buffer replay pipeline (B path).

Classes

DoubleBufferOffPolicyRunner

OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.

class unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner

Bases: OffPolicyRunner

OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.

The only behavioural difference from the parent class is in learn():

  • ReplayBuffer is created as packed CPU shared storage.

  • Sampling goes through CPUPinnedDoubleBufferReplayPipeline instead of ReplayBuffer.sample().
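The general double-buffering idea behind this sampling path can be illustrated with a small standard-library sketch. This is not the unilab implementation: the class DoubleBufferSampler, its methods, and the list-based storage are all hypothetical stand-ins. A background worker fills one staging slot while the caller reads the other. In a PyTorch pipeline the slots would presumably be pinned host tensors (torch.Tensor.pin_memory()) copied to the device with non_blocking=True, but that detail is an assumption and is not shown here.

```python
import queue
import random
import threading


class DoubleBufferSampler:
    """Toy double-buffered sampler (hypothetical, not the unilab class).

    A worker thread fills one staging slot while the caller reads the other.
    """

    def __init__(self, storage, batch_size, seed=0):
        self._storage = storage
        self._batch_size = batch_size
        self._rng = random.Random(seed)
        # Two preallocated staging slots, reused for every batch.
        self._slots = [[None] * batch_size, [None] * batch_size]
        self._free = queue.Queue()   # slot indices the worker may overwrite
        self._ready = queue.Queue()  # slot indices holding a finished batch
        self._free.put(0)
        self._free.put(1)
        self._held = None            # slot index currently lent to the caller
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._fill_loop, daemon=True)
        self._worker.start()

    def _fill_loop(self):
        while not self._stop.is_set():
            try:
                idx = self._free.get(timeout=0.05)
            except queue.Empty:
                continue  # no free slot yet; re-check the stop flag
            slot = self._slots[idx]
            for i in range(self._batch_size):
                slot[i] = self._rng.choice(self._storage)
            self._ready.put(idx)

    def next_batch(self):
        """Return the next batch; it stays valid until the following call."""
        if self._held is not None:
            # Hand the previous slot back so the worker can refill it.
            self._free.put(self._held)
        self._held = self._ready.get()
        return self._slots[self._held]

    def close(self):
        self._stop.set()
        self._worker.join()


if __name__ == "__main__":
    sampler = DoubleBufferSampler(list(range(100)), batch_size=8)
    for _ in range(3):
        batch = sampler.next_batch()
        print(len(batch))
    sampler.close()
```

Because the slots are reused, a batch returned by next_batch() may be overwritten after the next call, so a caller that needs to keep it should copy it first.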

Parameters:
  • replay_prefetch_mode (str)

  • verbose_metrics (bool)

LEARNER_LOG_INTERVAL = 10

__init__(*, replay_prefetch_mode='one_tick', verbose_metrics=False, **kwargs)
Parameters:
  • replay_prefetch_mode (str)

  • verbose_metrics (bool)

learn(max_iterations=1500, save_interval=50, log_dir='logs', logger_type='tensorboard')

Unified training loop for off-policy algorithms.

Parameters:
  • max_iterations (int)

  • save_interval (int)

  • log_dir (str)

  • logger_type (str)

Return type:

None
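A hedged usage sketch of the documented constructor and learn() signatures follows. The helper train_with_double_buffer and its base_kwargs argument are hypothetical: the parent OffPolicyRunner arguments are not documented on this page, so they are passed through opaquely. The keyword values shown are the documented defaults.

```python
def train_with_double_buffer(runner_cls, base_kwargs, log_dir="logs"):
    """Build a runner and run its training loop (hypothetical helper).

    runner_cls is expected to be DoubleBufferOffPolicyRunner; base_kwargs holds
    the parent-class arguments, which this page does not document.
    """
    runner = runner_cls(
        replay_prefetch_mode="one_tick",  # documented default
        verbose_metrics=False,            # documented default
        **base_kwargs,                    # forwarded via **kwargs (assumption)
    )
    # learn() returns None; all four arguments below are the documented defaults.
    runner.learn(
        max_iterations=1500,
        save_interval=50,
        log_dir=log_dir,
        logger_type="tensorboard",
    )
    return runner


# Intended use, assuming unilab is installed and base_kwargs is filled in:
# from unilab.algos.torch.offpolicy.double_buffer_runner import (
#     DoubleBufferOffPolicyRunner,
# )
# train_with_double_buffer(DoubleBufferOffPolicyRunner, base_kwargs)
```

Passing the class in as an argument keeps the helper importable without unilab and makes it easy to substitute the parent OffPolicyRunner for comparison.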