
unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner

class unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner

Bases: OffPolicyRunner

OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.

The only behavioural difference from the parent class is in learn():

- The ReplayBuffer is created as packed CPU shared storage.
- Sampling goes through CPUPinnedDoubleBufferReplayPipeline instead of ReplayBuffer.sample().
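The double-buffering idea behind the pipeline can be illustrated without PyTorch. The sketch below is hypothetical: DoubleBufferSampler and its methods are not part of unilab, plain Python lists stand in for pinned CPU tensors, and a thread stands in for whatever mechanism CPUPinnedDoubleBufferReplayPipeline actually uses. It shows only the core pattern: two preallocated slots alternate roles, and the next batch is sampled into the idle slot while the caller works on the current one.

```python
import random
import threading
from typing import List, Optional, Sequence


class DoubleBufferSampler:
    """Hypothetical sketch, not unilab's API: overlap sampling of the next
    batch with consumption of the current one using two reusable slots."""

    def __init__(self, storage: Sequence[float], batch_size: int, seed: int = 0) -> None:
        self._storage = storage
        self._batch_size = batch_size
        self._rng = random.Random(seed)
        # Two slots allocated once and overwritten in place (plain lists stand
        # in for the pinned CPU staging buffers assumed in the real pipeline).
        self._slots: List[List[float]] = [[0.0] * batch_size, [0.0] * batch_size]
        self._front = 0
        self._worker: Optional[threading.Thread] = None
        # Prime the pipeline: start filling the back slot right away.
        self._start_fill(1 - self._front)

    def _fill(self, slot: int) -> None:
        buf = self._slots[slot]
        for i in range(self._batch_size):
            # Uniform sampling with replacement, written into the slot in place.
            buf[i] = self._storage[self._rng.randrange(len(self._storage))]

    def _start_fill(self, slot: int) -> None:
        self._worker = threading.Thread(target=self._fill, args=(slot,), daemon=True)
        self._worker.start()

    def next_batch(self) -> List[float]:
        # Block until the background fill of the back slot has finished.
        if self._worker is not None:
            self._worker.join()
        # Swap roles: the freshly filled slot becomes the front slot.
        self._front = 1 - self._front
        batch = self._slots[self._front]
        # Refill the other slot while the caller uses `batch`. This overwrites
        # the previously returned batch, so a batch is valid only until the next call.
        self._start_fill(1 - self._front)
        return batch

    def close(self) -> None:
        # Wait for any in-flight fill so no thread outlives the sampler.
        if self._worker is not None:
            self._worker.join()
            self._worker = None


sampler = DoubleBufferSampler(storage=[1.0, 2.0, 3.0], batch_size=4)
first = sampler.next_batch()
second = sampler.next_batch()
third = sampler.next_batch()
sampler.close()
```

Because the slots are reused, a caller that needs a batch beyond the next call must copy it (for example with list(batch)). Staging exactly one batch ahead, as here, is one plausible reading of replay_prefetch_mode='one_tick', but this page does not define that mode.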

Parameters:

replay_prefetch_mode (str)
verbose_metrics (bool)

Methods

`__init__`(*[, replay_prefetch_mode, ...])
`close`()
`learn`([max_iterations, save_interval, ...])	Unified training loop for off-policy algorithms.

Attributes

LEARNER_LOG_INTERVAL

LEARNER_LOG_INTERVAL = 10

__init__(*, replay_prefetch_mode='one_tick', verbose_metrics=False, **kwargs)

Parameters:

replay_prefetch_mode (str)
verbose_metrics (bool)

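The bare * in this signature makes both options keyword-only, and the remaining **kwargs are presumably passed through to OffPolicyRunner. A minimal sketch of that pattern follows; OffPolicyRunnerStub, DoubleBufferRunnerSketch and the base_option keyword are hypothetical names invented for illustration, since the parent's real arguments are not documented on this page.

```python
from typing import Any, Dict


class OffPolicyRunnerStub:
    """Hypothetical stand-in for OffPolicyRunner."""

    def __init__(self, **kwargs: Any) -> None:
        # Record whatever the subclass forwarded so it can be inspected.
        self.base_kwargs: Dict[str, Any] = kwargs


class DoubleBufferRunnerSketch(OffPolicyRunnerStub):
    """Mirrors the documented signature; forwarding **kwargs is an assumption."""

    def __init__(self, *, replay_prefetch_mode: str = "one_tick",
                 verbose_metrics: bool = False, **kwargs: Any) -> None:
        # Consume the two subclass-specific options and pass the rest upward.
        super().__init__(**kwargs)
        self.replay_prefetch_mode = replay_prefetch_mode
        self.verbose_metrics = verbose_metrics


runner = DoubleBufferRunnerSketch(verbose_metrics=True, base_option=1)

try:
    DoubleBufferRunnerSketch("one_tick")  # positional arguments are rejected
    positional_ok = True
except TypeError:
    positional_ok = False
```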
replay_transfer_backend: dict[str, object]

learn(max_iterations=1500, save_interval=50, log_dir='logs', logger_type='tensorboard')

Unified training loop for off-policy algorithms.

Parameters:

max_iterations (int)
save_interval (int)
log_dir (str)
logger_type (str)

Return type:

None
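The page gives only the signature of learn(). As a rough illustration of how max_iterations, save_interval and the class-level LEARNER_LOG_INTERVAL typically gate work in such a loop, here is a hypothetical skeleton. learn_sketch and its collect, update, save and log callables are placeholders, not unilab functions; log_dir and logger_type are omitted because they presumably only select where and how metrics are written.

```python
from typing import Callable, List

LEARNER_LOG_INTERVAL = 10  # same value as the documented class attribute


def learn_sketch(
    collect: Callable[[int], None],
    update: Callable[[int], float],
    save: Callable[[int], None],
    log: Callable[[int, float], None],
    max_iterations: int = 1500,
    save_interval: int = 50,
    log_interval: int = LEARNER_LOG_INTERVAL,
) -> None:
    """Hypothetical off-policy loop skeleton, not the library's implementation."""
    for it in range(max_iterations):
        collect(it)        # step the environment(s) and push transitions to the buffer
        loss = update(it)  # sample a batch (e.g. from the double buffer) and optimize
        if (it + 1) % log_interval == 0:
            log(it, loss)
        if (it + 1) % save_interval == 0:
            save(it)


logged: List[int] = []
saved: List[int] = []
learn_sketch(
    collect=lambda it: None,
    update=lambda it: 1.0 / (it + 1),
    save=saved.append,
    log=lambda it, loss: logged.append(it),
    max_iterations=100,
)
```

With max_iterations=100, this skeleton logs every 10 iterations and saves at iterations 49 and 99; whether the real runner also saves at iteration 0 or after the final step is not stated on this page.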

close()

Return type:

None
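The page does not say what close() releases. Assuming it tears down resources held by the replay pipeline (for example background workers or shared storage), contextlib.closing from the Python standard library guarantees the call even when learn() raises. RunnerStub below is a hypothetical stand-in with the same close() -> None contract.

```python
import contextlib


class RunnerStub:
    """Hypothetical stand-in for a runner exposing learn() and close()."""

    def __init__(self) -> None:
        self.closed = False

    def learn(self, max_iterations: int = 1500) -> None:
        # Simulate a failure partway through training.
        raise RuntimeError("simulated failure")

    def close(self) -> None:
        self.closed = True


runner = RunnerStub()
try:
    # closing() calls runner.close() on exit, whether or not learn() raised.
    with contextlib.closing(runner):
        runner.learn(max_iterations=10)
except RuntimeError:
    pass
```

The same wrapper should apply unchanged to a real DoubleBufferOffPolicyRunner instance, since closing() only requires the object to have a close() method.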