unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner
- class unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner
Bases: OffPolicyRunner

OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.

The only behavioural difference from the parent class is in learn():
- ReplayBuffer is created as packed CPU shared storage.
- Sampling goes through CPUPinnedDoubleBufferReplayPipeline instead of ReplayBuffer.sample().
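This page does not describe how CPUPinnedDoubleBufferReplayPipeline works internally. As a rough illustration of the general double-buffering idea its name suggests, the sketch below uses only the Python standard library: a background thread fills one staging slot while the caller reads the other. The class DoubleBufferSampler and its methods are hypothetical and are not part of unilab; pinned memory and the host-to-device copy are deliberately left out.

```python
import random
import threading
from queue import Queue


class DoubleBufferSampler:
    """Hypothetical stand-in for a double-buffered sampling pipeline.

    Not the unilab implementation: plain Python lists replace pinned
    tensors, and no device transfer takes place.
    """

    def __init__(self, storage, batch_size, seed=0):
        self._storage = storage
        self._batch_size = batch_size
        self._rng = random.Random(seed)
        # Two preallocated staging slots: one is filled while the other is read.
        self._slots = [[None] * batch_size, [None] * batch_size]
        self._free = Queue()   # slot indices the producer may overwrite
        self._ready = Queue()  # slot indices holding a complete batch
        self._free.put(0)
        self._free.put(1)
        self._held = None      # slot currently lent to the consumer
        self._thread = threading.Thread(target=self._produce, daemon=True)
        self._thread.start()

    def _produce(self):
        while True:
            idx = self._free.get()
            if idx is None:  # sentinel pushed by close()
                return
            slot = self._slots[idx]
            for i in range(self._batch_size):
                slot[i] = self._rng.choice(self._storage)
            self._ready.put(idx)

    def next_batch(self):
        # Return the previously lent slot to the producer, then wait for
        # the next filled one. The returned list stays valid only until
        # the following call to next_batch().
        if self._held is not None:
            self._free.put(self._held)
        self._held = self._ready.get()
        return self._slots[self._held]

    def close(self):
        # Wake the producer with a sentinel and wait for it to exit.
        self._free.put(None)
        self._thread.join()


sampler = DoubleBufferSampler(list(range(100)), batch_size=4)
first = list(sampler.next_batch())   # copy, because the slot is reused later
second = list(sampler.next_batch())
sampler.close()
```

In this sketch the consumer never waits for a batch to be assembled unless it outpaces the producer, which is the usual motivation for keeping two buffers in flight.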
Methods
- __init__(*[, replay_prefetch_mode, ...])
- close()
- learn([max_iterations, save_interval, ...]): Unified training loop for off-policy algorithms.
Attributes
- LEARNER_LOG_INTERVAL = 10
- replay_transfer_backend: dict[str, object]