unilab.algos.torch.offpolicy.double_buffer_runner
Off-policy runner using a CPU-pinned double-buffer replay pipeline (B path).
Classes
DoubleBufferOffPolicyRunner | OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.
- class unilab.algos.torch.offpolicy.double_buffer_runner.DoubleBufferOffPolicyRunner
Bases: OffPolicyRunner
OffPolicyRunner variant that uses CPUPinnedDoubleBufferReplayPipeline.
The only behavioural difference from the parent class is in learn():
- ReplayBuffer is created as packed CPU shared storage.
- Sampling goes through CPUPinnedDoubleBufferReplayPipeline instead of ReplayBuffer.sample().
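The double-buffer idea behind this sampling path can be illustrated with a minimal, stdlib-only sketch. This is not the unilab implementation: DoubleBufferSampler, next_batch, and close are hypothetical names, plain Python lists stand in for pinned CPU tensors, and the list copy in next_batch stands in for a host-to-device transfer. The sketch only shows how a background producer can fill one slot while the learner consumes the other.

```python
import queue
import random
import threading


class DoubleBufferSampler:
    """Hypothetical stand-in for a double-buffer replay pipeline (stdlib only)."""

    def __init__(self, storage, batch_size, seed=0):
        self.storage = storage          # replay data kept on the CPU side
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # Two preallocated slots: one is filled while the other is consumed.
        self.slots = [[None] * batch_size for _ in range(2)]
        self.free = queue.Queue()       # slot indices the producer may overwrite
        self.ready = queue.Queue()      # slot indices holding a complete batch
        self.free.put(0)
        self.free.put(1)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._produce, daemon=True)
        self._thread.start()

    def _produce(self):
        # Background producer: sample into whichever slot is currently free.
        while not self._stop.is_set():
            try:
                idx = self.free.get(timeout=0.1)
            except queue.Empty:
                continue
            slot = self.slots[idx]
            for i in range(self.batch_size):
                slot[i] = self.rng.choice(self.storage)
            self.ready.put(idx)

    def next_batch(self):
        # Consumer side: wait for a filled slot, copy it out, then release it.
        idx = self.ready.get()
        batch = list(self.slots[idx])   # stands in for the host-to-device copy
        self.free.put(idx)              # hand the slot back to the producer
        return batch

    def close(self):
        self._stop.set()
        self._thread.join()


# Usage: draw a few batches from a toy storage of 100 integer "transitions".
sampler = DoubleBufferSampler(storage=list(range(100)), batch_size=8)
batches = [sampler.next_batch() for _ in range(5)]
sampler.close()
```

Because the producer refills a slot as soon as it is released, the next batch is usually ready by the time the consumer asks for it, which is the latency-hiding effect a pinned double-buffer pipeline aims for.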
- LEARNER_LOG_INTERVAL = 10