unilab.ipc.replay_buffer.ReplayBuffer

class unilab.ipc.replay_buffer.ReplayBuffer

Bases: SharedBufferBase

Shared replay buffer backed by authoritative packed CPU storage.

Device transfer is handled by the replay pipeline's transfer backends. The fallback sample() path copies each sampled packed batch to self.device and does not keep a per-device replay cache.
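To illustrate what "packed CPU storage" can mean, here is a minimal sketch of a ring buffer that keeps every transition as one row of a single CPU array and slices a sampled batch back into named fields. It is a toy under stated assumptions, not the UniLab implementation: the class name ToyPackedBuffer, the row layout, and the returned dictionary keys are all hypothetical, and NumPy arrays stand in for tensors.

```python
import numpy as np


class ToyPackedBuffer:
    """Illustrative packed ring buffer; NOT the UniLab ReplayBuffer."""

    def __init__(self, capacity: int, obs_dim: int, action_dim: int):
        self.capacity = capacity
        self.obs_dim = obs_dim
        self.action_dim = action_dim
        # Assumed row layout (hypothetical):
        # [obs | action | reward | next_obs | done | truncated]
        self.row_dim = obs_dim + action_dim + 1 + obs_dim + 2
        self.storage = np.zeros((capacity, self.row_dim), dtype=np.float32)
        self.write_idx = 0  # next row to overwrite
        self.size = 0       # number of valid rows

    def add(self, obs, actions, rewards, next_obs, dones, truncated):
        # Pack all fields of the batch into contiguous rows.
        # This sketch assumes the batch is no larger than the capacity.
        rows = np.concatenate(
            [obs, actions, rewards[:, None], next_obs,
             dones[:, None], truncated[:, None]],
            axis=1,
        ).astype(np.float32)
        n = rows.shape[0]
        # Wrap around so the oldest rows are overwritten first.
        idx = (self.write_idx + np.arange(n)) % self.capacity
        self.storage[idx] = rows
        self.write_idx = int((self.write_idx + n) % self.capacity)
        self.size = min(self.size + n, self.capacity)

    def sample(self, batch_size: int, rng: np.random.Generator):
        # Uniform sampling with replacement over the valid rows.
        idx = rng.integers(0, self.size, size=batch_size)
        packed = self.storage[idx]  # fancy indexing returns a copy
        # A real pipeline would move `packed` to the target device here,
        # in one transfer, before slicing it into fields.
        o, a = self.obs_dim, self.action_dim
        return {
            "obs": packed[:, :o],
            "actions": packed[:, o:o + a],
            "rewards": packed[:, o + a],
            "next_obs": packed[:, o + a + 1:o + a + 1 + o],
            "dones": packed[:, -2],
            "truncated": packed[:, -1],
        }
```

Keeping one packed row per transition means a sampled batch is a single gather from one array, so only one host-to-device copy per batch would be needed and no second copy of the replay data has to live on the device.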

Parameters:
  • capacity (int)

  • obs_dim (int)

  • action_dim (int)

  • device (str)

  • defer_gpu (bool)

  • critic_dim (int)

  • packed_cpu_storage (bool)

Methods

__init__(capacity, obs_dim, action_dim, device)

add(obs, actions, rewards, next_obs, dones, ...)

Add a batch of transitions (called by the collector).

sample(batch_size)

Sample a batch of transitions (called by the learner).

__init__(capacity, obs_dim, action_dim, device, defer_gpu=False, critic_dim=0, packed_cpu_storage=False)
Parameters:
  • capacity (int)

  • obs_dim (int)

  • action_dim (int)

  • device (str)

  • defer_gpu (bool)

  • critic_dim (int)

  • packed_cpu_storage (bool)

add(obs, actions, rewards, next_obs, dones, truncated, terminal_mask=None, terminal_next_obs=None, critic=None, next_critic=None, terminal_next_critic=None)

Add a batch of transitions (called by the collector).

dones follows the UniLab env lifecycle contract: done = terminated | truncated. Learners must pair it with truncated when computing bootstrap masks.
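As a minimal sketch of how a learner might combine dones and truncated, the helpers below derive a bootstrap mask and a one-step TD target. The function names bootstrap_mask and td_target are hypothetical and not part of UniLab. The sketch assumes that a step flagged as truncated should still bootstrap, so terminated is recovered as dones & ~truncated; which next-state value to use on truncated steps is outside its scope.

```python
import numpy as np


def bootstrap_mask(dones, truncated):
    """Return 1.0 where the next-state value should be bootstrapped, else 0.0.

    Assumption: done = terminated | truncated, and truncated steps still
    bootstrap, so only true terminations zero out the next-state value.
    """
    dones = np.asarray(dones, dtype=bool)
    truncated = np.asarray(truncated, dtype=bool)
    terminated = dones & ~truncated
    return (~terminated).astype(np.float32)


def td_target(rewards, next_values, dones, truncated, gamma=0.99):
    """One-step TD target: r + gamma * mask * V(s')."""
    mask = bootstrap_mask(dones, truncated)
    return np.asarray(rewards) + gamma * mask * np.asarray(next_values)
```

Using dones alone as the mask would treat time-limit truncations as true terminations and cut off the value estimate at those steps, which is why the two flags are paired here.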

sample(batch_size)

Sample a batch of transitions (called by the learner).

Parameters:
  • batch_size (int)

Return type:

Dict[str, Tensor]