unilab.algos.torch.offpolicy.multi_gpu_runner
Multi-GPU off-policy runner using NCCL all-reduce for FastSAC.
Architecture:
- Main process
  - creates ReplayBuffer (host-only), WeightSync, and queues
  - spawns the Collector subprocess (CPU, env simulation)
  - spawns N Learner workers via mp.spawn (one per GPU)
- Learner rank i: samples packed CPU replay rows to its rank device, then communicates via NCCL all_reduce
- Collector: talks only to rank 0 via collection_ready_queue / trainer_done_queue
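The all_reduce step is what keeps the learners in agreement: each rank computes gradients on its own mini-batch, and the averaged result is applied everywhere. The sketch below illustrates that arithmetic in plain Python, with no GPUs or NCCL involved. It assumes equal-size mini-batches per rank and a toy one-parameter model; `mse_grad` and `all_reduce_mean` are hypothetical helpers written for this illustration, not part of unilab or PyTorch.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a toy regression problem


def mse_grad(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values: Sequence[float]) -> float:
    """Stand-in for all_reduce(SUM) followed by division by the world size.

    Assumption: this mimics gradient averaging in a single process; the real
    runner performs the reduction across GPUs with NCCL.
    """
    return sum(values) / len(values)


w = 0.5
data = [(float(i), 3.0 * i + 1.0) for i in range(8)]

# One equal-size mini-batch per simulated learner rank.
num_workers = 4
shard = len(data) // num_workers
shards = [data[r * shard:(r + 1) * shard] for r in range(num_workers)]

# Each rank computes a local gradient, then all ranks average them.
local_grads = [mse_grad(w, s) for s in shards]
averaged = all_reduce_mean(local_grads)

# Reference: one gradient over the combined batch.
full_batch = mse_grad(w, data)
```

With equal-size mini-batches, `averaged` matches `full_batch` up to floating-point error, which is the sense in which averaging across ranks acts like a single step on a larger batch.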
Classes
MultiGPUOffPolicyRunner | Multi-GPU off-policy runner.
- class unilab.algos.torch.offpolicy.multi_gpu_runner.MultiGPUOffPolicyRunner
Bases: OffPolicyRunner
Multi-GPU off-policy runner.
Keeps a single Collector on CPU and spawns num_gpus Learner workers via torch.multiprocessing.spawn. Each worker processes independent mini-batches sampled from the same shared ReplayBuffer, and gradients are averaged with NCCL all_reduce. This is equivalent to processing a num_gpus× larger effective batch per wall-clock second.
Falls back transparently to single-GPU when num_gpus <= 1.
- Parameters: