unilab.algos.torch.offpolicy.multi_gpu_runner.MultiGPUOffPolicyRunner
class unilab.algos.torch.offpolicy.multi_gpu_runner.MultiGPUOffPolicyRunner
Bases: OffPolicyRunner

Multi-GPU off-policy runner.
Keeps a single Collector on CPU and spawns num_gpus Learner workers via torch.multiprocessing.spawn. Each worker processes independent mini-batches from the same shared ReplayBuffer; gradients are averaged across workers with NCCL all_reduce, which is equivalent to training on an effective batch num_gpus× larger in the same wall-clock time.

Falls back transparently to single-GPU training when num_gpus <= 1.

Parameters:
Methods

__init__(learner, env_name, algo_type, ...)
close()
learn([max_iterations, save_interval, ...])
    Unified training loop for off-policy algorithms.
validate_capabilities(*, algo_type, ...)
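
The effective-batch claim in the class description can be checked with a small, dependency-free sketch. It is not the runner's implementation: the workers are simulated in one process, `all_reduce_mean` is a hypothetical stand-in for an NCCL all_reduce sum followed by division by the world size, and the toy model, data, and `num_gpus = 4` are assumptions chosen for illustration. It shows that averaging gradients from equally sized mini-batches gives the same gradient as one pass over the combined batch.

```python
import math


def mse_grad(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w over one mini-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values):
    """Hypothetical stand-in for all_reduce(SUM) divided by the world size."""
    return sum(values) / len(values)


num_gpus = 4  # assumed worker count, for illustration only
w = 0.5       # current value of the single model parameter
data = [(float(i), 3.0 * i + 1.0) for i in range(8)]

# Each simulated worker receives its own equally sized mini-batch.
shard = len(data) // num_gpus
mini_batches = [data[r * shard:(r + 1) * shard] for r in range(num_gpus)]

# Every worker computes a local gradient; the averaged result is what
# each worker would apply after the all_reduce step.
local_grads = [mse_grad(w, mb) for mb in mini_batches]
averaged = all_reduce_mean(local_grads)

# Reference: one gradient over the num_gpus-times-larger combined batch.
full_batch = mse_grad(w, data)

print(averaged, full_batch)
```

The equality relies on every worker using the same mini-batch size; with unequal sizes, a plain mean of local gradients would weight samples unevenly.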