unilab.algos.torch.offpolicy.multi_gpu_runner.MultiGPUOffPolicyRunner

class unilab.algos.torch.offpolicy.multi_gpu_runner.MultiGPUOffPolicyRunner

Bases: OffPolicyRunner

Multi-GPU off-policy runner.

Keeps a single Collector on CPU and spawns num_gpus Learner workers via torch.multiprocessing.spawn. Each worker processes independent mini-batches drawn from the same shared ReplayBuffer, and the workers' gradients are averaged with an NCCL all_reduce. Each update is therefore equivalent to one computed on an effective batch num_gpus times larger, in the same wall-clock time.

Falls back transparently to single-GPU when num_gpus <= 1.
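The batch-size equivalence described above can be checked with a small pure-Python sketch. It uses no GPUs and no torch; the 1-D linear model, the data, and the helper names (grad_mse, averaged_worker_grad) are illustrative assumptions, not part of unilab. It mimics an all_reduce sum followed by division by the number of workers, which may differ from how the runner performs the averaging internally.

```python
# Illustration only: averaging per-worker gradients matches one gradient
# computed over the combined batch (when all mini-batches have equal size).

def grad_mse(w, batch):
    """Gradient w.r.t. w of the mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def averaged_worker_grad(w, worker_batches):
    """Mimic all_reduce(sum) followed by division by the number of workers."""
    grads = [grad_mse(w, batch) for batch in worker_batches]
    return sum(grads) / len(grads)


w = 0.5
worker_batches = [
    [(1.0, 2.0), (2.0, 4.0)],  # mini-batch sampled by worker 0
    [(3.0, 6.0), (4.0, 8.0)],  # mini-batch sampled by worker 1
]
combined = [sample for batch in worker_batches for sample in batch]

avg = averaged_worker_grad(w, worker_batches)  # what each worker applies
full = grad_mse(w, combined)                   # single large-batch gradient

print(avg, full)
```

The two values agree because every mini-batch has the same number of samples; with unequal mini-batch sizes a plain average of per-worker gradients would weight samples unevenly.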

Parameters:

Methods

  • __init__(learner, env_name, algo_type, ...)

  • close()

  • learn([max_iterations, save_interval, ...]): Unified training loop for off-policy algorithms.

  • validate_capabilities(*, algo_type, ...)

static validate_capabilities(*, algo_type, learner_kwargs, num_gpus)
Parameters:
Return type:

None

__init__(learner, env_name, algo_type, learner_kwargs, num_gpus=1, **kwargs)
Parameters:

learn(max_iterations=1500, save_interval=50, log_dir='logs', logger_type='tensorboard')

Unified training loop for off-policy algorithms.

Parameters:
  • max_iterations (int)

  • save_interval (int)

  • log_dir (str)

  • logger_type (str)

Return type:

None

close()
Return type:

None
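One way the documented methods might be combined is sketched below. It relies only on the signatures listed on this page. The helper name run_training, the order of the calls (validate first, then construct), and every argument value in the commented usage are assumptions made for illustration; the runner may already call validate_capabilities itself during construction.

```python
def run_training(runner_cls, learner, env_name, algo_type, learner_kwargs, num_gpus=1):
    """Hypothetical helper: validate, construct, train, and always clean up."""
    # Keyword-only static check; assumed here to raise on unsupported setups.
    runner_cls.validate_capabilities(
        algo_type=algo_type,
        learner_kwargs=learner_kwargs,
        num_gpus=num_gpus,
    )
    runner = runner_cls(learner, env_name, algo_type, learner_kwargs, num_gpus=num_gpus)
    try:
        # The values below are the documented defaults, spelled out for clarity.
        runner.learn(
            max_iterations=1500,
            save_interval=50,
            log_dir="logs",
            logger_type="tensorboard",
        )
    finally:
        # close() runs even if training raises.
        runner.close()
    return runner


# Usage (not executed here; all argument values are placeholders):
# from unilab.algos.torch.offpolicy.multi_gpu_runner import MultiGPUOffPolicyRunner
# run_training(MultiGPUOffPolicyRunner, learner=my_learner, env_name="<env>",
#              algo_type="<algo>", learner_kwargs={}, num_gpus=2)
```

Passing the runner class as an argument keeps the sketch importable without unilab installed, and the try/finally ensures close() is called on any exit path.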