unilab.algos.torch.appo.runner

APPO Runner — Asynchronous PPO with native multiprocessing.

Pipeline:
  1. Collector subprocess publishes rollout payloads to a RolloutRingBuffer

  2. Learner reads rollouts from the buffer and computes V-trace-corrected updates

  3. Updated weights are synced back to the collector via SharedWeightSync
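Step 2 relies on V-trace, the off-policy correction from IMPALA. It is needed because the collector may be running slightly stale weights. The sketch below computes V-trace value targets and policy-gradient advantages for one trajectory in pure Python. It assumes per-step log importance ratios log π(a|x) − log μ(a|x) are already available, where π is the learner policy and μ is the collector's behaviour policy. The function name, signature, and default clipping thresholds are hypothetical, not this module's API.

```python
import math
from typing import List, Optional, Sequence, Tuple


def vtrace_targets(
    rewards: Sequence[float],
    values: Sequence[float],
    bootstrap_value: float,
    log_rhos: Sequence[float],
    dones: Optional[Sequence[bool]] = None,
    gamma: float = 0.99,
    rho_bar: float = 1.0,
    c_bar: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (vs, pg_advantages) for a length-T trajectory.

    log_rhos[t] = log pi(a_t|x_t) - log mu(a_t|x_t); dones[t] means the episode
    ended after step t, so nothing is bootstrapped across that boundary.
    """
    T = len(rewards)
    dones = dones if dones is not None else [False] * T
    rhos = [math.exp(lr) for lr in log_rhos]
    clipped_rhos = [min(rho_bar, r) for r in rhos]  # truncated IS weights rho_t
    cs = [min(c_bar, r) for r in rhos]              # trace-cutting coefficients c_t
    discounts = [0.0 if d else gamma for d in dones]
    next_values = list(values[1:]) + [bootstrap_value]

    # Backward recursion: vs_t - V(x_t) = delta_t + gamma_t * c_t * (vs_{t+1} - V(x_{t+1}))
    vs_minus_v = [0.0] * T
    acc = 0.0
    for t in reversed(range(T)):
        delta = clipped_rhos[t] * (rewards[t] + discounts[t] * next_values[t] - values[t])
        acc = delta + discounts[t] * cs[t] * acc
        vs_minus_v[t] = acc
    vs = [v + d for v, d in zip(values, vs_minus_v)]

    # Policy-gradient advantage bootstraps from the V-trace target of the next state.
    next_vs = vs[1:] + [bootstrap_value]
    pg_adv = [
        clipped_rhos[t] * (rewards[t] + discounts[t] * next_vs[t] - values[t])
        for t in range(T)
    ]
    return vs, pg_adv
```

When the rollout is exactly on-policy (all log ratios zero) and rho_bar = c_bar = 1, the targets reduce to ordinary n-step discounted returns. Clipping keeps the correction bounded when the collector's weights lag far behind the learner's.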

Classes

APPORunner

APPO async runner using shared memory.
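The runner moves data through shared memory, but the interface of SharedWeightSync is not documented on this page. The sketch below illustrates the idea with a made-up class, HypotheticalWeightSync. It is a versioned float32 weight vector in multiprocessing.shared_memory that the learner publishes to and a collector polls. The memory layout (an int64 version header followed by flat float32 weights) and the lock-based consistency are assumptions, not the library's design.

```python
import multiprocessing as mp
import struct
from multiprocessing import shared_memory
from typing import List, Optional, Sequence, Tuple


class HypotheticalWeightSync:
    """Sketch only: versioned float32 weights in shared memory (not SharedWeightSync)."""

    _VERSION = struct.Struct("q")  # int64 version counter stored at offset 0

    def __init__(self, num_params: int, name: Optional[str] = None, lock=None):
        self._fmt = f"{num_params}f"
        create = name is None  # creator allocates; other processes attach by name
        size = self._VERSION.size + struct.calcsize(self._fmt)
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)
        # The same lock object must be handed to the other process (e.g. via mp.Process args).
        self.lock = lock if lock is not None else mp.Lock()
        if create:
            self._VERSION.pack_into(self.shm.buf, 0, 0)

    def publish(self, weights: Sequence[float]) -> int:
        """Learner side: overwrite the weights, then bump the version."""
        with self.lock:
            struct.pack_into(self._fmt, self.shm.buf, self._VERSION.size, *weights)
            version = self._VERSION.unpack_from(self.shm.buf, 0)[0] + 1
            self._VERSION.pack_into(self.shm.buf, 0, version)
        return version

    def pull_if_newer(self, last_version: int) -> Tuple[int, Optional[List[float]]]:
        """Collector side: copy the weights only if a newer version was published."""
        with self.lock:
            version = self._VERSION.unpack_from(self.shm.buf, 0)[0]
            if version <= last_version:
                return last_version, None
            weights = list(struct.unpack_from(self._fmt, self.shm.buf, self._VERSION.size))
        return version, weights

    def close(self, unlink: bool = False) -> None:
        self.shm.close()
        if unlink:  # only the creating process should unlink
            self.shm.unlink()
```

A collector process would attach with HypotheticalWeightSync(num_params, name=sync.shm.name, lock=sync.lock) and call pull_if_newer once per rollout. Polling a version counter keeps the collector from copying unchanged weights.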

class unilab.algos.torch.appo.runner.APPORunner

Bases: AsyncRunner


__init__(env_name, env_cfg_overrides, rl_cfg, device=None, collector_device=None, sim_backend='mujoco', num_envs=1024, steps_per_env=24, num_workers=1, replay_queue_size=3, seed=None, resume_path=None)
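This page does not document how replay_queue_size relates to RolloutRingBuffer or what happens when the buffer is full. A plausible reading is that replay_queue_size bounds how many rollouts can wait between collector and learner. The single-process sketch below shows a fixed-capacity ring buffer under that assumption, with a drop-oldest overflow policy so the learner always sees the freshest data. Blocking the collector is the other common choice. HypotheticalRolloutRing is a made-up name, not the library class.

```python
from typing import Any, List, Optional


class HypotheticalRolloutRing:
    """Fixed-capacity FIFO of rollout payloads that evicts the oldest when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._slots: List[Any] = [None] * capacity
        self._writes = 0  # total payloads ever published
        self._reads = 0   # total payloads consumed or evicted
        self.dropped = 0  # stale payloads evicted before the learner read them

    def __len__(self) -> int:
        return self._writes - self._reads

    def publish(self, payload: Any) -> None:
        """Collector side: write into the next slot, evicting the oldest if full."""
        if len(self) == self.capacity:
            self._reads += 1  # the oldest slot is the one about to be overwritten
            self.dropped += 1
        self._slots[self._writes % self.capacity] = payload
        self._writes += 1

    def consume(self) -> Optional[Any]:
        """Learner side: return the oldest unread payload, or None if empty."""
        if len(self) == 0:
            return None
        idx = self._reads % self.capacity
        payload = self._slots[idx]
        self._slots[idx] = None  # drop the reference so the payload can be freed
        self._reads += 1
        return payload
```

With capacity 3, publishing five rollouts before the learner reads any evicts the first two, and the learner then receives rollouts 2, 3 and 4 in order.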
learn(max_iterations=1500, save_interval=50, log_dir='logs', logger_type='tensorboard')
Parameters:
  • max_iterations (int)

  • save_interval (int)

  • log_dir (str)

  • logger_type (str)

Return type:

None
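Combining the defaults of __init__ and learn gives a rough training budget. The helper below is hypothetical and rests on two assumptions this page does not confirm. First, each learner iteration consumes one rollout of num_envs * steps_per_env transitions. Second, a checkpoint is written every save_interval iterations.

```python
def training_budget(
    num_envs: int = 1024,
    steps_per_env: int = 24,
    max_iterations: int = 1500,
    save_interval: int = 50,
) -> dict:
    """Back-of-envelope sample and checkpoint counts (hypothetical helper)."""
    # Assumption: one rollout of num_envs * steps_per_env transitions per iteration.
    transitions_per_iteration = num_envs * steps_per_env
    total_transitions = transitions_per_iteration * max_iterations
    # Assumption: a checkpoint every save_interval iterations.
    num_checkpoints = max_iterations // save_interval
    return {
        "transitions_per_iteration": transitions_per_iteration,
        "total_transitions": total_transitions,
        "num_checkpoints": num_checkpoints,
    }


print(training_budget())
```

Under these assumptions, the defaults give 24,576 transitions per iteration, about 36.9 million transitions over 1500 iterations, and 30 checkpoints.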