unilab.algos.torch.hora.appo_learner

HORA-owned APPO learner with grouped actor and privileged observations.

Classes

HoraAPPOLearner

APPO learner variant for HORA grouped observations.

class unilab.algos.torch.hora.appo_learner.HoraAPPOLearner

Bases: APPOLearner

APPO learner variant for HORA grouped observations.

Parameters:
  • actor (MLPModel)
  • critic (MLPModel)
  • num_learning_epochs (int)
  • num_mini_batches (int)
  • clip_param (float)
  • gamma (float)
  • lam (float)
  • value_loss_coef (float)
  • entropy_coef (float)
  • learning_rate (float)
  • max_grad_norm (float)
  • use_clipped_value_loss (bool)
  • schedule (str)
  • desired_kl (float)
  • adaptive_kl_factor (float)
  • adaptive_lr_factor (float)
  • device (str)
  • optimizer (str)
  • tau (float)
  • target_update_freq (int)
  • vtrace_clip_rho (float)
  • vtrace_clip_c (float)
  • enable_compile (bool)
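
The parameter list above gives only names and types, so a small checker can catch mistyped keyword arguments before the learner is built. The sketch below is not part of unilab: HORA_APPO_PARAM_TYPES and check_learner_kwargs are hypothetical names, and the table simply mirrors the documented scalar parameters (actor and critic are left out because the MLPModel constructor is not documented here).

```python
# Hypothetical helper, not part of unilab: mirrors the documented
# scalar parameters of HoraAPPOLearner (actor and critic are omitted).
HORA_APPO_PARAM_TYPES = {
    "num_learning_epochs": int,
    "num_mini_batches": int,
    "clip_param": float,
    "gamma": float,
    "lam": float,
    "value_loss_coef": float,
    "entropy_coef": float,
    "learning_rate": float,
    "max_grad_norm": float,
    "use_clipped_value_loss": bool,
    "schedule": str,
    "desired_kl": float,
    "adaptive_kl_factor": float,
    "adaptive_lr_factor": float,
    "device": str,
    "optimizer": str,
    "tau": float,
    "target_update_freq": int,
    "vtrace_clip_rho": float,
    "vtrace_clip_c": float,
    "enable_compile": bool,
}


def check_learner_kwargs(kwargs):
    """Return a list of problems found in a dict of keyword arguments."""
    problems = []
    for name, value in kwargs.items():
        expected = HORA_APPO_PARAM_TYPES.get(name)
        if expected is None:
            problems.append(f"{name}: not a documented parameter")
            continue
        if expected is float:
            # Accept ints where a float is documented, but never bools.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif expected is int:
            # bool is a subclass of int in Python, so exclude it explicitly.
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

An empty list means every supplied key is documented and has a compatible type. The checker says nothing about which string values schedule, device, or optimizer accept, since this page does not list them.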

process_batch(batch_dict)

Compute V-trace targets for grouped HORA rollouts.
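
For orientation, the standard V-trace recursion can be sketched in plain Python as below. This is a minimal illustration of the general technique for a single trajectory, not the body of process_batch: the function name, argument names, and list-based layout are assumptions, and the actual method presumably works on batched torch tensors taken from batch_dict, whose keys are not documented here. The clip_rho and clip_c arguments play the role that vtrace_clip_rho and vtrace_clip_c suggest.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_ratios, dones,
                   gamma=0.99, clip_rho=1.0, clip_c=1.0):
    """Illustrative V-trace targets for one trajectory of length T.

    rewards, values, log_ratios, dones: lists of length T.
    log_ratios[t] is log pi(a_t|x_t) - log mu(a_t|x_t), target vs behaviour policy.
    bootstrap_value is the value estimate for the state after the last step.
    """
    T = len(rewards)
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * T
    acc = 0.0  # carries v_{t+1} - V(x_{t+1}) backwards through time
    for t in reversed(range(T)):
        ratio = math.exp(log_ratios[t])
        rho = min(clip_rho, ratio)  # truncated weight on the TD error
        c = min(clip_c, ratio)      # truncated weight on the trace
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rho * (rewards[t] + gamma * not_done * next_values[t] - values[t])
        acc = delta + gamma * not_done * c * acc
        targets[t] = values[t] + acc
    return targets
```

When the behaviour and target policies coincide (all log ratios are zero) and both clip values are at least 1, the targets reduce to ordinary bootstrapped discounted returns, which makes a convenient sanity check.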