unilab.algos.torch.hora
- class unilab.algos.torch.hora.HoraActorModel
Bases: Module
- Parameters:
- is_recurrent: bool = False
- __init__(obs, obs_groups, obs_set, output_dim, *, shared_model=None, hidden_dims=(512, 256, 128), activation='elu', obs_normalization=False, distribution_cfg=None, priv_info_dim=None, priv_info_embed_dim=8, priv_mlp_hidden_dims=(256, 128, 8), use_student_encoder=False, proprio_hist_len=30, proprio_frame_dim=None)
Initialize internal Module state, shared by both nn.Module and ScriptModule.
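The signature above pairs `priv_mlp_hidden_dims=(256, 128, 8)` with `priv_info_embed_dim=8`, so the last hidden width matches the embedding size. The sketch below only illustrates how such a tuple would translate into linear-layer shapes if the privileged-information encoder were a plain MLP; that layout is an assumption, not something this page confirms. `mlp_layer_shapes` is a hypothetical helper, and the input size `9` is a made-up value because `priv_info_dim` has no default.

```python
from typing import List, Sequence, Tuple


def mlp_layer_shapes(in_dim: int, hidden_dims: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (in_features, out_features) for each linear layer of a plain MLP.

    Hypothetical helper for illustration; not part of unilab.
    """
    dims = [in_dim, *hidden_dims]
    return list(zip(dims[:-1], dims[1:]))


# priv_info_dim=9 is an invented example value.
shapes = mlp_layer_shapes(9, (256, 128, 8))
print(shapes)

# Under the plain-MLP assumption, the final output width equals priv_info_embed_dim=8.
embed_dim = shapes[-1][1]
```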
- forward(obs, masks=None, hidden_state=None, stochastic_output=False)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
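The note above can be illustrated with a toy, pure-Python stand-in for the call dispatch it describes. `TinyModule` is a hypothetical class written for this sketch, not `torch.nn.Module`, and its hook signature is simplified. It shows only the general pattern: calling the instance runs `forward` and then the registered hooks, while calling `forward` directly skips the hooks.

```python
class TinyModule:
    """Toy stand-in for a module's call dispatch (not the real torch class)."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        # Hooks are called as hook(module, input, output) in this sketch.
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        out = self.forward(x)
        for hook in self._forward_hooks:
            result = hook(self, x, out)
            if result is not None:
                # A hook may replace the output by returning a value.
                out = result
        return out


calls = []
m = TinyModule()
m.register_forward_hook(lambda mod, inp, out: calls.append(out))

via_call = m(3)             # runs forward, then the hook
via_forward = m.forward(3)  # runs forward only; the hook is skipped
```

After both calls, `calls` holds a single entry, because only the instance call reached the hook.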
- property distribution: rsl_rl.modules.GaussianDistribution
- class unilab.algos.torch.hora.HoraCriticModel
Bases: Module
- Parameters:
- is_recurrent: bool = False
- __init__(obs, obs_groups, obs_set, output_dim, *, shared_model=None, hidden_dims=(512, 256, 128), activation='elu', obs_normalization=False, priv_info_dim=None, priv_info_embed_dim=8, priv_mlp_hidden_dims=(256, 128, 8))
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- forward(obs, masks=None, hidden_state=None, stochastic_output=False)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class unilab.algos.torch.hora.HoraDistillationTrainer
Bases: object
Stage-2 HORA latent distillation trainer.
- Parameters:
- class unilab.algos.torch.hora.HoraPPO
Bases: FinalObservationAwarePPO
PPO variant that constructs a shared HORA actor-critic backbone.
- Parameters:
  - actor (HoraActorModel)
  - critic (HoraCriticModel)
  - storage (RolloutStorage)
  - num_learning_epochs (int)
  - num_mini_batches (int)
  - clip_param (float)
  - gamma (float)
  - lam (float)
  - value_loss_coef (float)
  - entropy_coef (float)
  - learning_rate (float)
  - max_grad_norm (float)
  - optimizer (str)
  - use_clipped_value_loss (bool)
  - schedule (str)
  - desired_kl (float)
  - normalize_advantage_per_mini_batch (bool)
  - device (str)
  - enable_compile (bool)
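For orientation, the roles that `gamma`, `lam`, and `clip_param` usually play in PPO can be sketched in plain Python. This follows the textbook definitions of generalized advantage estimation and the clipped surrogate objective. It is not taken from the HoraPPO source, and whether HoraPPO computes them exactly this way is an assumption. The function names are hypothetical; the default values mirror the `__init__` signature of this class.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Textbook GAE over one trajectory, computed backwards in time."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping stops at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio: float, advantage: float, clip_param: float = 0.2) -> float:
    """Per-sample PPO objective: the pessimistic of the raw and clipped terms."""
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


adv = gae_advantages([1.0, 1.0], [0.0, 0.0], 0.0, [False, False], gamma=1.0, lam=1.0)
objective = clipped_surrogate(1.5, 2.0)
```

With `gamma` and `lam` both set to 1, the advantages reduce to the undiscounted reward-to-go minus the value estimate, which makes the example easy to check by hand.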
- __init__(actor, critic, storage, num_learning_epochs=5, num_mini_batches=4, clip_param=0.2, gamma=0.99, lam=0.95, value_loss_coef=1.0, entropy_coef=0.01, learning_rate=0.001, max_grad_norm=1.0, optimizer='adam', use_clipped_value_loss=True, schedule='adaptive', desired_kl=0.01, normalize_advantage_per_mini_batch=False, device='cpu', rnd_cfg=None, symmetry_cfg=None, multi_gpu_cfg=None, enable_compile=False)
- Parameters:
  - actor (HoraActorModel)
  - critic (HoraCriticModel)
  - storage (RolloutStorage)
  - num_learning_epochs (int)
  - num_mini_batches (int)
  - clip_param (float)
  - gamma (float)
  - lam (float)
  - value_loss_coef (float)
  - entropy_coef (float)
  - learning_rate (float)
  - max_grad_norm (float)
  - optimizer (str)
  - use_clipped_value_loss (bool)
  - schedule (str)
  - desired_kl (float)
  - normalize_advantage_per_mini_batch (bool)
  - device (str)
  - enable_compile (bool)
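The defaults `schedule='adaptive'` and `desired_kl=0.01` suggest a KL-driven learning-rate schedule. The sketch below shows one common form of such a rule: shrink the learning rate when the measured KL divergence overshoots the target, grow it when the KL falls well below. The thresholds, the factor of 1.5, and the bounds are assumptions chosen for illustration, and `adapt_learning_rate` is a hypothetical helper. The actual rule used by HoraPPO is not documented on this page.

```python
def adapt_learning_rate(
    lr: float,
    kl: float,
    desired_kl: float = 0.01,
    factor: float = 1.5,
    lr_min: float = 1e-5,
    lr_max: float = 1e-2,
) -> float:
    """Return an adjusted learning rate given the latest mean KL divergence."""
    if kl > 2.0 * desired_kl:
        # Policy moved too far in the last update: slow down.
        return max(lr_min, lr / factor)
    if 0.0 < kl < 0.5 * desired_kl:
        # Policy barely moved: speed up.
        return min(lr_max, lr * factor)
    return lr


lr_after_large_kl = adapt_learning_rate(0.001, kl=0.05)
lr_after_small_kl = adapt_learning_rate(0.001, kl=0.001)
lr_unchanged = adapt_learning_rate(0.001, kl=0.01)
```

Keeping the rule as a pure function of `(lr, kl)` makes it easy to test separately from the optimizer that would consume the result.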
Bases: Module
Shared-backbone HORA actor-critic with optional adaptation encoder.
- Parameters:
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- Parameters:
  - obs (TensorDict)
- Return type:
- Parameters:
  - obs (TensorDict)
  - prefer_student (bool)
- Return type:
Tensor-only HORA trunk path used by APPO compiled minibatch loss.
- Parameters:
  - obs (TensorDict)
  - prefer_student (bool)
- Return type:
- Parameters:
  - obs (TensorDict)
  - prefer_student (bool)
- Return type:
Modules
- HORA-owned APPO entry helpers.
- HORA-owned APPO learner with grouped actor and privileged observations.
- HORA-owned APPO runner.
- HORA-owned APPO rollout worker.
- HORA distillation config and teacher-owner resolution helpers.
- HORA-owned observation helpers for teacher-policy runtime code.
- HORA-owned RSL-RL wrapper helpers for teacher-policy runtime.
- Compatibility helpers for HORA's supported RSL-RL config schemas.
- Config-driven runtime selection helpers for HORA teacher-policy RL.
- HORA-owned SAC entry helpers.
- HORA-owned FastSAC learner.
- HORA SAC actor models.