unilab.algos.torch.hora.sac_learner.HoraSACLearner
- class unilab.algos.torch.hora.sac_learner.HoraSACLearner
Bases: FastSACLearner
FastSAC learner variant whose actor consumes HORA privileged info.
Methods
__init__(*, obs_dim, critic_obs_dim, ...[, ...])
Save all components.
load_state_dict(state_dict): Load all components.
Polyak-average update of the target Q-network.
update_actor(batch): One actor update step.
update_critic(batch): One critic update step.
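The Polyak-average update listed above can be illustrated with a minimal sketch. This is not the library's code: the function name `polyak_update`, the `tau` parameter, and the use of plain Python lists in place of network parameters are all assumptions made for illustration. The rule shown is the standard one, where each target value moves a fraction `tau` of the way toward the corresponding online value.

```python
def polyak_update(target_params, online_params, tau):
    """Return the soft-updated target values.

    Hypothetical helper, not part of unilab. Applies
    target <- (1 - tau) * target + tau * online to each element.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [
        (1.0 - tau) * t + tau * o
        for t, o in zip(target_params, online_params)
    ]


# Toy values standing in for target and online Q-network weights.
target = [0.0, 1.0]
online = [1.0, 3.0]
updated = polyak_update(target, online, tau=0.5)
print(updated)  # [0.5, 2.0]
```

With a small `tau` the target network changes slowly, which is the usual reason for averaging instead of copying the online weights outright.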
- __init__(*, obs_dim, critic_obs_dim, priv_info_dim, action_dim, device='cpu', actor_hidden_dim=512, priv_info_embed_dim=9, priv_mlp_hidden_dims=(256, 128, 9), log_std_max=0.0, log_std_min=-5.0, use_tanh=True, use_layer_norm=True, actor_lr=0.0003, weight_decay=0.001, use_symmetry=False, symmetry_augmentation=None, **kwargs)
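The leading `*` in the signature makes every constructor argument keyword-only. The sketch below shows what that means in practice using a hypothetical stand-in function, `hora_sac_learner_kwargs`, that copies the documented signature and defaults and simply returns the collected arguments. It does not import or call unilab, and the dimension values passed in are made up for illustration.

```python
def hora_sac_learner_kwargs(
    *,
    obs_dim,
    critic_obs_dim,
    priv_info_dim,
    action_dim,
    device="cpu",
    actor_hidden_dim=512,
    priv_info_embed_dim=9,
    priv_mlp_hidden_dims=(256, 128, 9),
    log_std_max=0.0,
    log_std_min=-5.0,
    use_tanh=True,
    use_layer_norm=True,
    actor_lr=0.0003,
    weight_decay=0.001,
    use_symmetry=False,
    symmetry_augmentation=None,
    **kwargs,
):
    """Hypothetical stand-in: gather the documented arguments into one dict."""
    config = dict(locals())       # all named parameters plus "kwargs"
    extra = config.pop("kwargs")  # flatten any extra keyword arguments
    config.update(extra)
    return config


# Only the four required arguments are given; the dimensions are illustrative.
config = hora_sac_learner_kwargs(
    obs_dim=96, critic_obs_dim=128, priv_info_dim=9, action_dim=16
)

# Positional use is rejected because of the leading `*`.
try:
    hora_sac_learner_kwargs(96, 128, 9, 16)
    positional_ok = True
except TypeError:
    positional_ok = False

# With the real class available, the call would presumably look like:
# learner = HoraSACLearner(**config)
```

In the documented defaults, the last entry of `priv_mlp_hidden_dims` equals `priv_info_embed_dim` (both 9). Whether the two must always match is not stated on this page.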
- update_actor(batch)
One actor update step.