unilab.algos.torch.fast_sac.learner.FastSACLearner

class unilab.algos.torch.fast_sac.learner.FastSACLearner[source]

Bases: object

FastSAC learner with holosoma-aligned hyperparameters.

Key hyperparameters (aligned with holosoma FastSACConfig):

- gamma=0.97, tau=0.125
- batch_size=8192, num_updates=8, policy_frequency=4
- alpha_init=0.001, target_entropy_ratio=0.0
- AdamW with betas=(0.9, 0.95), weight_decay=0.001
- Distributional critic (C51, num_atoms=101)
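To make the distributional critic settings concrete, the sketch below builds the fixed return support that a C51-style critic predicts over, using the constructor defaults num_atoms=101, v_min=-20.0 and v_max=20.0. The helper names make_support and expected_q are hypothetical and are not part of unilab; the code only illustrates the standard categorical construction, not the learner's actual implementation.

```python
from typing import List


def make_support(v_min: float = -20.0, v_max: float = 20.0,
                 num_atoms: int = 101) -> List[float]:
    """Evenly spaced return values ("atoms") spanning [v_min, v_max]."""
    delta = (v_max - v_min) / (num_atoms - 1)  # spacing between neighboring atoms
    return [v_min + i * delta for i in range(num_atoms)]


def expected_q(probs: List[float], support: List[float]) -> float:
    """Scalar Q-value: the mean of the categorical return distribution."""
    return sum(p * z for p, z in zip(probs, support))


support = make_support()
uniform = [1.0 / len(support)] * len(support)  # placeholder distribution

print(len(support), support[0], support[-1])
print(round(support[1] - support[0], 6))        # atom spacing
print(round(expected_q(uniform, support), 6))   # mean of a uniform distribution
```

With these defaults the atoms are 0.4 apart, so returns outside [-20, 20] cannot be represented and would have to be clipped onto the end atoms.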

Methods

__init__(obs_dim, action_dim, critic_obs_dim)

get_state_dict()

Return a state dictionary containing all learner components.

load_state_dict(state_dict)

Restore all learner components from a state dictionary.

soft_update_target()

Polyak-average update of the target Q-network.

update_actor(batch)

One actor update step.

update_critic(batch)

One critic update step.

__init__(obs_dim, action_dim, critic_obs_dim, device='cpu', gamma=0.97, tau=0.125, actor_lr=0.0003, critic_lr=0.0003, alpha_lr=0.0003, alpha_init=0.001, target_entropy_ratio=0.0, actor_hidden_dim=512, critic_hidden_dim=768, num_atoms=101, v_min=-20.0, v_max=20.0, num_q_networks=2, use_layer_norm=True, use_tanh=True, log_std_max=0.0, log_std_min=-5.0, weight_decay=0.001, max_grad_norm=0.0, use_autotune=True, use_symmetry=False, use_amp=False, amp_dtype='auto', use_compile=False, symmetry_augmentation=None, world_size=1)[source]

update_critic(batch)[source]

One critic update step.

Parameters:

batch (Dict[str, Tensor])

Return type:

Dict[str, float]

update_actor(batch)[source]

One actor update step.

Parameters:

batch (Dict[str, Tensor])

Return type:

Dict[str, float]
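One plausible way to combine update_critic, update_actor and soft_update_target is sketched below. The schedule is an assumption, not taken from the unilab source: the critic is updated on every step, the actor on every policy_frequency-th step, and the target network after each critic step. StubLearner and run_updates are hypothetical names; the stub only mimics the documented method names and return types so the loop can run without the library.

```python
from typing import Callable, Dict, List


class StubLearner:
    """Hypothetical stand-in with the documented method names; returns dummy metrics."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def update_critic(self, batch: dict) -> Dict[str, float]:
        self.calls.append("critic")
        return {"critic_loss": 1.0}

    def update_actor(self, batch: dict) -> Dict[str, float]:
        self.calls.append("actor")
        return {"actor_loss": 0.5}

    def soft_update_target(self) -> None:
        self.calls.append("target")


def run_updates(learner, sample_batch: Callable[[], dict],
                num_updates: int = 8, policy_frequency: int = 4) -> Dict[str, float]:
    """Assumed schedule: critic every step, actor every policy_frequency-th step."""
    totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for i in range(num_updates):
        batch = sample_batch()
        metrics = dict(learner.update_critic(batch))
        if (i + 1) % policy_frequency == 0:
            metrics.update(learner.update_actor(batch))
        learner.soft_update_target()
        # Accumulate each metric so the caller gets one averaged value per key.
        for key, value in metrics.items():
            totals[key] = totals.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}


learner = StubLearner()
averaged = run_updates(learner, sample_batch=lambda: {})
print(averaged)
print(learner.calls.count("critic"), learner.calls.count("actor"))
```

Under this assumed schedule, the defaults num_updates=8 and policy_frequency=4 give eight critic steps and two actor steps per call.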

soft_update_target()[source]

Polyak-average update of the target Q-network.

Return type:

None
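As an illustration of Polyak averaging, the sketch below applies the standard rule target <- (1 - tau) * target + tau * online to plain lists of floats, using the default tau=0.125. It assumes that tau weights the online network, which is the usual SAC convention. The function polyak_update is a hypothetical name, and the real method presumably operates on network parameters rather than Python lists.

```python
from typing import List


def polyak_update(target: List[float], online: List[float], tau: float = 0.125) -> None:
    """Move each target value a fraction tau of the way toward the online value, in place."""
    for i, (t, o) in enumerate(zip(target, online)):
        target[i] = (1.0 - tau) * t + tau * o


target = [0.0, 1.0]
online = [1.0, 1.0]
polyak_update(target, online)
print(target)  # [0.125, 1.0]
```

Each call shrinks the gap between target and online values by a factor of (1 - tau), so a larger tau makes the target track the online network more closely.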

get_state_dict()[source]

Return a state dictionary containing all learner components.

Return type:

Dict[str, Any]

load_state_dict(state_dict)[source]

Restore all learner components from a state dictionary.

Parameters:

state_dict (Dict)

Return type:

None
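A checkpoint round trip with get_state_dict and load_state_dict might look like the sketch below. StubLearner, save_checkpoint, load_checkpoint and the dictionary keys are all hypothetical, since the layout of the real state dictionary is not documented here. pickle is used only to keep the example dependency-free; for dictionaries holding tensors, torch.save and torch.load would be the usual choice.

```python
import copy
import os
import pickle
import tempfile
from typing import Any, Dict


class StubLearner:
    """Hypothetical stand-in exposing the two checkpoint methods documented here."""

    def __init__(self, value: float) -> None:
        # Invented keys for illustration; the real dictionary layout may differ.
        self.state: Dict[str, Any] = {"actor": {"w": value}, "step": 0}

    def get_state_dict(self) -> Dict[str, Any]:
        return copy.deepcopy(self.state)

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.state = copy.deepcopy(state_dict)


def save_checkpoint(learner, path: str) -> None:
    """Serialize the learner's state dictionary to disk."""
    with open(path, "wb") as f:
        pickle.dump(learner.get_state_dict(), f)


def load_checkpoint(learner, path: str) -> None:
    """Read a state dictionary from disk and hand it to the learner."""
    with open(path, "rb") as f:
        learner.load_state_dict(pickle.load(f))


with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "learner.pkl")
    trained = StubLearner(value=1.5)
    save_checkpoint(trained, ckpt)
    restored = StubLearner(value=0.0)
    load_checkpoint(restored, ckpt)

print(restored.state == trained.state)
```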