unilab.algos.torch.flash_sac.learner.RewardNormalizer

class unilab.algos.torch.flash_sac.learner.RewardNormalizer

Bases: object

Adaptive reward scaling with running discounted-return statistics.

Methods

__init__(gamma, g_max, device[, eps])

load_state_dict(state_dict)

normalize(rewards)

state_dict()

update_from_transitions(rewards, dones)

__init__(gamma, g_max, device, eps=1e-08)
Parameters:

gamma

g_max

device

eps (default: 1e-08)

update_from_transitions(rewards, dones)
Parameters:

rewards

dones

Return type:

None

normalize(rewards)
Parameters:

rewards (Tensor)

Return type:

Tensor

state_dict()
Return type:

dict[str, Any]

load_state_dict(state_dict)
Parameters:

state_dict (dict[str, Any])

Return type:

None
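
Example

This page lists the method signatures but not the update rule, so the following is only a minimal sketch of how a normalizer with this interface might behave. The class name SketchRewardNormalizer, its internal fields, and the scaling formula are all assumptions made for illustration; none of them is taken from the unilab source. Plain Python lists stand in for tensors, which is why the device argument is omitted. In particular, g_max is assumed here to bound the magnitude of the scaled discounted return, which may not match the library's meaning.

```python
import math
from typing import Any


class SketchRewardNormalizer:
    """Illustrative stand-in, NOT the unilab implementation."""

    def __init__(self, gamma: float, g_max: float, eps: float = 1e-8) -> None:
        self.gamma = gamma      # discount applied to the running return
        self.g_max = g_max      # ASSUMED: cap on the magnitude of scaled returns
        self.eps = eps          # keeps the standard deviation away from zero
        self.running_return: list[float] = []  # one accumulator per environment
        self.count = 0          # number of return samples seen so far
        self.mean = 0.0         # running mean of the discounted returns
        self.m2 = 0.0           # running sum of squared deviations (Welford)
        self.max_abs_return = 0.0

    def update_from_transitions(self, rewards: list[float], dones: list[bool]) -> None:
        if not self.running_return:
            self.running_return = [0.0] * len(rewards)
        for i, (reward, done) in enumerate(zip(rewards, dones)):
            g = self.gamma * self.running_return[i] + reward
            # Welford's online update of the mean and squared deviations.
            self.count += 1
            delta = g - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (g - self.mean)
            self.max_abs_return = max(self.max_abs_return, abs(g))
            # Restart the accumulator when the episode ends.
            self.running_return[i] = 0.0 if done else g

    def normalize(self, rewards: list[float]) -> list[float]:
        # ASSUMED rule: divide by the larger of the return standard deviation
        # and the largest observed return magnitude relative to g_max.
        variance = self.m2 / self.count if self.count else 1.0
        scale = max(math.sqrt(variance + self.eps), self.max_abs_return / self.g_max)
        return [reward / scale for reward in rewards]

    def state_dict(self) -> dict[str, Any]:
        return {
            "running_return": list(self.running_return),
            "count": self.count,
            "mean": self.mean,
            "m2": self.m2,
            "max_abs_return": self.max_abs_return,
        }

    def load_state_dict(self, state_dict: dict[str, Any]) -> None:
        self.running_return = list(state_dict["running_return"])
        self.count = state_dict["count"]
        self.mean = state_dict["mean"]
        self.m2 = state_dict["m2"]
        self.max_abs_return = state_dict["max_abs_return"]


# Usage: two parallel environments; the second one terminates on the first step.
normalizer = SketchRewardNormalizer(gamma=0.5, g_max=10.0)
normalizer.update_from_transitions([1.0, 2.0], [False, True])
normalizer.update_from_transitions([1.0, 2.0], [False, False])
scaled = normalizer.normalize([1.0, -2.0])

# A restored copy reproduces the same scaling.
restored = SketchRewardNormalizer(gamma=0.5, g_max=10.0)
restored.load_state_dict(normalizer.state_dict())
```

In this sketch, update_from_transitions only accumulates statistics and normalize only reads them, so a training loop would typically call the former on freshly collected transitions and the latter on sampled batches. Whether the real class follows the same split is not stated on this page.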