unilab.training.reward

Utility functions for reward config handling.

Functions

extract_reward_config(cfg)

Extract and validate reward config from Hydra config.

resolve_reward_dict(cfg)

Resolve the reward config from the final composed config.

unilab.training.reward.resolve_reward_dict(cfg)

Resolve the reward config from the final composed config.

Parameters:

cfg (DictConfig) – The final composed config

Return type:

dict[str, Any]

unilab.training.reward.extract_reward_config(cfg)

Extract and validate reward config from Hydra config.

Parameters:

cfg (DictConfig) – Hydra DictConfig containing a reward section

Return type:

dict[str, dict[str, Any]]

Returns:

Dictionary with a reward_config key, for use as env_cfg_override

Raises:

ValueError – If the reward config is missing
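
The contract documented for extract_reward_config (a reward section in, a dictionary keyed by reward_config out, ValueError when the section is missing) can be illustrated with a stand-in. This is a minimal sketch, not the library's implementation. The name extract_reward_config_sketch is hypothetical, a plain dict stands in for the Hydra DictConfig, and the top-level key "reward" and the reward term names are assumptions made for illustration.

```python
from typing import Any, Mapping


def extract_reward_config_sketch(cfg: Mapping[str, Any]) -> dict[str, dict[str, Any]]:
    """Hypothetical stand-in that mirrors the documented contract only."""
    # Assumption: the reward section lives under a top-level "reward" key.
    reward = cfg.get("reward")
    if reward is None:
        # Documented behavior: a missing reward config raises ValueError.
        raise ValueError("reward config is missing")
    # Wrap a plain-dict copy under the documented "reward_config" key.
    return {"reward_config": dict(reward)}


# Made-up reward terms; a plain dict stands in for the composed DictConfig.
cfg = {"reward": {"tracking": 1.0, "energy": -0.01}}
override = extract_reward_config_sketch(cfg)
print(override)
```

The returned dictionary has the shape described under Returns, so it could be passed along as an env_cfg_override. Calling the sketch with a config that has no reward section raises ValueError, matching the Raises entry.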