unilab.algos.torch.offpolicy.runner.build_reward_comparison_metrics

unilab.algos.torch.offpolicy.runner.build_reward_comparison_metrics(reward_history, smoothed_reward)

Return the latest collector-side 100-episode mean for reward comparison.

Parameters:

reward_history
smoothed_reward

Return type:

dict[str, float]
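
Example:

The sketch below illustrates what a 100-episode mean packaged as a dict[str, float] could look like. It is not the unilab implementation: the helper names rolling_episode_mean and sketch_reward_comparison_metrics, the key "reward_mean_100", the window parameter, and the 0.0 fallback for an empty history are all assumptions made for illustration, and the sketch takes a plain list of episode returns rather than the documented reward_history and smoothed_reward arguments.

```python
from collections import deque
from statistics import fmean


def rolling_episode_mean(episode_returns, window=100):
    """Mean of the most recent `window` episode returns (hypothetical helper)."""
    # A deque with maxlen keeps only the last `window` items of the input.
    recent = deque(episode_returns, maxlen=window)
    if not recent:
        # Assumption: report 0.0 when no episode has finished yet.
        return 0.0
    return fmean(recent)


def sketch_reward_comparison_metrics(episode_returns, window=100):
    """Package the rolling mean as a dict[str, float] (hypothetical sketch)."""
    # The key name is an assumption; the real function's keys are not documented here.
    return {"reward_mean_100": float(rolling_episode_mean(episode_returns, window))}


if __name__ == "__main__":
    returns = [1.0] * 50 + [3.0] * 100
    print(sketch_reward_comparison_metrics(returns))
```

Only the most recent 100 entries contribute to the result, so the 50 older returns in the usage example have no effect on the reported mean.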