unilab.algos.torch.offpolicy.runner.build_reward_comparison_metrics
unilab.algos.torch.offpolicy.runner.build_reward_comparison_metrics(reward_history, smoothed_reward)
Return the latest collector-side 100-episode mean for reward comparison.
- Parameters:
reward_history
smoothed_reward
- Return type:
dict[str, float]
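
The page does not show the function body, so the following is only a minimal sketch of what a "latest 100-episode mean" metric could look like, not the unilab implementation. It assumes that `reward_history` is a sequence of per-episode returns with the newest entry last, that `smoothed_reward` serves as a fallback when no episode has finished yet, and that the metric is reported under a single key. The function name `sketch_reward_comparison_metrics` and the key `"reward/collector_mean_100"` are hypothetical.

```python
from statistics import fmean


def sketch_reward_comparison_metrics(reward_history, smoothed_reward):
    """Hypothetical stand-in; the real unilab function may differ."""
    # Assumption: reward_history holds per-episode returns, newest last.
    window = list(reward_history)[-100:]
    if window:
        latest_mean = fmean(window)
    else:
        # Assumption: with no finished episodes, fall back to smoothed_reward.
        latest_mean = float(smoothed_reward)
    # The key name is invented for illustration only.
    return {"reward/collector_mean_100": latest_mean}


if __name__ == "__main__":
    history = [1.0] * 50 + [3.0] * 100
    print(sketch_reward_comparison_metrics(history, smoothed_reward=0.0))
```

Under these assumptions, only the most recent 100 entries contribute, so older episodes in the history do not affect the reported value.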