unilab.envs.locomotion.common.rewards
Shared reward functions for locomotion environments.
Introduces RewardContext, a dataclass that bundles all the state any
reward function might need. Shared reward functions are plain
module-level callables fn(ctx) -> np.ndarray, so each
joystick environment can reference them directly in its
_reward_fns dispatch table without per-class wrapper methods.
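The pattern above can be illustrated with a minimal sketch. It assumes NumPy and uses a heavily trimmed context. `MiniRewardContext`, `lin_vel_z_sketch`, `alive_sketch` and `REWARD_FNS` are hypothetical stand-ins, not the module's actual classes or functions. Each reward is a plain function of the context that returns one value per environment, and the table maps reward names straight to those functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

import numpy as np


@dataclass(frozen=True)
class MiniRewardContext:
    """Hypothetical, trimmed stand-in for RewardContext (immutable snapshot)."""
    info: dict
    linvel: np.ndarray  # (N, 3) base linear velocity
    num_envs: int = 0
    # Mutable defaults need a factory; Sphinx renders these as `<factory>`.
    default_angles: np.ndarray = field(default_factory=lambda: np.zeros(0))


def lin_vel_z_sketch(ctx: MiniRewardContext) -> np.ndarray:
    # Squared vertical velocity, one value per environment.
    return np.square(ctx.linvel[:, 2])


def alive_sketch(ctx: MiniRewardContext) -> np.ndarray:
    # Constant 1.0 per environment.
    return np.ones(ctx.num_envs)


# Module-level callables go straight into the table: no per-class wrappers.
REWARD_FNS: Dict[str, Callable[[MiniRewardContext], np.ndarray]] = {
    "lin_vel_z": lin_vel_z_sketch,
    "alive": alive_sketch,
}

ctx = MiniRewardContext(
    info={},
    linvel=np.array([[0.0, 0.0, 0.5], [1.0, 0.0, -1.0]]),
    num_envs=2,
)
terms = {name: fn(ctx) for name, fn in REWARD_FNS.items()}
```

The context is frozen, so a reward function cannot mutate shared state that a later reward in the same step would read.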
Functions
- action_rate: Penalty for change in actions between timesteps.
- alive: Constant reward for staying alive.
- ang_vel_xy: Penalty for roll/pitch angular velocity.
- base_height: Penalty for base height deviation from target.
- dof_acc: Penalty for joint acceleration magnitude.
- dof_acc_l2: Penalty for joint acceleration magnitude (L2).
- dof_torques_l2: Penalty for joint torque magnitude (L2).
- energy: Penalty for mechanical energy consumption.
- feet_air_time_positive_biped: Biped foot air-time reward that only rewards the single-stance phase.
- forward_progress: Reward for forward progress relative to commanded speed.
- joint_deviation_l1: L1 penalty for joints deviating from their default positions.
- joint_pos_limits: Penalty for joint position over/under-shoot relative to backend limits.
- joint_pos_penalty: Penalty for joint deviation that switches scale based on command/body motion.
- joint_power: Penalty for joint mechanical power (|tau * dq|).
- lin_vel_z: Penalty for vertical (z) linear velocity.
- orientation: Penalty for deviation from upright orientation (roll/pitch).
- roll: Penalty for deviation in roll orientation.
- run_reward_dispatch: Standard scales × fns(ctx) reduction shared by all locomotion envs.
- similar_to_default: Penalty for joint position deviation from default (L1 norm).
- stand_still: Penalty for joint deviation from default while the command norm is below a threshold.
- torques: Penalty for total torque magnitude (L1 norm).
- track_ang_vel_z_world_exp: Exponential tracking of yaw angular velocity (world frame).
- track_lin_vel_xy_yaw_frame_exp: Exponential tracking of xy linear velocity in the gravity-aligned yaw frame.
- tracking_ang_vel: Exponential reward for tracking commanded yaw angular velocity.
- tracking_lin_vel: Exponential reward for tracking commanded xy linear velocity.
- under_speed: Penalty for being below commanded forward speed.
- upright: Exponential reward for upright orientation.
- upright_scale: Scalar gate in [0, 1] from the body-up projection of gravity.
- upward: Reward favouring an upright body (no Go2 upright gate).
- weighted_pose: Weighted L2 penalty for joint position deviation.
Classes
- RewardContext: Immutable snapshot of everything reward functions may read.
- class unilab.envs.locomotion.common.rewards.RewardContext
Bases: object
Immutable snapshot of everything reward functions may read.
Built once per _compute_reward call. Shared functions access only the fields they need; robot-specific methods that still live on the environment class receive the same context via self.
- __init__(info, linvel, gyro, dof_pos, num_envs=0, default_angles=<factory>, tracking_sigma=0.25, base_height_target=0.0, base_height=<factory>, gravity=None, dof_vel=None, pose_weights=None, joint_range=None, linvel_yaw=None)
- unilab.envs.locomotion.common.rewards.tracking_lin_vel(ctx)
Exponential reward for tracking commanded xy linear velocity.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.tracking_ang_vel(ctx)
Exponential reward for tracking commanded yaw angular velocity.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
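Both tracking rewards above are described as exponential. A common formulation, used for example in legged_gym, is exp(-‖error‖² / σ), where σ corresponds to `tracking_sigma` (default 0.25). The sketch below assumes that kernel. It takes the command and measured velocities as plain arrays, because this page does not document where the real functions read the command from. The `*_sketch` names are hypothetical.

```python
import numpy as np


def exp_tracking(error_sq: np.ndarray, sigma: float = 0.25) -> np.ndarray:
    # Maps squared tracking error to (0, 1]; 1.0 means perfect tracking.
    return np.exp(-error_sq / sigma)


def tracking_lin_vel_sketch(cmd_xy: np.ndarray, linvel: np.ndarray,
                            sigma: float = 0.25) -> np.ndarray:
    # Squared error over the x and y components of the base velocity.
    err = np.sum(np.square(cmd_xy - linvel[:, :2]), axis=1)
    return exp_tracking(err, sigma)


def tracking_ang_vel_sketch(cmd_yaw: np.ndarray, gyro: np.ndarray,
                            sigma: float = 0.25) -> np.ndarray:
    # Squared error on the z (yaw) component of the angular velocity.
    err = np.square(cmd_yaw - gyro[:, 2])
    return exp_tracking(err, sigma)


cmd_xy = np.array([[1.0, 0.0], [1.0, 0.0]])
linvel = np.array([[1.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
r_lin = tracking_lin_vel_sketch(cmd_xy, linvel)  # env 0 tracks perfectly
```

A smaller σ makes the reward fall off faster with tracking error, which demands tighter tracking.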
- unilab.envs.locomotion.common.rewards.forward_progress(ctx)
Reward for forward progress relative to commanded speed.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.under_speed(ctx)
Penalty for being below commanded forward speed.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.lin_vel_z(ctx)
Penalty for vertical (z) linear velocity.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.ang_vel_xy(ctx)
Penalty for roll/pitch angular velocity.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.orientation(ctx)
Penalty for deviation from upright orientation (roll/pitch).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
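One common way to build a roll/pitch orientation penalty, following the legged_gym convention (an assumption here), is to sum the squared x and y components of the gravity direction expressed in the body frame. Both components are zero when the base is level, whatever sign convention is used for z. Pitch tilts gravity into the body x axis and roll tilts it into the y axis, so a roll-only variant could keep just y. Whether `roll` does this is not stated. The function names are hypothetical.

```python
import numpy as np


def orientation_sketch(gravity: np.ndarray) -> np.ndarray:
    # gravity: (N, 3) gravity direction in the body frame.
    # x/y vanish when the base is level; their squares grow with
    # pitch (x) and roll (y) tilt.
    return np.sum(np.square(gravity[:, :2]), axis=1)


def roll_only_sketch(gravity: np.ndarray) -> np.ndarray:
    # Hypothetical roll-only variant: y component only.
    return np.square(gravity[:, 1])


theta = np.deg2rad(30.0)
# Level body, then a 30-degree roll (gravity tilts into the body y axis).
g = np.array([[0.0, 0.0, -1.0], [0.0, np.sin(theta), -np.cos(theta)]])
pen = orientation_sketch(g)
```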
- unilab.envs.locomotion.common.rewards.roll(ctx)
Penalty for deviation in roll orientation.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.upright(ctx)
Exponential reward for upright orientation.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.base_height(ctx)
Penalty for base height deviation from target.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.similar_to_default(ctx)
Penalty for joint position deviation from default (L1 norm).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.weighted_pose(ctx)
Weighted L2 penalty for joint position deviation.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
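`similar_to_default` and `weighted_pose` above differ mainly in the norm they apply to the deviation. The sketch below assumes these shapes: `dof_pos` is (N, J), `default_angles` is (J,), and `pose_weights` is a per-joint (J,) weight vector. The function names are hypothetical.

```python
import numpy as np


def similar_to_default_sketch(dof_pos: np.ndarray,
                              default_angles: np.ndarray) -> np.ndarray:
    # L1 norm of the deviation, summed over joints.
    return np.sum(np.abs(dof_pos - default_angles), axis=1)


def weighted_pose_sketch(dof_pos: np.ndarray, default_angles: np.ndarray,
                         pose_weights: np.ndarray) -> np.ndarray:
    # Per-joint weighted squared deviation (weighted L2).
    return np.sum(pose_weights * np.square(dof_pos - default_angles), axis=1)


q0 = np.array([0.0, 0.5])
q = np.array([[0.1, 0.3]])
w = np.array([2.0, 1.0])
l1 = similar_to_default_sketch(q, q0)     # |0.1| + |-0.2|
wl2 = weighted_pose_sketch(q, q0, w)      # 2 * 0.01 + 1 * 0.04
```

The L1 form penalizes small deviations relatively more. The weighted L2 form lets some joints (for example hips) be held more tightly than others.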
- unilab.envs.locomotion.common.rewards.action_rate(ctx)
Penalty for change in actions between timesteps.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.torques(ctx)
Penalty for total torque magnitude (L1 norm).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.energy(ctx)
Penalty for mechanical energy consumption.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.dof_acc(ctx)
Penalty for joint acceleration magnitude.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.alive(ctx)
Constant reward for staying alive.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.upright_scale(gravity, num_envs)
Scalar gate in [0, 1] from the body-up projection of gravity.
Used by quadruped rough tasks to suppress reward/penalty bookkeeping while the robot is tipping over. Returns 1.0 when the body is upright (gravity[:, 2] >= 0.7) and 0.0 when fully tipped.
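The description above fixes only the two ends of the gate: 1.0 at gravity[:, 2] >= 0.7 and 0.0 when fully tipped. The sketch below assumes a linear ramp that reaches 0 at gravity[:, 2] <= 0. It also assumes the gate is returned per environment as a (num_envs,) array, and that `num_envs` only sizes an all-ones fallback when no gravity is available. The actual interpolation and the role of `num_envs` are not documented here, and `upright_scale_sketch` is a hypothetical name.

```python
from typing import Optional

import numpy as np


def upright_scale_sketch(gravity: Optional[np.ndarray], num_envs: int) -> np.ndarray:
    # Assumed fallback: no gravity information means "treat as upright".
    if gravity is None:
        return np.ones(num_envs)
    up = gravity[:, 2]
    # Assumed linear ramp: 0 at up <= 0, 1 at up >= 0.7.
    return np.clip(up / 0.7, 0.0, 1.0)


g = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.35], [0.0, 0.0, -1.0]])
gate = upright_scale_sketch(g, 3)
```

A penalty multiplied by `gate` fades out smoothly as the robot tips over instead of switching off abruptly.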
- unilab.envs.locomotion.common.rewards.dof_torques_l2(ctx)
Penalty for joint torque magnitude (L2).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.dof_acc_l2(ctx)
Penalty for joint acceleration magnitude (L2).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.joint_pos_limits(ctx)
Penalty for joint position over/under-shoot relative to backend limits.
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
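Over/under-shoot relative to limits is usually measured as the distance outside the [lower, upper] range, which is zero inside the range. The sketch below assumes `joint_range` is a (J, 2) array of [lower, upper] limits, but that layout is an assumption. Some implementations first shrink the range to "soft" limits, and this page does not say whether this one does. `joint_pos_limits_sketch` is a hypothetical name.

```python
import numpy as np


def joint_pos_limits_sketch(dof_pos: np.ndarray, joint_range: np.ndarray) -> np.ndarray:
    lower, upper = joint_range[:, 0], joint_range[:, 1]
    # Undershoot below the lower limit and overshoot above the upper limit.
    under = np.clip(lower - dof_pos, 0.0, None)
    over = np.clip(dof_pos - upper, 0.0, None)
    return np.sum(under + over, axis=1)


limits = np.array([[-1.0, 1.0], [0.0, 2.0]])
q = np.array([[0.5, 1.0], [-1.5, 2.25]])
pen = joint_pos_limits_sketch(q, limits)  # env 1: 0.5 under + 0.25 over
```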
- unilab.envs.locomotion.common.rewards.joint_power(ctx)
Penalty for joint mechanical power (|tau * dq|).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
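`torques`, `dof_torques_l2` and `joint_power` above differ in what they reduce over the joints. The sketch below assumes per-joint torques `tau` and joint velocities `dq` are available as (N, J) arrays. Torque is not among the documented RewardContext fields, so its source is an assumption. The `*_sketch` names are hypothetical.

```python
import numpy as np


def torques_l1_sketch(tau: np.ndarray) -> np.ndarray:
    # Sketch of `torques`: sum of |tau| over joints (L1).
    return np.sum(np.abs(tau), axis=1)


def dof_torques_l2_sketch(tau: np.ndarray) -> np.ndarray:
    # Sketch of `dof_torques_l2`: sum of tau^2 over joints.
    return np.sum(np.square(tau), axis=1)


def joint_power_sketch(tau: np.ndarray, dq: np.ndarray) -> np.ndarray:
    # Sketch of `joint_power`: sum of |tau * dq|, the mechanical power magnitude.
    return np.sum(np.abs(tau * dq), axis=1)


tau = np.array([[2.0, -1.0]])
dq = np.array([[0.5, 3.0]])
```

Unlike the torque-only terms, the power term does not penalize a joint that holds torque without moving, because dq is zero.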
- unilab.envs.locomotion.common.rewards.stand_still(ctx, command_threshold=0.1)
Penalty for joint deviation from default while the command norm is below threshold.
- Parameters:
  - ctx (RewardContext)
  - command_threshold (float)
- Return type: np.ndarray
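The stand-still penalty can be sketched as follows. This assumes the command is an (N, C) array whose Euclidean norm is compared against `command_threshold`, and that the deviation uses an L1 norm like `similar_to_default`. Both choices are assumptions, and `stand_still_sketch` is a hypothetical name.

```python
import numpy as np


def stand_still_sketch(dof_pos: np.ndarray, default_angles: np.ndarray,
                       command: np.ndarray, command_threshold: float = 0.1) -> np.ndarray:
    deviation = np.sum(np.abs(dof_pos - default_angles), axis=1)
    # Active only when the commanded motion is (near) zero.
    idle = np.linalg.norm(command, axis=1) < command_threshold
    return deviation * idle


q0 = np.zeros(2)
q = np.array([[0.2, -0.1], [0.2, -0.1]])
cmd = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
pen = stand_still_sketch(q, q0, cmd)  # only env 0 is idle
```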
- unilab.envs.locomotion.common.rewards.joint_pos_penalty(ctx, *, stand_still_scale=5.0, velocity_threshold=0.5, command_threshold=0.1)
Penalty for joint deviation that switches scale based on command/body motion.
- Parameters:
  - ctx (RewardContext)
  - stand_still_scale (float)
  - velocity_threshold (float)
  - command_threshold (float)
- Return type: np.ndarray
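This page does not spell out the switching rule. One formulation with the same parameter names resembles the joint-position penalty in Isaac Lab's Spot example, but treat that correspondence as an assumption. It takes the L2 norm of the joint deviation and multiplies it by `stand_still_scale` only when the command is small and the body's xy speed is below `velocity_threshold`. The command array and `joint_pos_penalty_sketch` are hypothetical.

```python
import numpy as np


def joint_pos_penalty_sketch(dof_pos, default_angles, command, linvel,
                             stand_still_scale=5.0, velocity_threshold=0.5,
                             command_threshold=0.1):
    # L2 norm of the joint deviation per environment.
    penalty = np.linalg.norm(dof_pos - default_angles, axis=1)
    cmd_norm = np.linalg.norm(command, axis=1)
    body_speed = np.linalg.norm(linvel[:, :2], axis=1)
    moving = (cmd_norm > command_threshold) | (body_speed > velocity_threshold)
    # Plain penalty while moving; scaled up when the robot should stand still.
    return np.where(moving, penalty, stand_still_scale * penalty)


q0 = np.zeros(2)
q = np.array([[0.3, 0.4], [0.3, 0.4]])
cmd = np.array([[1.0, 0.0], [0.0, 0.0]])
v = np.zeros((2, 3))
pen = joint_pos_penalty_sketch(q, q0, cmd, v)  # env 1 gets the 5x scale
```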
- unilab.envs.locomotion.common.rewards.upward(ctx)
Reward favouring an upright body (no Go2 upright gate).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.track_lin_vel_xy_yaw_frame_exp(ctx)
Exponential tracking of xy linear velocity in the gravity-aligned yaw frame.
Requires ctx.linvel_yaw (base linear velocity rotated into the yaw frame).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
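The gravity-aligned yaw frame keeps only the robot's heading. A world-frame velocity is rotated by minus the yaw angle about the world z axis, so roll and pitch do not mix into the xy components. The sketch below assumes a world-frame velocity and a per-environment yaw angle are available, because this page does not document how the environment computes `linvel_yaw`. It reuses the exp(-err/σ) kernel, which is also an assumption. The function names are hypothetical.

```python
import numpy as np


def to_yaw_frame(linvel_world: np.ndarray, yaw: np.ndarray) -> np.ndarray:
    # Rotate world xy velocity by -yaw about z; z is left unchanged.
    c, s = np.cos(yaw), np.sin(yaw)
    vx, vy = linvel_world[:, 0], linvel_world[:, 1]
    return np.stack([c * vx + s * vy, -s * vx + c * vy, linvel_world[:, 2]], axis=1)


def track_lin_vel_xy_yaw_frame_exp_sketch(cmd_xy: np.ndarray, linvel_yaw: np.ndarray,
                                          sigma: float = 0.25) -> np.ndarray:
    err = np.sum(np.square(cmd_xy - linvel_yaw[:, :2]), axis=1)
    return np.exp(-err / sigma)


# Heading 90 degrees left and moving along world +y: that is "forward" in the yaw frame.
yaw = np.array([np.pi / 2])
v_world = np.array([[0.0, 1.0, 0.0]])
v_yaw = to_yaw_frame(v_world, yaw)
r = track_lin_vel_xy_yaw_frame_exp_sketch(np.array([[1.0, 0.0]]), v_yaw)
```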
- unilab.envs.locomotion.common.rewards.track_ang_vel_z_world_exp(ctx)
Exponential tracking of yaw angular velocity (world frame).
- Parameters: ctx (RewardContext)
- Return type: np.ndarray
- unilab.envs.locomotion.common.rewards.feet_air_time_positive_biped(ctx, *, threshold=0.4, command_threshold=0.1)
Biped foot air-time reward: only rewards the single-stance phase.
Reads the ctx.info keys current_air_time and current_contact_time (each of shape (N, 2)); the environment populates them every step.
- Parameters:
  - ctx (RewardContext)
  - threshold (float)
  - command_threshold (float)
- Return type: np.ndarray
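The sketch below follows the single-stance air-time reward as written in the Isaac Lab function of the same name, but treat that correspondence as an assumption. A foot counts as in contact when its current contact time is positive. In single stance, the reward is the shorter of the two feet's current phase durations, capped at `threshold`. It is zeroed when the xy command norm is at or below `command_threshold`. The two (N, 2) time arrays correspond to the `ctx.info` keys named above. The command array and the function name are hypothetical.

```python
import numpy as np


def feet_air_time_positive_biped_sketch(air_time, contact_time, command,
                                         threshold=0.4, command_threshold=0.1):
    # air_time, contact_time: (N, 2) current phase durations per foot.
    in_contact = contact_time > 0.0
    # Time spent in each foot's current mode (stance or swing).
    in_mode_time = np.where(in_contact, contact_time, air_time)
    single_stance = np.sum(in_contact, axis=1) == 1
    # Shorter of the two durations, only during single stance.
    reward = np.min(np.where(single_stance[:, None], in_mode_time, 0.0), axis=1)
    reward = np.minimum(reward, threshold)
    # No reward when the xy command is (near) zero.
    reward *= np.linalg.norm(command[:, :2], axis=1) > command_threshold
    return reward


air = np.array([[0.3, 0.0], [0.0, 0.0], [0.6, 0.0]])
contact = np.array([[0.0, 0.2], [0.5, 0.5], [0.0, 0.9]])
cmd = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
r = feet_air_time_positive_biped_sketch(air, contact, cmd)
```

Taking the minimum over both feet rewards a balanced gait: one long swing paired with a very short stance earns little.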
- unilab.envs.locomotion.common.rewards.joint_deviation_l1(ctx, joint_indices=None)
L1 penalty for joints deviating from their default positions.
- Parameters: ctx (RewardContext)
- Return type:
- unilab.envs.locomotion.common.rewards.run_reward_dispatch(*, scales, fns, ctx, info, enable_log, ctrl_dt, log_every_n_steps=4, only_positive=False)
Standard scales × fns(ctx) reduction shared by all locomotion envs.
Writes per-reward means into info["log"] when enable_log is set and the steps[0] cadence matches log_every_n_steps.
Returns reward * ctrl_dt (with an optional positive clamp).
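The reduction described above can be sketched as follows. Only the scales × fns(ctx) sum, the logging cadence and the ctrl_dt scaling come from the description. Several details are assumptions:

- `info["steps"]` is an integer array and its first entry drives the cadence.
- Logged values are the scaled per-term means under a hypothetical `"reward/<name>"` key.
- Zero scales are skipped.
- `only_positive` clamps with `np.maximum(..., 0)`.

`run_reward_dispatch_sketch` is a hypothetical name.

```python
from typing import Callable, Dict

import numpy as np


def run_reward_dispatch_sketch(*, scales: Dict[str, float],
                               fns: Dict[str, Callable], ctx, info: dict,
                               enable_log: bool, ctrl_dt: float,
                               log_every_n_steps: int = 4,
                               only_positive: bool = False) -> np.ndarray:
    # Weighted reward terms; zero-scale terms are skipped (assumption).
    terms = {name: scale * fns[name](ctx) for name, scale in scales.items() if scale != 0.0}
    reward = sum(terms.values())
    # Assumed: env 0's step counter decides whether this step is logged.
    if enable_log and info["steps"][0] % log_every_n_steps == 0:
        log = info.setdefault("log", {})
        for name, value in terms.items():
            log[f"reward/{name}"] = float(np.mean(value))
    reward = reward * ctrl_dt
    if only_positive:
        reward = np.maximum(reward, 0.0)
    return reward


fns = {"alive": lambda c: np.ones(2), "penalty": lambda c: np.array([1.0, 5.0])}
scales = {"alive": 1.0, "penalty": -0.5}
info = {"steps": np.array([8, 8])}
r = run_reward_dispatch_sketch(scales=scales, fns=fns, ctx=None, info=info,
                               enable_log=True, ctrl_dt=0.02, only_positive=True)
```

Penalties carry negative scales, so the positive clamp keeps a heavily penalized step from producing a negative total reward.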