unilab.envs.locomotion.common.rewards

Shared reward functions for locomotion environments.

Introduces RewardContext, a dataclass that bundles all the state a reward function might need. Shared reward functions are plain module-level callables, fn(ctx) -> np.ndarray, so each joystick environment can reference them directly in its _reward_fns dispatch table without per-class wrapper methods.

Functions

action_rate(ctx)

Penalty for change in actions between timesteps.

alive(ctx)

Constant reward for staying alive.

ang_vel_xy(ctx)

Penalty for roll/pitch angular velocity.

base_height(ctx)

Penalty for base height deviation from target.

dof_acc(ctx)

Penalty for joint acceleration magnitude.

dof_acc_l2(ctx)

Penalty for joint acceleration magnitude (L2).

dof_torques_l2(ctx)

Penalty for joint torque magnitude (L2).

energy(ctx)

Penalty for mechanical energy consumption.

feet_air_time_positive_biped(ctx, *[, ...])

Biped foot air-time reward: only rewards single-stance phase.

forward_progress(ctx)

Reward for forward progress relative to commanded speed.

joint_deviation_l1(ctx[, joint_indices])

L1 penalty for joints deviating from their default positions.

joint_pos_limits(ctx)

Penalty for joint position over/under-shoot relative to backend limits.

joint_pos_penalty(ctx, *[, ...])

Penalty for joint deviation that switches scale based on command/body motion.

joint_power(ctx)

Penalty for joint mechanical power (|tau * dq|).

lin_vel_z(ctx)

Penalty for vertical (z) linear velocity.

orientation(ctx)

Penalty for deviation from upright orientation (roll/pitch).

roll(ctx)

Penalty for deviation from roll orientation.

run_reward_dispatch(*, scales, fns, ctx, ...)

Standard scales × fns(ctx) reduction shared by all locomotion envs.

similar_to_default(ctx)

Penalty for joint position deviation from default (L1 norm).

stand_still(ctx[, command_threshold])

Penalty for joint deviation from default while command norm is below threshold.

torques(ctx)

Penalty for total torque magnitude (L1 norm).

track_ang_vel_z_world_exp(ctx)

Exponential tracking of yaw angular velocity (world frame).

track_lin_vel_xy_yaw_frame_exp(ctx)

Exponential tracking of xy linear velocity in the gravity-aligned yaw frame.

tracking_ang_vel(ctx)

Exponential reward for tracking commanded yaw angular velocity.

tracking_lin_vel(ctx)

Exponential reward for tracking commanded xy linear velocity.

under_speed(ctx)

Penalty for being below commanded forward speed.

upright(ctx)

Exponential reward for upright orientation.

upright_scale(gravity, num_envs)

Scalar gate in [0, 1] from the body-up projection of gravity.

upward(ctx)

Reward favouring an upright body (no Go2 upright gate).

weighted_pose(ctx)

Weighted L2 penalty for joint position deviation.

Classes

RewardContext

Immutable snapshot of everything reward functions may read.

class unilab.envs.locomotion.common.rewards.RewardContext

Bases: object

Immutable snapshot of everything reward functions may read.

Built once per _compute_reward call. Shared functions access only the fields they need; robot-specific methods that still live on the environment class receive the same context via self.

Parameters:
info: dict
linvel: ndarray
gyro: ndarray
dof_pos: ndarray
num_envs: int = 0
default_angles: ndarray
tracking_sigma: float = 0.25
base_height_target: float = 0.0
base_height: ndarray
gravity: ndarray | None = None
dof_vel: ndarray | None = None
pose_weights: ndarray | None = None
joint_range: ndarray | None = None
linvel_yaw: ndarray | None = None
__init__(info, linvel, gyro, dof_pos, num_envs=0, default_angles=<factory>, tracking_sigma=0.25, base_height_target=0.0, base_height=<factory>, gravity=None, dof_vel=None, pose_weights=None, joint_range=None, linvel_yaw=None)
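A minimal sketch of how a class with these fields could be declared, assuming it is a frozen dataclass (suggested by "Immutable snapshot" and the <factory> defaults in the signature). The empty-array factories are assumptions, since the real defaults are not documented. Note that frozen=True only blocks rebinding attributes; the arrays themselves stay mutable.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class RewardContext:
    """Sketch mirroring the documented fields; not the library's actual source."""

    info: dict
    linvel: np.ndarray
    gyro: np.ndarray
    dof_pos: np.ndarray
    num_envs: int = 0
    # Assumed factory: an empty array (the real default is not documented).
    default_angles: np.ndarray = field(default_factory=lambda: np.zeros(0))
    tracking_sigma: float = 0.25
    base_height_target: float = 0.0
    base_height: np.ndarray = field(default_factory=lambda: np.zeros(0))
    gravity: Optional[np.ndarray] = None
    dof_vel: Optional[np.ndarray] = None
    pose_weights: Optional[np.ndarray] = None
    joint_range: Optional[np.ndarray] = None
    linvel_yaw: Optional[np.ndarray] = None


# Built once per reward computation; every reward function then reads from it.
num_envs, num_joints = 4, 12
ctx = RewardContext(
    info={},
    linvel=np.zeros((num_envs, 3)),
    gyro=np.zeros((num_envs, 3)),
    dof_pos=np.zeros((num_envs, num_joints)),
    num_envs=num_envs,
    default_angles=np.zeros(num_joints),
)
```

Optional fields such as gravity stay None unless the environment fills them, so functions that need them (for example track_lin_vel_xy_yaw_frame_exp and ctx.linvel_yaw) rely on the environment populating them.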
unilab.envs.locomotion.common.rewards.tracking_lin_vel(ctx)

Exponential reward for tracking commanded xy linear velocity.

Parameters:

ctx (RewardContext)

Return type:

ndarray
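The common exponential-kernel form of this reward (as in legged_gym-style code) is sketched below. Where the command lives (here a hypothetical command array; in the real environment presumably somewhere in ctx.info) and whether sigma divides the squared error directly are assumptions; sigma defaults to 0.25 to match ctx.tracking_sigma.

```python
import numpy as np


def tracking_lin_vel_sketch(command: np.ndarray, linvel: np.ndarray, sigma: float = 0.25) -> np.ndarray:
    """Per-env reward exp(-||cmd_xy - v_xy||^2 / sigma).

    command[:, :2] is assumed to hold the commanded (vx, vy) and linvel[:, :2]
    the measured base velocity, both expressed in the same frame.
    """
    sq_err = np.sum(np.square(command[:, :2] - linvel[:, :2]), axis=1)
    return np.exp(-sq_err / sigma)


command = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
linvel = np.array([[1.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
# Perfect tracking gives 1.0; a 0.5 m/s error gives exp(-0.25 / 0.25) = exp(-1).
reward = tracking_lin_vel_sketch(command, linvel)
```

The kernel keeps the reward in (0, 1] and makes it smooth near zero error, which is why tracking terms usually carry positive scales while the remaining terms in this module are penalties.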

unilab.envs.locomotion.common.rewards.tracking_ang_vel(ctx)

Exponential reward for tracking commanded yaw angular velocity.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.forward_progress(ctx)

Reward for forward progress relative to commanded speed.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.under_speed(ctx)

Penalty for being below commanded forward speed.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.lin_vel_z(ctx)

Penalty for vertical (z) linear velocity.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.ang_vel_xy(ctx)

Penalty for roll/pitch angular velocity.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.orientation(ctx)

Penalty for deviation from upright orientation (roll/pitch).

Parameters:

ctx (RewardContext)

Return type:

ndarray
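A common way to express this penalty, assuming ctx.gravity holds the gravity direction projected into the base frame with shape (N, 3). Because only squared x/y components are used, the sign convention of the z axis does not matter.

```python
import numpy as np


def orientation_sketch(gravity: np.ndarray) -> np.ndarray:
    """Sum of squared x/y components of the gravity vector in the base frame.

    When the body is level, gravity has no x/y component and the penalty is 0;
    any roll or pitch tilts part of gravity into x/y and the penalty grows.
    """
    return np.sum(np.square(gravity[:, :2]), axis=1)


gravity = np.array([[0.0, 0.0, -1.0],   # level
                    [0.6, 0.0, -0.8]])  # pitched by roughly 37 degrees
penalty = orientation_sketch(gravity)
```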

unilab.envs.locomotion.common.rewards.roll(ctx)

Penalty for deviation from roll orientation.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.upright(ctx)

Exponential reward for upright orientation.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.base_height(ctx)

Penalty for base height deviation from target.

Parameters:

ctx (RewardContext)

Return type:

ndarray
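A minimal sketch using ctx.base_height (shape (N,)) and ctx.base_height_target, assuming a squared-error form; the docstring does not say whether the deviation is squared or taken in absolute value.

```python
import numpy as np


def base_height_sketch(base_height: np.ndarray, target: float) -> np.ndarray:
    """Squared deviation of each env's base height from the target height."""
    return np.square(base_height - target)


heights = np.array([0.30, 0.25])
penalty = base_height_sketch(heights, target=0.30)
```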

unilab.envs.locomotion.common.rewards.similar_to_default(ctx)

Penalty for joint position deviation from default (L1 norm).

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.weighted_pose(ctx)

Weighted L2 penalty for joint position deviation.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.action_rate(ctx)

Penalty for change in actions between timesteps.

Parameters:

ctx (RewardContext)

Return type:

ndarray
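RewardContext has no dedicated action fields, so the current and previous actions presumably come from ctx.info; the key names are not documented. The sketch below therefore takes the two arrays directly and uses the common squared-difference form, which is itself an assumption.

```python
import numpy as np


def action_rate_sketch(actions: np.ndarray, last_actions: np.ndarray) -> np.ndarray:
    """Sum over action dimensions of the squared change since the previous step."""
    return np.sum(np.square(actions - last_actions), axis=1)


last = np.zeros((2, 3))
now = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]])
penalty = action_rate_sketch(now, last)
```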

unilab.envs.locomotion.common.rewards.torques(ctx)

Penalty for total torque magnitude (L1 norm).

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.energy(ctx)

Penalty for mechanical energy consumption.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.dof_acc(ctx)

Penalty for joint acceleration magnitude.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.alive(ctx)

Constant reward for staying alive.

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.upright_scale(gravity, num_envs)

Scalar gate in [0, 1] from the body-up projection of gravity.

Used by quadruped rough tasks to suppress reward and penalty bookkeeping while the robot is tipping over. Returns 1.0 when the body is upright (gravity[:, 2] >= 0.7) and 0.0 when fully tipped.

Parameters:

gravity (ndarray)
num_envs (int)

Return type:

ndarray
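The docstring fixes only the end points of the gate. A minimal sketch, assuming a linear ramp between gravity[:, 2] = 0 and 0.7, and assuming num_envs is used to return an all-ones gate when gravity is unavailable (ctx.gravity is optional). Following the docstring, gravity[:, 2] is taken to be positive when upright. The parameter name full is hypothetical, and the real interpolation may differ.

```python
from typing import Optional

import numpy as np


def upright_scale_sketch(gravity: Optional[np.ndarray], num_envs: int, full: float = 0.7) -> np.ndarray:
    """Per-env gate: 1 when gravity[:, 2] >= full, linear down to 0 at gravity[:, 2] <= 0.

    The linear ramp and the gravity=None fallback are assumptions.
    """
    if gravity is None:
        return np.ones(num_envs)  # assumed: no orientation data means no gating
    return np.clip(gravity[:, 2] / full, 0.0, 1.0)


gravity = np.array([[0.0, 0.0, 1.0],    # upright
                    [0.0, 0.0, 0.35],   # halfway down the assumed ramp
                    [0.0, 0.0, -1.0]])  # upside down
gate = upright_scale_sketch(gravity, num_envs=3)
```

Multiplying other reward terms by this gate lets a tipping robot stop accumulating penalties it can no longer influence.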

unilab.envs.locomotion.common.rewards.dof_torques_l2(ctx)

Penalty for joint torque magnitude (L2).

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.dof_acc_l2(ctx)

Penalty for joint acceleration magnitude (L2).

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.joint_pos_limits(ctx)

Penalty for joint position over/under-shoot relative to backend limits.

Parameters:

ctx (RewardContext)

Return type:

ndarray
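A minimal sketch, assuming ctx.joint_range has shape (J, 2) holding each joint's lower and upper limit. Each joint contributes the distance by which it lies outside its range and nothing while inside; some implementations first shrink the range by a soft margin, which is omitted here.

```python
import numpy as np


def joint_pos_limits_sketch(dof_pos: np.ndarray, joint_range: np.ndarray) -> np.ndarray:
    """Sum over joints of how far each joint lies outside [lower, upper]."""
    lower, upper = joint_range[:, 0], joint_range[:, 1]
    under = np.clip(lower - dof_pos, 0.0, None)  # undershoot below the lower limit
    over = np.clip(dof_pos - upper, 0.0, None)   # overshoot above the upper limit
    return np.sum(under + over, axis=1)


joint_range = np.array([[-1.0, 1.0], [-1.0, 1.0]])
dof_pos = np.array([[0.0, 0.0],
                    [1.5, -1.2]])
penalty = joint_pos_limits_sketch(dof_pos, joint_range)  # second env: 0.5 + 0.2
```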

unilab.envs.locomotion.common.rewards.joint_power(ctx)

Penalty for joint mechanical power (|tau * dq|).

Parameters:

ctx (RewardContext)

Return type:

ndarray
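The docstring formula |tau * dq| taken literally, summed over joints (the summation is an assumption). ctx.dof_vel supplies dq, but RewardContext has no torque field, so where the real function reads torques (presumably ctx.info) is an assumption; the sketch takes both arrays directly.

```python
import numpy as np


def joint_power_sketch(torques: np.ndarray, dof_vel: np.ndarray) -> np.ndarray:
    """Sum over joints of |tau * dq|, the absolute mechanical power per joint."""
    return np.sum(np.abs(torques * dof_vel), axis=1)


tau = np.array([[1.0, -2.0]])
dq = np.array([[3.0, 1.0]])
penalty = joint_power_sketch(tau, dq)  # |1 * 3| + |-2 * 1|
```

Taking the absolute value per joint means negative (braking) work is penalized just like positive work, instead of cancelling against it.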

unilab.envs.locomotion.common.rewards.stand_still(ctx, command_threshold=0.1)

Penalty for joint deviation from default while the command norm is below command_threshold.

Parameters:

ctx (RewardContext)
command_threshold (float)

Return type:

ndarray
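A minimal sketch, assuming the command is an (N, 3) array of (vx, vy, yaw rate), presumably read from ctx.info, and an L1 deviation like similar_to_default. The penalty is masked to zero for every environment whose command norm reaches the threshold.

```python
import numpy as np


def stand_still_sketch(dof_pos: np.ndarray, default_angles: np.ndarray,
                       command: np.ndarray, command_threshold: float = 0.1) -> np.ndarray:
    """L1 joint deviation, applied only to envs whose command norm is below threshold."""
    deviation = np.sum(np.abs(dof_pos - default_angles), axis=1)
    is_standing = np.linalg.norm(command, axis=1) < command_threshold
    return deviation * is_standing  # bool mask zeroes out moving envs


dof_pos = np.array([[0.1, -0.1], [0.1, -0.1]])
command = np.array([[0.0, 0.0, 0.0],   # told to stand still -> penalized
                    [1.0, 0.0, 0.0]])  # told to walk -> no penalty
penalty = stand_still_sketch(dof_pos, np.zeros(2), command)
```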

unilab.envs.locomotion.common.rewards.joint_pos_penalty(ctx, *, stand_still_scale=5.0, velocity_threshold=0.5, command_threshold=0.1)

Penalty for joint deviation whose scale switches depending on the command and on body motion.

Parameters:

ctx (RewardContext)
stand_still_scale (float)
velocity_threshold (float)
command_threshold (float)

Return type:

ndarray
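The defaults stand_still_scale=5.0 and velocity_threshold=0.5 appear to match the joint position penalty used in Isaac Lab's Spot locomotion task, so a NumPy port of that logic is a plausible reading; treat it as an assumption. The deviation norm is applied at full strength while the robot is commanded to move or its base is already moving, and multiplied by stand_still_scale otherwise. The command and linvel arguments are hypothetical inputs standing in for whatever the real function reads from ctx.

```python
import numpy as np


def joint_pos_penalty_sketch(dof_pos, default_angles, command, linvel,
                             stand_still_scale=5.0, velocity_threshold=0.5,
                             command_threshold=0.1):
    """L2 joint deviation, scaled up by stand_still_scale when the robot should be still."""
    deviation = np.linalg.norm(dof_pos - default_angles, axis=1)
    commanded = np.linalg.norm(command, axis=1) > command_threshold
    moving = np.linalg.norm(linvel[:, :2], axis=1) > velocity_threshold
    return np.where(commanded | moving, deviation, stand_still_scale * deviation)


dof_pos = np.full((3, 2), [0.3, 0.4])            # deviation norm 0.5 in every env
command = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
linvel = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
penalty = joint_pos_penalty_sketch(dof_pos, np.zeros(2), command, linvel)
```

Only the middle environment is both uncommanded and stationary, so only it receives the 5x stand-still weight.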

unilab.envs.locomotion.common.rewards.upward(ctx)

Reward favouring an upright body (no Go2 upright gate).

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.track_lin_vel_xy_yaw_frame_exp(ctx)

Exponential tracking of xy linear velocity in the gravity-aligned yaw frame.

Requires ctx.linvel_yaw (the base linear velocity rotated into the yaw frame).

Parameters:

ctx (RewardContext)

Return type:

ndarray
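One way ctx.linvel_yaw could be produced, assuming the world-frame base velocity and the base yaw angle are available: rotating by -yaw removes the heading but, unlike the full body frame, ignores roll and pitch. world_to_yaw_frame is a hypothetical helper, and reusing the exponential kernel with sigma = 0.25 (the ctx.tracking_sigma default) is an assumption.

```python
import numpy as np


def world_to_yaw_frame(vel_world: np.ndarray, yaw: np.ndarray) -> np.ndarray:
    """Rotate world-frame (vx, vy) by -yaw so x points along the robot's heading."""
    c, s = np.cos(yaw), np.sin(yaw)
    vx, vy = vel_world[:, 0], vel_world[:, 1]
    return np.stack([c * vx + s * vy, -s * vx + c * vy], axis=1)


def track_lin_vel_xy_yaw_frame_exp_sketch(command, linvel_yaw, sigma=0.25):
    """exp(-||cmd_xy - v_xy||^2 / sigma) with the velocity in the yaw frame."""
    sq_err = np.sum(np.square(command[:, :2] - linvel_yaw[:, :2]), axis=1)
    return np.exp(-sq_err / sigma)


yaw = np.array([np.pi / 2])          # robot heading along world +y
vel_world = np.array([[0.0, 1.0]])   # moving along world +y, i.e. straight ahead
linvel_yaw = world_to_yaw_frame(vel_world, yaw)
reward = track_lin_vel_xy_yaw_frame_exp_sketch(np.array([[1.0, 0.0, 0.0]]), linvel_yaw)
```

A forward command of 1 m/s is tracked perfectly here even though the world-frame velocity points along +y, which is the point of measuring in the yaw frame.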

unilab.envs.locomotion.common.rewards.track_ang_vel_z_world_exp(ctx)

Exponential tracking of yaw angular velocity (world frame).

Parameters:

ctx (RewardContext)

Return type:

ndarray

unilab.envs.locomotion.common.rewards.feet_air_time_positive_biped(ctx, *, threshold=0.4, command_threshold=0.1)

Biped foot air-time reward: only rewards single-stance phase.

Reads the ctx.info keys current_air_time and current_contact_time (each of shape (N, 2)); the environment populates them every step.

Parameters:

ctx (RewardContext)
threshold (float)
command_threshold (float)

Return type:

ndarray
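The name and the threshold=0.4 default appear to follow Isaac Lab's feet_air_time_positive_biped reward, so the sketch below is a NumPy port of that logic under that assumption. Each foot's time in its current mode (air or contact) counts only during single stance; the smaller of the two feet is taken, clamped at threshold, and zeroed when the xy command norm does not exceed command_threshold. Passing the command as a separate (N, 3) array is an assumption.

```python
import numpy as np


def feet_air_time_positive_biped_sketch(current_air_time, current_contact_time, command,
                                        threshold=0.4, command_threshold=0.1):
    """Reward time spent in single stance, per env; arrays are (N, 2) for two feet."""
    in_contact = current_contact_time > 0.0
    # Time each foot has spent in its current mode (stance or swing).
    in_mode_time = np.where(in_contact, current_contact_time, current_air_time)
    single_stance = np.sum(in_contact, axis=1) == 1
    reward = np.min(np.where(single_stance[:, None], in_mode_time, 0.0), axis=1)
    reward = np.minimum(reward, threshold)
    # No reward when the robot is (nearly) commanded to stand still.
    return reward * (np.linalg.norm(command[:, :2], axis=1) > command_threshold)


air = np.array([[0.0, 0.3], [0.0, 0.0], [0.0, 0.9], [0.0, 0.3]])
contact = np.array([[0.5, 0.0], [0.2, 0.2], [1.0, 0.0], [0.5, 0.0]])
command = np.array([[1.0, 0.0, 0.0]] * 3 + [[0.0, 0.0, 0.0]])
reward = feet_air_time_positive_biped_sketch(air, contact, command)
```

The four environments show, in order: a normal single-stance step, double stance (no reward), a long step clamped to threshold, and a zero command that gates the reward off.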

unilab.envs.locomotion.common.rewards.joint_deviation_l1(ctx, joint_indices=None)

L1 penalty for joints deviating from their default positions.

Parameters:

ctx (RewardContext)
joint_indices

Return type:

ndarray
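A minimal sketch, assuming joint_indices selects columns of ctx.dof_pos (a list or index array) and None means all joints. Restricting the penalty to a subset lets an environment pin, for example, hip joints near their defaults while leaving the rest free.

```python
import numpy as np


def joint_deviation_l1_sketch(dof_pos: np.ndarray, default_angles: np.ndarray,
                              joint_indices=None) -> np.ndarray:
    """Sum of |q - q_default| over the selected joints (all joints when None)."""
    deviation = np.abs(dof_pos - default_angles)
    if joint_indices is not None:
        deviation = deviation[:, joint_indices]
    return np.sum(deviation, axis=1)


dof_pos = np.array([[0.1, -0.2, 0.3]])
all_joints = joint_deviation_l1_sketch(dof_pos, np.zeros(3))
subset = joint_deviation_l1_sketch(dof_pos, np.zeros(3), joint_indices=[0, 2])
```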

unilab.envs.locomotion.common.rewards.run_reward_dispatch(*, scales, fns, ctx, info, enable_log, ctrl_dt, log_every_n_steps=4, only_positive=False)

Standard scales × fns(ctx) reduction shared by all locomotion envs.

  • Writes per-reward means into info["log"] when enable_log is set and the steps[0] cadence matches log_every_n_steps.

  • Returns the summed reward multiplied by ctrl_dt, optionally clamped to non-negative values (only_positive).

Parameters:

scales
fns
ctx (RewardContext)
info (dict)
enable_log (bool)
ctrl_dt (float)
log_every_n_steps (int)
only_positive (bool)

Return type:

ndarray
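A sketch of the reduction described above, showing how plain module-level functions plug into a dispatch table. It assumes scales and fns are dicts keyed by reward name, that the "cadence" check means info["steps"][0] % log_every_n_steps == 0, and that the logged value is the mean of each scaled term. The log key format, the skipping of zero-scale terms, and the SimpleNamespace stand-in for RewardContext are all hypothetical.

```python
import types

import numpy as np


def run_reward_dispatch_sketch(*, scales, fns, ctx, info, enable_log, ctrl_dt,
                               log_every_n_steps=4, only_positive=False):
    """Sum scale * fn(ctx) over reward names, optionally log, then scale by ctrl_dt."""
    reward = np.zeros(ctx.num_envs)
    means = {}
    for name, scale in scales.items():
        if scale == 0.0:
            continue  # assumed: disabled terms are not evaluated
        term = scale * fns[name](ctx)
        reward = reward + term
        means[name] = float(np.mean(term))
    if enable_log and info["steps"][0] % log_every_n_steps == 0:
        log = info.setdefault("log", {})
        for name, value in means.items():
            log[f"reward/{name}"] = value  # hypothetical key format
    reward = reward * ctrl_dt
    if only_positive:
        reward = np.clip(reward, 0.0, None)
    return reward


# Plain functions referenced directly in a dispatch table.
fns = {
    "alive": lambda ctx: np.ones(ctx.num_envs),
    "penalty": lambda ctx: np.array([0.5, 3.0]),
}
scales = {"alive": 1.0, "penalty": -1.0}
ctx = types.SimpleNamespace(num_envs=2)  # stand-in for RewardContext
info = {"steps": np.array([8, 8])}
reward = run_reward_dispatch_sketch(scales=scales, fns=fns, ctx=ctx, info=info,
                                    enable_log=True, ctrl_dt=0.02, only_positive=True)
```

Here the summed per-step reward is [0.5, -2.0]; multiplying by ctrl_dt gives [0.01, -0.04], and only_positive clamps the second environment to 0. Multiplying by ctrl_dt keeps reward magnitudes comparable across different control rates.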