unilab.envs.locomotion.common.rewards

Shared reward functions for locomotion environments.

Introduces RewardContext, a dataclass that bundles all the state any reward function might need. Shared reward functions are plain module-level callables with the signature fn(ctx) -> np.ndarray, so each joystick environment can reference them directly in its _reward_fns dispatch table without per-class wrapper methods.
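A minimal sketch of the dispatch-table pattern described above. Ctx, alive_sketch, lin_vel_z_sketch and the scale values are hypothetical stand-ins, not this module's definitions. The point is that reward terms are plain functions looked up by name, so no per-class wrapper methods are needed.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Ctx:
    """Hypothetical, trimmed-down stand-in for RewardContext."""

    linvel: np.ndarray  # (N, 3) base linear velocity
    num_envs: int


def alive_sketch(ctx: Ctx) -> np.ndarray:
    # Constant per-environment reward (assumed to be 1.0).
    return np.ones(ctx.num_envs)


def lin_vel_z_sketch(ctx: Ctx) -> np.ndarray:
    # Squared vertical velocity (assumed L2 form of the penalty).
    return np.square(ctx.linvel[:, 2])


# Each environment maps reward names straight to module-level callables.
_reward_fns = {"alive": alive_sketch, "lin_vel_z": lin_vel_z_sketch}
scales = {"alive": 0.5, "lin_vel_z": -2.0}

ctx = Ctx(linvel=np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.0]]), num_envs=2)
total = sum(scales[name] * fn(ctx) for name, fn in _reward_fns.items())
```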

Functions

`action_rate`(ctx)	Penalty for change in actions between timesteps.
`alive`(ctx)	Constant reward for staying alive.
`ang_vel_xy`(ctx)	Penalty for roll/pitch angular velocity.
`base_height`(ctx)	Penalty for base height deviation from target.
`dof_acc`(ctx)	Penalty for joint acceleration magnitude.
`dof_acc_l2`(ctx)	Penalty for joint acceleration magnitude (L2).
`dof_torques_l2`(ctx)	Penalty for joint torque magnitude (L2).
`energy`(ctx)	Penalty for mechanical energy consumption.
`feet_air_time_positive_biped`(ctx, *[, ...])	Biped foot air-time reward: only rewards single-stance phase.
`forward_progress`(ctx)	Reward for forward progress relative to commanded speed.
`joint_deviation_l1`(ctx[, joint_indices])	L1 penalty for joints deviating from their default positions.
`joint_pos_limits`(ctx)	Penalty for joint position over/under-shoot relative to backend limits.
`joint_pos_penalty`(ctx, *[, ...])	Penalty for joint deviation that switches scale based on command/body motion.
`joint_power`(ctx)	Penalty for joint mechanical power (\|tau * dq\|).
`lin_vel_z`(ctx)	Penalty for vertical (z) linear velocity.
`orientation`(ctx)	Penalty for deviation from upright orientation (roll/pitch).
`roll`(ctx)	Penalty for deviation from roll orientation.
`run_reward_dispatch`(*, scales, fns, ctx, ...)	Standard `scales × fns(ctx)` reduction shared by all locomotion envs.
`similar_to_default`(ctx)	Penalty for joint position deviation from default (L1 norm).
`stand_still`(ctx[, command_threshold])	Penalty for joint deviation from default while command norm is below threshold.
`torques`(ctx)	Penalty for total torque magnitude (L1 norm).
`track_ang_vel_z_world_exp`(ctx)	Exponential tracking of yaw angular velocity (world frame).
`track_lin_vel_xy_yaw_frame_exp`(ctx)	Exponential tracking of xy linear velocity in the gravity-aligned yaw frame.
`tracking_ang_vel`(ctx)	Exponential reward for tracking commanded yaw angular velocity.
`tracking_lin_vel`(ctx)	Exponential reward for tracking commanded xy linear velocity.
`under_speed`(ctx)	Penalty for being below commanded forward speed.
`upright`(ctx)	Exponential reward for upright orientation.
`upright_scale`(gravity, num_envs)	Scalar gate in [0, 1] from the body-up projection of gravity.
`upward`(ctx)	Reward favouring an upright body (no Go2 upright gate).
`weighted_pose`(ctx)	Weighted L2 penalty for joint position deviation.

Classes

RewardContext

Immutable snapshot of everything reward functions may read.

class unilab.envs.locomotion.common.rewards.RewardContext

Bases: object

Immutable snapshot of everything reward functions may read.

Built once per _compute_reward call. Shared functions access only the fields they need; robot-specific methods that still live on the environment class receive the same context via self.
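The <factory> defaults in the __init__ signature correspond to dataclasses.field(default_factory=...). The following sketches how such a snapshot might be declared and built once per step. It assumes frozen=True, because the page calls the class immutable but does not show its decorator arguments. SnapshotSketch and its subset of fields are hypothetical.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass, field

import numpy as np


@dataclass(frozen=True)
class SnapshotSketch:
    """Hypothetical subset of RewardContext's fields."""

    info: dict
    dof_pos: np.ndarray
    num_envs: int = 0
    # Mutable defaults such as arrays must come from a factory.
    default_angles: np.ndarray = field(default_factory=lambda: np.zeros(0))
    tracking_sigma: float = 0.25
    dof_vel: np.ndarray | None = None


# Built once per reward computation, then only read.
snap = SnapshotSketch(info={}, dof_pos=np.zeros((4, 12)), num_envs=4)

try:
    snap.num_envs = 8  # frozen dataclasses reject attribute assignment
    reassigned = True
except dataclasses.FrozenInstanceError:
    reassigned = False
```

Freezing only blocks rebinding fields. The contents of array fields can still be modified in place, so reward functions should treat them as read-only by convention.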

Parameters:

info (dict)
linvel (ndarray)
gyro (ndarray)
dof_pos (ndarray)
num_envs (int)
default_angles (ndarray)
tracking_sigma (float)
base_height_target (float)
base_height (ndarray)
gravity (ndarray | None)
dof_vel (ndarray | None)
pose_weights (ndarray | None)
joint_range (ndarray | None)
linvel_yaw (ndarray | None)

info: dict

linvel: ndarray

gyro: ndarray

dof_pos: ndarray

num_envs: int = 0

default_angles: ndarray

tracking_sigma: float = 0.25

base_height_target: float = 0.0

base_height: ndarray

gravity: ndarray | None = None

dof_vel: ndarray | None = None

pose_weights: ndarray | None = None

joint_range: ndarray | None = None

linvel_yaw: ndarray | None = None

__init__(info, linvel, gyro, dof_pos, num_envs=0, default_angles=<factory>, tracking_sigma=0.25, base_height_target=0.0, base_height=<factory>, gravity=None, dof_vel=None, pose_weights=None, joint_range=None, linvel_yaw=None)

unilab.envs.locomotion.common.rewards.tracking_lin_vel(ctx)

Exponential reward for tracking commanded xy linear velocity.

Parameters: ctx (RewardContext)
Return type: ndarray
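A minimal sketch of a legged_gym-style exponential tracking kernel. It assumes the commanded velocity is stored in ctx.info["command"] with columns (vx, vy, yaw_rate). That key and layout are hypothetical, and a SimpleNamespace stands in for RewardContext.

```python
from types import SimpleNamespace

import numpy as np


def tracking_lin_vel_sketch(ctx) -> np.ndarray:
    # Hypothetical layout: ctx.info["command"] has columns (vx, vy, yaw_rate).
    cmd_xy = ctx.info["command"][:, :2]
    err = np.sum(np.square(cmd_xy - ctx.linvel[:, :2]), axis=1)
    # Gaussian-style kernel: 1.0 for perfect tracking, decaying with squared error.
    return np.exp(-err / ctx.tracking_sigma)


ctx = SimpleNamespace(
    info={"command": np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])},
    linvel=np.array([[1.0, 0.0, 0.0], [0.5, 0.0, 0.0]]),
    tracking_sigma=0.25,
)
r = tracking_lin_vel_sketch(ctx)
```

tracking_ang_vel plausibly applies the same kernel to the yaw-rate error between the command and ctx.gyro[:, 2]. The page does not confirm this.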

unilab.envs.locomotion.common.rewards.tracking_ang_vel(ctx)

Exponential reward for tracking commanded yaw angular velocity.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.forward_progress(ctx)

Reward for forward progress relative to commanded speed.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.under_speed(ctx)

Penalty for being below commanded forward speed.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.lin_vel_z(ctx)

Penalty for vertical (z) linear velocity.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.ang_vel_xy(ctx)

Penalty for roll/pitch angular velocity.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.orientation(ctx)

Penalty for deviation from upright orientation (roll/pitch).

Parameters: ctx (RewardContext)
Return type: ndarray
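A sketch of the common legged_gym formulation of this penalty. It assumes the input is the gravity direction expressed in the body frame, as in ctx.gravity. When the body is level, gravity lies entirely along the body z axis, so the squared xy components measure roll/pitch tilt. For brevity this sketch takes the gravity array directly instead of a RewardContext.

```python
import numpy as np


def orientation_sketch(gravity: np.ndarray) -> np.ndarray:
    # gravity: (N, 3) gravity direction in the body frame (assumed meaning of ctx.gravity).
    # A level body sees gravity only along z, so the xy components measure tilt.
    return np.sum(np.square(gravity[:, :2]), axis=1)


# Row 0 is level; row 1 is pitched by 0.3 rad.
g = np.array([[0.0, 0.0, -1.0], [np.sin(0.3), 0.0, -np.cos(0.3)]])
pen = orientation_sketch(g)
```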

unilab.envs.locomotion.common.rewards.roll(ctx)

Penalty for roll deviation from the upright orientation.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.upright(ctx)

Exponential reward for upright orientation.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.base_height(ctx)

Penalty for base height deviation from target.

Parameters: ctx (RewardContext)
Return type: ndarray
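A sketch assuming a squared (L2) deviation between ctx.base_height (one value per environment) and the scalar ctx.base_height_target. The page does not state which norm is used, so the squared form is an assumption.

```python
import numpy as np


def base_height_sketch(base_height: np.ndarray, target: float) -> np.ndarray:
    # Squared deviation of each environment's base height from the target (assumed L2).
    return np.square(base_height - target)


pen = base_height_sketch(np.array([0.30, 0.34, 0.40]), target=0.34)
```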

unilab.envs.locomotion.common.rewards.similar_to_default(ctx)

Penalty for joint position deviation from default (L1 norm).

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.weighted_pose(ctx)

Weighted L2 penalty for joint position deviation.

Parameters: ctx (RewardContext)
Return type: ndarray
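A sketch of a weighted L2 pose penalty, assuming ctx.pose_weights holds one weight per joint, shape (J,), that broadcasts against dof_pos of shape (N, J). The exact weighting scheme is an assumption. A weight of zero removes a joint from the penalty entirely.

```python
import numpy as np


def weighted_pose_sketch(dof_pos, default_angles, pose_weights) -> np.ndarray:
    # Per-joint weights (J,) broadcast across environments (N, J); assumed form.
    dev = dof_pos - default_angles
    return np.sum(pose_weights * np.square(dev), axis=1)


q0 = np.zeros(3)
w = np.array([1.0, 0.5, 0.0])  # weight 0 ignores the third joint
q = np.array([[0.1, 0.2, 5.0]])
pen = weighted_pose_sketch(q, q0, w)
```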

unilab.envs.locomotion.common.rewards.action_rate(ctx)

Penalty for change in actions between timesteps.

Parameters: ctx (RewardContext)
Return type: ndarray
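None of the listed RewardContext fields hold actions, so a sketch might read the current and previous actions from ctx.info. The key names current_actions and last_actions are hypothetical, as is the squared-difference form.

```python
import numpy as np


def action_rate_sketch(info: dict) -> np.ndarray:
    # Hypothetical keys: the env would store current and previous actions in info.
    diff = info["current_actions"] - info["last_actions"]
    # Sum of squared per-joint action changes (assumed L2 form).
    return np.sum(np.square(diff), axis=1)


info = {
    "current_actions": np.array([[0.2, -0.1], [0.0, 0.0]]),
    "last_actions": np.array([[0.0, 0.1], [0.0, 0.0]]),
}
pen = action_rate_sketch(info)
```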

unilab.envs.locomotion.common.rewards.torques(ctx)

Penalty for total torque magnitude (L1 norm).

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.energy(ctx)

Penalty for mechanical energy consumption.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.dof_acc(ctx)

Penalty for joint acceleration magnitude.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.alive(ctx)

Constant reward for staying alive.

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.upright_scale(gravity, num_envs)

Scalar gate in [0, 1] from the body-up projection of gravity.

Used by quadruped rough-terrain tasks to suppress reward and penalty bookkeeping while the robot is tipping over. Returns 1.0 when the body is upright (gravity[:, 2] >= 0.7) and 0.0 when it is fully tipped.

Parameters:

gravity (ndarray | None)
num_envs (int)

Return type:

ndarray
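A sketch of such a gate that follows the page's convention that gravity[:, 2] >= 0.7 counts as upright. Two behaviours are assumptions, not documented: a linear ramp from 0.0 at gravity[:, 2] <= 0 up to 1.0 at 0.7, and returning all ones when gravity is None.

```python
import numpy as np


def upright_scale_sketch(gravity, num_envs: int) -> np.ndarray:
    # Assumption: with no gravity signal, apply no gating.
    if gravity is None:
        return np.ones(num_envs)
    # Assumed linear ramp: 0 at gravity[:, 2] <= 0, 1 at gravity[:, 2] >= 0.7.
    return np.clip(gravity[:, 2] / 0.7, 0.0, 1.0)


g = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.35], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]])
gate = upright_scale_sketch(g, num_envs=4)
```

A caller would then multiply a per-environment penalty by the gate (penalty * gate), so terms fade out while the robot falls.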

unilab.envs.locomotion.common.rewards.dof_torques_l2(ctx)

Penalty for joint torque magnitude (L2).

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.dof_acc_l2(ctx)

Penalty for joint acceleration magnitude (L2).

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.joint_pos_limits(ctx)

Penalty for joint position over/under-shoot relative to backend limits.

Parameters: ctx (RewardContext)
Return type: ndarray
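A sketch of an over/under-shoot penalty in the style of Isaac Lab's joint_pos_limits term. It assumes ctx.joint_range has shape (J, 2) with columns (lower, upper). That layout is an assumption. Joints inside their limits contribute nothing, and each violation contributes its distance past the limit.

```python
import numpy as np


def joint_pos_limits_sketch(dof_pos: np.ndarray, joint_range: np.ndarray) -> np.ndarray:
    # joint_range: (J, 2) with columns (lower, upper); layout assumed.
    lower, upper = joint_range[:, 0], joint_range[:, 1]
    under = np.clip(lower - dof_pos, 0.0, None)  # distance below the lower limit
    over = np.clip(dof_pos - upper, 0.0, None)   # distance above the upper limit
    return np.sum(under + over, axis=1)


limits = np.array([[-1.0, 1.0], [0.0, 2.0]])
q = np.array([[0.5, 1.0], [-1.5, 2.5]])
pen = joint_pos_limits_sketch(q, limits)
```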

unilab.envs.locomotion.common.rewards.joint_power(ctx)

Penalty for joint mechanical power (|tau * dq|).

Parameters: ctx (RewardContext)
Return type: ndarray
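A sketch of the |tau * dq| penalty. Summing over joints is an assumption. Torques are not among the listed RewardContext fields, so here they are passed in directly. Because of the absolute value, negative work (a joint braking) is penalized just like positive work.

```python
import numpy as np


def joint_power_sketch(torques: np.ndarray, dof_vel: np.ndarray) -> np.ndarray:
    # Per-joint mechanical power is tau * dq; summing |tau * dq| over joints is assumed.
    return np.sum(np.abs(torques * dof_vel), axis=1)


tau = np.array([[2.0, -1.0]])
dq = np.array([[0.5, 3.0]])
pen = joint_power_sketch(tau, dq)
```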

unilab.envs.locomotion.common.rewards.stand_still(ctx, command_threshold=0.1)

Penalty for joint deviation from default while command norm is below threshold.

Parameters:

ctx (RewardContext)
command_threshold (float)

Return type:

ndarray
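A sketch modelled on the legged_gym stand-still term. It applies an L1 joint deviation only where the planar command norm is below command_threshold. Using only the first two command columns, and passing the command as an argument, are assumptions.

```python
import numpy as np


def stand_still_sketch(dof_pos, default_angles, command, command_threshold=0.1):
    # L1 joint deviation per environment.
    deviation = np.sum(np.abs(dof_pos - default_angles), axis=1)
    # Assumed: only the planar (vx, vy) command decides "standing".
    standing = np.linalg.norm(command[:, :2], axis=1) < command_threshold
    return deviation * standing


q = np.array([[0.1, -0.2], [0.1, -0.2]])
cmd = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pen = stand_still_sketch(q, np.zeros(2), cmd)
```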

unilab.envs.locomotion.common.rewards.joint_pos_penalty(ctx, *, stand_still_scale=5.0, velocity_threshold=0.5, command_threshold=0.1)

Penalty for joint deviation that switches scale based on command/body motion.

Parameters:

ctx (RewardContext)
stand_still_scale (float)
velocity_threshold (float)
command_threshold (float)

Return type:

ndarray
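One plausible reading of the scale switch, sketched under stated assumptions. The base penalty is the L2 norm of the joint deviation. "Moving" means either the planar command norm exceeds command_threshold or the planar body speed exceeds velocity_threshold. Otherwise the penalty is multiplied by stand_still_scale. None of these details are confirmed by the page.

```python
import numpy as np


def joint_pos_penalty_sketch(
    dof_pos, default_angles, command, linvel,
    stand_still_scale=5.0, velocity_threshold=0.5, command_threshold=0.1,
):
    # Assumed base penalty: L2 norm of the joint deviation per environment.
    penalty = np.linalg.norm(dof_pos - default_angles, axis=1)
    moving_cmd = np.linalg.norm(command[:, :2], axis=1) > command_threshold
    moving_body = np.linalg.norm(linvel[:, :2], axis=1) > velocity_threshold
    # Plain penalty while walking; scaled up when the robot should stand still.
    return np.where(moving_cmd | moving_body, penalty, stand_still_scale * penalty)


q = np.array([[0.3, 0.4], [0.3, 0.4]])
cmd = np.array([[1.0, 0.0], [0.0, 0.0]])
pen = joint_pos_penalty_sketch(q, np.zeros(2), cmd, np.zeros((2, 3)))
```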

unilab.envs.locomotion.common.rewards.upward(ctx)

Reward favouring an upright body (no Go2 upright gate).

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.track_lin_vel_xy_yaw_frame_exp(ctx)

Exponential tracking of xy linear velocity in the gravity-aligned yaw frame.

Requires ctx.linvel_yaw (base linvel rotated into yaw frame).

Parameters: ctx (RewardContext)
Return type: ndarray
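A sketch of what a linvel_yaw-style input means and how it might be scored. It assumes the base velocity is available in the world frame and rotates it by minus the heading yaw, so x points along the robot's heading while ignoring roll and pitch. world_to_yaw_frame and the 0.25 kernel width are illustrative assumptions, not this module's code.

```python
import numpy as np


def world_to_yaw_frame(vel_w: np.ndarray, yaw: np.ndarray) -> np.ndarray:
    # Rotate world-frame xy velocity by -yaw; z is unchanged (gravity-aligned frame).
    c, s = np.cos(yaw), np.sin(yaw)
    vx = c * vel_w[:, 0] + s * vel_w[:, 1]
    vy = -s * vel_w[:, 0] + c * vel_w[:, 1]
    return np.stack([vx, vy, vel_w[:, 2]], axis=1)


def track_lin_vel_xy_yaw_frame_exp_sketch(linvel_yaw, command, sigma=0.25):
    # Exponential kernel on the planar tracking error (sigma assumed).
    err = np.sum(np.square(command[:, :2] - linvel_yaw[:, :2]), axis=1)
    return np.exp(-err / sigma)


# Robot facing world +y (yaw = 90 degrees) moving at 1 m/s along world +y.
v_yaw = world_to_yaw_frame(np.array([[0.0, 1.0, 0.0]]), np.array([np.pi / 2]))
r = track_lin_vel_xy_yaw_frame_exp_sketch(v_yaw, np.array([[1.0, 0.0, 0.0]]))
```

In the yaw frame the robot is moving straight ahead, so a forward command of 1 m/s is tracked perfectly even though the world-frame x velocity is zero.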

unilab.envs.locomotion.common.rewards.track_ang_vel_z_world_exp(ctx)

Exponential tracking of yaw angular velocity (world frame).

Parameters: ctx (RewardContext)
Return type: ndarray

unilab.envs.locomotion.common.rewards.feet_air_time_positive_biped(ctx, *, threshold=0.4, command_threshold=0.1)

Biped foot air-time reward: only rewards single-stance phase.

Reads ctx.info keys current_air_time, current_contact_time (each shape (N, 2)); the environment populates them per step.

Parameters:

ctx (RewardContext)
threshold (float)
command_threshold (float)

Return type:

ndarray
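The name mirrors Isaac Lab's reward term of the same name. The following NumPy sketch follows that logic, assuming this module behaves the same way. It reads the (N, 2) current_air_time and current_contact_time arrays described above, and takes the command as a separate argument (where the command actually lives is not stated). Only single-stance steps are rewarded, using the shorter of the two feet's current-mode times, capped at threshold.

```python
import numpy as np


def feet_air_time_positive_biped_sketch(info, command, threshold=0.4, command_threshold=0.1):
    air = info["current_air_time"]          # (N, 2) time each foot has been airborne
    contact = info["current_contact_time"]  # (N, 2) time each foot has been in contact
    in_contact = contact > 0.0
    # Time spent in each foot's current mode (stance or swing).
    in_mode_time = np.where(in_contact, contact, air)
    single_stance = np.sum(in_contact, axis=1) == 1
    # Reward the shorter mode time, only during single stance.
    reward = np.min(np.where(single_stance[:, None], in_mode_time, 0.0), axis=1)
    reward = np.minimum(reward, threshold)
    # No reward when the commanded planar velocity is (near) zero.
    return reward * (np.linalg.norm(command[:, :2], axis=1) > command_threshold)


info = {
    "current_air_time": np.array([[0.3, 0.0], [0.0, 0.0], [0.6, 0.0]]),
    "current_contact_time": np.array([[0.0, 0.2], [0.5, 0.5], [0.0, 0.9]]),
}
cmd = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
r = feet_air_time_positive_biped_sketch(info, cmd)
```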

unilab.envs.locomotion.common.rewards.joint_deviation_l1(ctx, joint_indices=None)

L1 penalty for joints deviating from their default positions.

Parameters:

ctx (RewardContext)
joint_indices (ndarray | None)

Return type:

ndarray
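A sketch of the optional joint_indices filter, assuming None means "all joints" and an integer index array selects joint columns of dof_pos. Both are assumptions about this function's behaviour.

```python
import numpy as np


def joint_deviation_l1_sketch(dof_pos, default_angles, joint_indices=None):
    dev = np.abs(dof_pos - default_angles)
    # Assumed: None penalizes every joint; otherwise only the selected columns.
    if joint_indices is not None:
        dev = dev[:, joint_indices]
    return np.sum(dev, axis=1)


q = np.array([[0.1, -0.2, 0.3]])
q0 = np.zeros(3)
full = joint_deviation_l1_sketch(q, q0)
subset = joint_deviation_l1_sketch(q, q0, joint_indices=np.array([0, 2]))
```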

unilab.envs.locomotion.common.rewards.run_reward_dispatch(*, scales, fns, ctx, info, enable_log, ctrl_dt, log_every_n_steps=4, only_positive=False)

Standard scales × fns(ctx) reduction shared by all locomotion envs.

Writes per-reward means into info["log"] when enable_log is true and the step counter steps[0] falls on the log_every_n_steps cadence. Returns the summed reward multiplied by ctrl_dt, clamped to be non-negative when only_positive is true.

Parameters:

scales (Mapping[str, float])
fns (Mapping[str, Callable[[RewardContext], ndarray]])
ctx (RewardContext)
info (dict[str, Any])
enable_log (bool)
ctrl_dt (float)
log_every_n_steps (int)
only_positive (bool)

Return type:

ndarray
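A sketch of the reduction described above. Several details are hypothetical: the step counter is read from info["steps"], log entries are keyed "reward/<name>", the logged mean is of the scaled term, and iteration follows scales. The clamp is applied before multiplying by ctrl_dt, which is equivalent for positive ctrl_dt.

```python
from types import SimpleNamespace
from typing import Any, Callable, Mapping

import numpy as np


def run_reward_dispatch_sketch(
    *,
    scales: Mapping[str, float],
    fns: Mapping[str, Callable[[Any], np.ndarray]],
    ctx: Any,
    info: dict,
    enable_log: bool,
    ctrl_dt: float,
    log_every_n_steps: int = 4,
    only_positive: bool = False,
) -> np.ndarray:
    reward = np.zeros(ctx.num_envs)
    # Hypothetical location of the step counter used for the logging cadence.
    log_now = enable_log and info["steps"][0] % log_every_n_steps == 0
    for name, scale in scales.items():
        term = scale * fns[name](ctx)
        reward = reward + term
        if log_now:
            # Hypothetical log key naming.
            info.setdefault("log", {})[f"reward/{name}"] = float(np.mean(term))
    if only_positive:
        reward = np.clip(reward, 0.0, None)
    return reward * ctrl_dt


ctx = SimpleNamespace(num_envs=2)
fns = {"alive": lambda c: np.ones(c.num_envs), "drag": lambda c: np.array([3.0, 0.0])}
info = {"steps": np.array([8])}
out = run_reward_dispatch_sketch(
    scales={"alive": 1.0, "drag": -1.0}, fns=fns, ctx=ctx, info=info,
    enable_log=True, ctrl_dt=0.02, only_positive=True,
)
```

Here the first environment's raw sum is 1.0 - 3.0 = -2.0, which the positive clamp zeroes before scaling by ctrl_dt.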