Language

Domain Randomization¶

This page only describes the current status of tasks in the repo that are already registered and already wired to a DR provider. All conclusions come from the code; nothing is inferred from design intent.

The current unified entry point lives in NpEnv._init_domain_randomization() and DomainRandomizationManager:

init path: the task provider produces an InitRandomizationPlan; the manager calls the backend’s apply_init_randomization(...) during env initialization
reset path: the task provider produces a ResetPlan; the manager validates capability and then calls the backend’s set_state(..., randomization=...)
interval path: the task provider produces an IntervalRandomizationPlan; the manager calls the backend’s apply_interval_randomization(...) as needed before step

These three paths correspond to three lifecycle classes:

init-lifecycle DR: items that change the model identity or model geometry; can only take effect during env/backend initialization and materialization, e.g. Sharpa-hand object geom_size scaling.
reset-lifecycle DR: items that do not change model identity, only change parameters or reset state within the same model, e.g. base_mass_delta, base_com_offset, gravity, kp, kd.
interval-lifecycle DR: external perturbations between steps, e.g. push.

Status Conclusions¶

All tasks currently wired to a DR provider use the unified DR entry point; no task bypasses DomainRandomizationManager to run a separate DR flow inside reset().
They are all roughly structured: task files define a domain_rand config dataclass, a DomainRandomizationProvider, and a ResetPlan; G1WalkFlat reuses G1Walk’s provider.
What is “unified” today is mainly the entry point and execution flow, not every randomization item itself. The shared helper build_common_reset_randomization() currently generates base_mass_delta, base_com_offset, gravity, kp, kd; the shared interval helper currently only generates push.
ResetRandomizationPayload can already express gravity, body_iquat, body_inertia, kp, kd, and MuJoCoBackend has declared support. Whether these are actually used still depends on whether the task provider samples and dispatches them.
MotrixBackend currently supports base_mass_delta, base_com_offset, kp, kd, and interval push; and it requires all model actuators to be position actuators during initialization.
geom_size is not a reset-lifecycle field; Sharpa-hand object geom scale is handled by init-lifecycle model materialization.

Uniformity Assessment Table¶

Task	Uses unified DR entry?	Structured form?	reset form	interval form	Code
`Go1JoystickFlat`	Yes	Yes: `Domain_Rand + Provider + ResetPlan`	task state sampling + common payload	push	`go1/joystick.py`
`Go2JoystickFlat`	Yes	Yes: `Domain_Rand + Provider + ResetPlan`	task state sampling + common payload	push	`go2/joystick.py`
`G1WalkFlat`	Yes	Yes: `Domain_Rand + Provider + ResetPlan`	task state sampling + common payload	push	`g1/joystick.py`
`G1WalkRough`	Yes	Yes: reuses `G1WalkDomainRandomizationProvider`	task state sampling + common payload	push	`g1/joystick.py`
`G1MotionTracking`	Yes	Yes: `Domain_Rand + Provider + ResetPlan`	extensive task-specific reset sampling + common payload	push	`motion_tracking/g1/tracking.py`
`AllegroInhandRotation`	Yes	Yes: `DomainRandConfig + Provider + ResetPlan`	task-specific reset sampling + common payload	none	`allegro_inhand/rotation.py`
`SharpaInhandRotation`	Yes	Yes: `InitRandomizationPlan + ResetPlan + IntervalRandomizationPlan`	grasp cache sampling + common payload	object `body_force`	`sharpa_inhand/rotation.py`
`SharpaInhandRotationGrasp`	Yes	Yes: reuses the Sharpa rotation provider and overrides reset sampling	grasp collection reset + common payload	none	`sharpa_inhand/grasp_gen.py`

Per-task Domain Randomization List¶

Task	Currently implemented reset domain randomization	Currently implemented interval domain randomization	Default state
`Go1JoystickFlat`	base xy; base yaw; base qvel; command sampling; `current_actions/last_actions` zeroed; optional `base_mass_delta`; optional `base_com_offset`; optional `gravity`	`push_robots`	`base_mass_delta`, `base_com_offset`, and push enabled by default; `gravity` disabled by default
`Go2JoystickFlat`	base xy; base yaw; base qvel; command sampling; `current_actions/last_actions` zeroed; kp/kd randomization (enabled by default); optional `base_mass_delta`; optional `base_com_offset`; optional `gravity`	`push_robots`	kp/kd enabled by default; common payload and push disabled by default
`G1WalkFlat`	base xy; base yaw; base qvel sampled by `reset_base_qvel_limit`; command sampling; `gait_phase` sampling; `current_actions/last_actions` zeroed; kp/kd randomization (enabled by default); optional `base_mass_delta`; optional `base_com_offset`; optional `gravity`	`push_robots`	kp/kd enabled by default; common payload and push disabled by default
`G1WalkRough`	Same as `G1WalkFlat`, directly reuses the same provider	`push_robots`	kp/kd enabled by default; common payload and push disabled by default
`G1MotionTracking`	motion frame sampling; root pose perturbation `x/y/z/roll/pitch/yaw`; root velocity perturbation `x/y/z/roll/pitch/yaw`; joint position noise; under MuJoCo clipped by joint range; `current_actions/last_actions` zeroed; optional `base_mass_delta`; optional `base_com_offset`; optional `gravity`	`push_robots`	`pose_randomization`, `velocity_randomization`, `joint_position_range` have non-zero perturbations by default; common payload and push disabled by default
`AllegroInhandRotation`	If a grasp cache exists, sample a grasp randomly; otherwise apply `joint_noise` to hand joints and `ball_z_offset` to the ball; always apply `ball_vel_noise` to ball linear velocity; optional common reset randomization payload (incl. `gravity`)	none	If the grasp cache path is available it is sampled by default; `joint_noise`, `ball_vel_noise`, `ball_z_offset` default to 0; common payload disabled by default
`SharpaInhandRotation`	grasp cache bucketed sampling by `scale_ids`; object pose / quat reset; optional common reset randomization payload (incl. `gravity`)	object `body_force` direct force disturbance	`domain_rand.scale_list` defaults come from the owner YAML; under MuJoCo, object geom scale is materialized during init; common payload disabled by default; object force enabled by default via the Sharpa owner YAML
`SharpaInhandRotationGrasp`	hand pose reset; object pose / quat reset; collects successful grasps and stores them bucketed by `scale_ids`; optional `base_mass_delta`; optional `base_com_offset`; optional `gravity`	none	Used by default to generate the Sharpa grasp cache; cache filename includes the single scale value; common payload disabled by default

Current Unified DR Capabilities and Boundaries¶

1. Unified Entry Point Is Complete¶

The unified entry point is guaranteed by NpEnv and DomainRandomizationManager:

Tasks only need to register a provider
The manager uniformly performs capability validation
The backend is uniformly responsible for actually applying the randomization payload

So from an execution-path perspective, the tasks are already unified.

2. The Shared Helpers Are Still Narrow¶

dr_utils.py currently has only two classes of shared helpers:

reset common payload: base_mass_delta, base_com_offset, gravity, kp, kd
interval common payload: push

This means:

Although locomotion tasks all go through the unified entry point, their base xy, yaw, qvel, command, and gait phase are still sampled directly inside each provider
G1MotionTracking’s pose / velocity / joint noise is also task-specific logic
Allegro’s grasp / object initial state sampling is entirely task-specific logic
Sharpa’s geom_size scale is init-lifecycle model materialization and is not part of the reset common payload

So today’s “uniformity” is more about the contract and the calling convention than “all tasks share the same set of randomization-item schemas”.

3. Backend Capabilities Already Exceed What Tasks Currently Use¶

ResetRandomizationPayload now contains:

base_mass_delta
base_com_offset
gravity
body_iquat
body_inertia
kp
kd

Backend capability today:

MuJoCoBackend: supports the 7 reset terms above, plus interval push and interval body force
MotrixBackend: supports base_mass_delta, base_com_offset, kp, kd, plus interval push; requires actuators to all be position actuators during initialization

Notes:

The current IntervalRandomizationPlan supports push_perturbation_limit, body_linear_velocity_delta, and body_force; among these, body_force expresses hot-path direct external-force perturbations without exposing the backend-private xfrc_applied details.
The current MuJoCo backend’s interval push and interval body force are both dispatched through xfrc_applied; the Sharpa-hand object disturbance has been switched to direct force disturbance.
The Motrix backend currently still does not support direct body-force disturbance, so such owner configs must continue to be explicitly disabled.

But on the task side, the current reality is: not every provider constructs these fields. The backend contract is the capability boundary; whether the task config and provider dispatch a payload is what determines whether a given task actually enables the corresponding DR item.

Reset gravity Usage¶

gravity is a reset-lifecycle DR: on each reset, a full MuJoCo gravity vector (gx, gy, gz) is sampled per env subset and dispatched to the backend via ResetRandomizationPayload.gravity. This vector expresses both direction and magnitude:

Direction: determined by the direction of (gx, gy, gz).
Magnitude: determined by the vector norm sqrt(gx^2 + gy^2 + gz^2).
Lifecycle: only sampled and written at reset; the env retains that gravity until the next reset re-samples it.
Backend: currently in UniLab, only the MuJoCo backend declares support for this reset term; the Motrix backend does not. Some tasks filter it by capability and skip it; others raise an error in the validate stage.

The config entry is under each task’s env.domain_rand:

env:
  domain_rand:
    randomize_gravity: true
    gravity_range:
      - [-0.2, -0.2, -10.5]
      - [0.2, 0.2, -8.5]

Field semantics:

randomize_gravity: whether to enable gravity reset DR; defaults to false.
gravity_range: a (2, 3)-shaped per-dimension sampling range; the first and second rows give the upper and lower bounds of each component.
On each reset, each dimension is uniformly sampled within [min(row0, row1), max(row0, row1)]. The direction is not automatically normalized, and the gravity norm is not fixed.

If you only want to randomize the magnitude while keeping the vertical-down direction, only open up the z component:

uv run train --algo ppo --task g1_walk_flat --sim mujoco \
  env.domain_rand.randomize_gravity=true \
  'env.domain_rand.gravity_range=[[0.0,0.0,-10.5],[0.0,0.0,-8.5]]'

If you want to randomize both direction and magnitude, open up x/y/z:

uv run train --algo ppo --task g1_walk_flat --sim mujoco \
  env.domain_rand.randomize_gravity=true \
  'env.domain_rand.gravity_range=[[-0.3,-0.3,-10.5],[0.3,0.3,-8.5]]'

Notes:

gravity_range must be convertible into a (2, 3) array; otherwise reset will raise an error when constructing the payload.
This term does not call mj_setConst; MuJoCo step / forward reads mjModel.opt.gravity directly.
Do not enable this term under the Motrix backend; the current Motrix capability does not include gravity.
If your current environment still has a mujoco-uni package installed that does not include the gravity field, MuJoCo reset will raise unsupported field; you need to use a mujoco-uni build/release that includes the field.
During training it is recommended to start from a small tilt range; otherwise sampling a too-large horizontal gravity early on may degrade the task into being unlearnable.

Interval push Usage¶

Tasks supporting interval push configure it under env.domain_rand:

env:
  domain_rand:
    push_robots: true
    push_interval: 750
    max_force: [1.0, 1.0, 0.5]
    push_body_name: null

push_robots: whether to enable push.
push_interval: trigger every N env steps.
max_force: a length-3 external-force upper limit; each dimension is sampled within [-max_force, max_force].
push_body_name: the target body / link to apply the force to. Defaults to null, meaning the backend’s base_name is used.

uv run train --algo ppo --task g1_walk_flat --sim mujoco \
  env.domain_rand.push_robots=true \
  env.domain_rand.push_interval=500 \
  'env.domain_rand.max_force=[20.0,20.0,5.0]' \
  env.domain_rand.push_body_name=torso_link

Notes:

MuJoCo resolves by body name, Motrix resolves by link name; a missing name raises an error during env/backend initialization.
push_body_name is an init config; changing it after env creation does not change the already-resolved target.
The hot path only samples and applies the external force; it does not parse XML / asset and does not probe backend-private capability.
MuJoCo push is implemented via xfrc_applied external force and does not directly overwrite base velocity.

`geom_size` Lifecycle Boundary¶

geom_size is explicitly not part of ResetRandomizationPayload, and must not be modified on the hot path via BatchEnvPool.reset(..., randomization=...).

The reason is that geom_size changes model geometry and model identity; the correct lifecycle is:

The task provider generates the model variants and env-to-model assignment in build_init_randomization_plan(...).
The MuJoCo backend modifies geom size on the cold path using MjSpec and compiles scale-specific MjModels.
The backend constructs BatchEnvPool with a model sequence of length num_envs.

The reset stage only performs state and parameter perturbations within the same model identity; it does not handle geom_size.

This boundary exists to honor the cold-path asset/model-metadata access principle: step(), reset(), and hot-path DR do not parse XML, do not read assets, and do not branch at runtime based on asset metadata.

Sharpa-hand Object Geom Scale Usage¶

Sharpa-hand is the current example task for geom_size init-lifecycle DR in the repo. Related task configs:

conf/ppo/task/sharpa_inhand/mujoco.yaml
conf/ppo/task/sharpa_inhand_grasp/mujoco.yaml

1. Config Entry¶

Sharpa’s scale configuration lives in the env owner YAML at env.domain_rand.scale_list:

env:
  object_body_name: object
  object_geom_name: object
  domain_rand:
    scale_list: [0.5, 0.6, 0.7, 0.8]

Field semantics:

object_body_name: the object body name; used to locate the object body during reset / observation, not the target field for scaling.
object_geom_name: the MuJoCo geom name to scale; defaults to object.
domain_rand.scale_list: explicit scale list; each value must be greater than 0.
The order of domain_rand.scale_list is the scale_id order.
The length of domain_rand.scale_list is the number of model variants.

Each env is statically assigned a scale_id. The current assignment rule is contiguous bucket assignment; when algo.num_envs is not divisible by num_scales, the first few scale buckets get one extra env:

uv run train --algo ppo --task sharpa_inhand --sim mujoco 'env.domain_rand.scale_list=[0.5,0.6,0.7,0.8]' algo.num_envs=4096

If algo.num_envs=4096 and num_scales=4, then every 1024 envs use the same scale bucket.

2. MuJoCo Materialization Behavior¶

How the MuJoCo backend applies it:

The env/provider builds ModelVariantSpec based on scale_list during init.
The backend uses MjSpec to read the model and modify the size of the geom corresponding to object_geom_name.
Each scale compiles a scale-specific MjModel.
The first time a physics pool is needed, the env-to-model assignment is expanded into a model sequence of length num_envs, and then BatchEnvPool is constructed.

Therefore, domain_rand.scale_list only takes effect during env/backend initialization. Changing env.domain_rand.scale_list after env creation does not change the already-materialized model pool.

This flow has three important boundaries:

BatchEnvPool is lazily constructed; the normal path does not first construct a pool for the default model and then rebuild it for scale_list.
Compilation of multiple model variants is done in chunks using process-based parallelism; do not compile in a Python thread, and do not serially compile num_envs models in an upper-level for loop.
Workers compile variants with MjSpec and save .mjb; the parent process only loads MjModel.from_binary_path(...) by .mjb path. Do not transmit modified model objects or model bytes back via IPC.

3. Grasp Cache and Scale Buckets¶

The Sharpa rotation task samples from multiple single-scale grasp caches by scale_ids:

The cache filename defaults are jointly determined by grasp_cache_path and a single scale value.
scale_list: [0.5, 0.6, 0.7, 0.8] by default corresponds to caches/sharpa_grasp_linspace_0.5.npy, caches/sharpa_grasp_linspace_0.6.npy, caches/sharpa_grasp_linspace_0.7.npy, caches/sharpa_grasp_linspace_0.8.npy.
At rotation startup all cache files for scale_list are checked; if any is missing, it errors.
Each scale bucket only samples from the cache file of its own scale, avoiding mixing grasp initial states across different object scales.

Default-scale caches are hosted on Hugging Face (unilabsim/unilab-caches) and are downloaded automatically into src/unilab/assets/caches/ on first rotation training, so no manual collection is needed for the standard scale_list.

To collect caches for custom scales not on HF — or to regenerate them locally — run the grasp collection task once per scale. Generated files land under src/unilab/assets/caches/, the same location HF downloads to, so subsequent rotation training auto-resolves them. Regeneration is slow.

The helper script collects each scale sequentially:

./scripts/sharpa_collect_grasps.sh 0.5 0.6 0.7 0.8

_{Equivalent per-scale invocations: uv run train --algo ppo --task sharpa_inhand_grasp --sim mujoco 'env.domain_rand.scale_list=[0.5]' algo.num_envs=4096 (repeat for [0.6], [0.7], [0.8], …).}

Then train rotation with the same scale_list:

uv run train --algo ppo --task sharpa_inhand --sim mujoco 'env.domain_rand.scale_list=[0.5,0.6,0.7,0.8]' algo.num_envs=4096

4. Boundaries and Caveats¶

geom_size is not a reset DR field and must not be written into ResetPlan.randomization.
BatchEnvPool.reset(..., randomization=...) currently does not support geom_size.
geom_size scale is only materialized under the MuJoCo backend; the Motrix backend currently does not produce a multi-model pool from scale_list.
The length of scale_list is the number of model variants, not the number of resamples per reset.
Each env’s scale_id is statically assigned during init and does not change at reset.
When scaling out, scale out the number of model variants; do not compile one model per env according to num_envs. Multiple envs share the same MjModel corresponding to the same scale bucket.
The hot path must not read XML, parse assets, or use getattr / hasattr to probe backend-private capability to decide scaling behavior.
When extending to other shape DR, prefer to reuse the init-lifecycle contract; do not stuff shape fields into the reset payload.

Domain Randomization¶

Status Conclusions¶

Uniformity Assessment Table¶

Per-task Domain Randomization List¶

Current Unified DR Capabilities and Boundaries¶

1. Unified Entry Point Is Complete¶

2. The Shared Helpers Are Still Narrow¶

3. Backend Capabilities Already Exceed What Tasks Currently Use¶

Reset gravity Usage¶

Interval push Usage¶

geom_size Lifecycle Boundary¶

Sharpa-hand Object Geom Scale Usage¶

1. Config Entry¶

2. MuJoCo Materialization Behavior¶

3. Grasp Cache and Scale Buckets¶

4. Boundaries and Caveats¶

Related Tasks¶

`geom_size` Lifecycle Boundary¶