Domain Randomization for Real-World Transfer
This page is the deployment checklist for domain randomization. For the contract layer (what a DR provider must implement), see Domain Randomization Contract.
What to randomize, in priority order
| Category | Examples | Why it matters |
|---|---|---|
| Actuator dynamics | PD gains, action scale, one-step action delay when the task owner enables it | First-order driver of policy oscillation on hardware. |
| Mass / inertia | Trunk mass, link COM offsets, payload | Affects balance and tracking margins. |
| Friction | Foot ↔ ground μ, hand ↔ object μ | In-hand cube tasks fail without this. |
| Observation noise | IMU noise, joint encoder bias, deploy-side observation history | Keeps actor inputs close to deploy-side sensor behavior. |
| External forces | Pushes, gusts, tug on payload | Robustness to unmodeled disturbances. |
| Reset state | Initial pose, initial velocity | Reduces brittleness at episode boundary. |
Heuristic
If a parameter materially affects the closed-loop response but you have no deploy-side measurement for it, do not document a claim about it. Encode a conservative range in the task owner only after recording why that range is plausible.
How UniLab structures DR
Tasks that use DR attach a provider through the env initialization path:
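from unilab.envs.locomotion.common.dr_provider import LocomotionDRProvider

# NpEnv is the env base class; its import is not shown on this page.
class MyTaskEnv(NpEnv):
    def __init__(self, cfg):
        super().__init__(cfg)
        # Attach the provider built from the task's domain_rand config.
        self._init_domain_randomization(LocomotionDRProvider(cfg.domain_rand))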
The manager lives in src/unilab/dr/manager.py; providers live near their env
owners and conform to the contract in
Domain Randomization Contract.
Recipe: starting ranges
Use the selected owner YAML as the source of truth. For example,
conf/ppo/task/go2_joystick_rough/mujoco.yaml enables base-mass, COM, kp/kd,
and push randomization; conf/ppo/task/sharpa_inhand/mujoco.yaml configures
PD-gain, friction, COM, mass, joint-noise, and contact-noise fields.
# conf/ppo/task/go2_joystick_rough/mujoco.yaml
env:
  domain_rand:
    randomize_base_mass: true
    added_mass_range: [-1.0, 3.0]
    random_com: true
    randomize_kp: true
    kp_multiplier_range: [0.5, 2.0]
    randomize_kd: true
    kd_multiplier_range: [0.5, 2.0]
    push_robots: true
    push_interval: 625
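To make the meaning of these fields concrete, the sketch below shows one way an enable flag plus a range could turn into per-environment samples. It is illustrative only: `sample_dr`, the uniform distribution, and the fallback nominal values are assumptions for this example, not UniLab's provider implementation.

```python
import random

# Hypothetical in-memory mirror of some domain_rand fields from the owner YAML.
DOMAIN_RAND = {
    "randomize_base_mass": True,
    "added_mass_range": [-1.0, 3.0],
    "randomize_kp": True,
    "kp_multiplier_range": [0.5, 2.0],
    "randomize_kd": True,
    "kd_multiplier_range": [0.5, 2.0],
}


def sample_dr(cfg, num_envs, seed=0):
    """Draw one value per environment for each field (uniform sampling is an assumption)."""
    rng = random.Random(seed)

    def draw(flag, range_key, nominal):
        # A disabled field falls back to its nominal value for every environment.
        if not cfg.get(flag, False):
            return [nominal] * num_envs
        low, high = cfg[range_key]
        return [rng.uniform(low, high) for _ in range(num_envs)]

    return {
        "added_mass": draw("randomize_base_mass", "added_mass_range", 0.0),
        "kp_multiplier": draw("randomize_kp", "kp_multiplier_range", 1.0),
        "kd_multiplier": draw("randomize_kd", "kd_multiplier_range", 1.0),
    }


samples = sample_dr(DOMAIN_RAND, num_envs=8)
```

Under uniform sampling, a multiplier range of [0.5, 2.0] has a mean of 1.25 rather than 1.0, so it leans toward stiffer gains. Check the provider to see which distribution it actually uses before reading anything into the range endpoints.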
Curriculum: ramp DR with skill
DR that’s too aggressive at step 0 stalls learning. UniLab curriculum helpers are task-owned; keep their fields in the selected owner YAML and do not add Python-side interpretation in training scripts.
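As a rough illustration of what ramping means numerically, the sketch below widens a range from its nominal value to the full configured range as a progress signal goes from 0 to 1. The function name `scaled_range`, the linear schedule, and the `progress` signal are all hypothetical; this shows the arithmetic a task-owned helper might perform and is not code to add to a training script.

```python
def scaled_range(full_range, nominal, progress):
    """Interpolate from [nominal, nominal] at progress 0 to full_range at progress 1."""
    p = min(max(progress, 0.0), 1.0)  # clamp so out-of-range progress cannot over-widen
    low, high = full_range
    return [nominal + p * (low - nominal), nominal + p * (high - nominal)]


# Example with the kp multiplier range from the go2_joystick_rough YAML.
for progress in (0.0, 0.5, 1.0):
    print(progress, scaled_range([0.5, 2.0], nominal=1.0, progress=progress))
# 0.0 [1.0, 1.0]
# 0.5 [0.75, 1.5]
# 1.0 [0.5, 2.0]
```

Driving `progress` from a skill metric (for example, tracking reward) rather than from the step count ties the ramp to what the policy can already do, which is the sense of "ramp DR with skill" above.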
Validating DR coverage
After training, replay the checkpoint against the same backend owner YAML while you sweep DR ranges in config:
uv run eval --algo ppo --task go2_joystick_flat --sim motrix --load-run -1
Log reward components and task success metrics for each sweep point. A sharp drop or a reward-component discontinuity is evidence that the DR range changed the task contract rather than only widening deployment coverage.
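One way to make "sharp drop" operational is to compare adjacent sweep points and flag any step where the logged reward falls by more than a chosen fraction. The sketch below assumes you have already collected `(sweep_value, mean_reward)` pairs yourself. `find_sharp_drops`, the 30% threshold, and the numbers are placeholders for illustration, not measured results or a UniLab utility.

```python
def find_sharp_drops(sweep, max_relative_drop=0.3):
    """Return (previous_value, value) pairs where reward falls by more than the threshold."""
    flagged = []
    for (prev_x, prev_r), (x, r) in zip(sweep, sweep[1:]):
        # Relative drop is only meaningful when the previous reward is positive.
        if prev_r > 0 and (prev_r - r) / prev_r > max_relative_drop:
            flagged.append((prev_x, x))
    return flagged


# Placeholder sweep over an upper kp multiplier bound; values are made up.
sweep = [(1.0, 10.0), (1.5, 9.5), (2.0, 9.0), (2.5, 4.0)]
print(find_sharp_drops(sweep))
# [(2.0, 2.5)]
```

Running the same check per reward component, not only on the total, helps separate a gradual loss of margin from a single term collapsing.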