Sim-to-Real Overview¶
This page is the map of UniLab’s sim-to-real workflow. Every subsequent page in this section drills into one stage.
What “sim-to-real” means in UniLab¶
A deployable UniLab policy is the exported policy plus the exact observation
and action contracts used by the selected task owner. The G1 WBT helper path
materializes this as policy.onnx, deploy_config.yaml, and a motion binary;
other robots need an equivalent hardware-side runtime that:
Reads sensors → assembles the same observation vector the policy saw in simulation.
Runs policy.onnx through a runtime that supports the exported graph.
Maps the action vector to the same actuator interface used by the env’s SimBackend.
If any of those three things drifts between sim and deployment, debug the contract before changing reward or hardware tuning.
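The three runtime responsibilities listed above can be sketched as one control step. This is a minimal illustration, not UniLab's deploy code: `ObsTerm`, `assemble_observation`, `action_to_targets`, and `control_step` are hypothetical names, the observation term names in the example are made up, and the position-target mapping (`default_pos + action_scale * action`) is an assumed convention that may differ from the interface your env's SimBackend uses. The policy is passed in as a plain callable so the sketch stays independent of any particular ONNX runtime.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class ObsTerm:
    """One named slice of the observation vector (hypothetical schema)."""
    name: str
    dim: int
    scale: float = 1.0


def assemble_observation(terms: Sequence[ObsTerm],
                         sensors: Dict[str, Sequence[float]]) -> List[float]:
    """Concatenate sensor readings in the order the contract lists them."""
    obs: List[float] = []
    for term in terms:
        values = sensors[term.name]  # a KeyError here means a missing signal
        if len(values) != term.dim:
            raise ValueError(
                f"{term.name}: expected {term.dim} values, got {len(values)}")
        obs.extend(term.scale * v for v in values)
    return obs


def action_to_targets(action: Sequence[float],
                      default_pos: Sequence[float],
                      action_scale: float) -> List[float]:
    """Map a policy action to joint position targets (assumed convention)."""
    if len(action) != len(default_pos):
        raise ValueError(
            f"expected {len(default_pos)} actions, got {len(action)}")
    return [q0 + action_scale * a for q0, a in zip(default_pos, action)]


def control_step(terms: Sequence[ObsTerm],
                 sensors: Dict[str, Sequence[float]],
                 policy: Callable[[List[float]], Sequence[float]],
                 default_pos: Sequence[float],
                 action_scale: float) -> List[float]:
    """Read sensors, run the policy, and map the action to actuator targets."""
    obs = assemble_observation(terms, sensors)
    action = policy(obs)
    return action_to_targets(action, default_pos, action_scale)


# Example with a stub policy standing in for the exported graph.
example_terms = [ObsTerm("base_ang_vel", 3, 0.25), ObsTerm("joint_pos", 2)]
example_sensors = {"base_ang_vel": [0.0, 0.4, -0.8], "joint_pos": [0.1, -0.1]}
example_targets = control_step(example_terms, example_sensors,
                               lambda obs: [0.5, -0.5],
                               default_pos=[0.0, 1.0], action_scale=0.5)
# example_targets == [0.25, 0.75]
```

Failing loudly on a wrong term length or action size keeps a contract mismatch from being mistaken for a tuning problem.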
End-to-end pipeline¶
flowchart LR
A[Train in UniLab] --> B[Curriculum + DR]
B --> C[Validate in alt backend]
C --> D[Export ONNX]
D --> E[Latency / lag injection]
E --> F[Safety layer]
F --> G[Hardware bringup]
G --> H[Closed-loop run]
H -. iterate .-> B
| Stage | UniLab artefact | Page |
|---|---|---|
| Train | Task owner YAML + training script | |
| Curriculum + DR | | |
| Cross-backend sanity | | |
| ONNX export | Training playback scripts + deploy helpers | |
| Latency / obs lag | Task config flags and deploy-side logs | |
| Safety layer | Hardware-side clamp / fallback | |
| Robot bringup | Robot-specific guides | G1 Whole-Body Motion Tracking on Hardware, Go2 / Go2W Locomotion Deployment, Allegro / Sharpa In-Hand Manipulation Deployment |
What you should have before starting¶
Pre-flight checklist
A converged training run with stable reward AND a stable success criterion (motion tracking error, drop count, etc.).
The same policy passes evaluation in both MuJoCo and Motrix when both support the task. If it passes in one backend but fails in the other, suspect a backend-dependent reward leak; see Reward Parity Across Backends.
Domain randomization ranges large enough that reward varies smoothly when you sweep DR strength — a brittle policy in sim is a brittle policy on hardware.
No backend feature leakage in the env — verify via the developer guide’s Backend Capability Contract.
An observation spec you can implement on hardware. If your policy reads body_lin_vel, you need a deploy-side estimator or a task owner variant that removes that signal from the actor input.
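The last checklist item can be audited before any hardware work by listing the actor's observation terms and marking where each one comes from on the robot. The sketch below is only an illustration: `audit_observation_spec` and `unresolved_terms` are hypothetical helpers, and every term name other than `body_lin_vel` is an assumed example rather than a UniLab identifier.

```python
from typing import Dict, FrozenSet, Iterable, List


def audit_observation_spec(actor_terms: Iterable[str],
                           measured: FrozenSet[str],
                           estimated: FrozenSet[str] = frozenset()) -> Dict[str, str]:
    """Label each actor input as measured, estimated, or missing on hardware."""
    report: Dict[str, str] = {}
    for term in actor_terms:
        if term in measured:
            report[term] = "measured"    # read directly from a sensor
        elif term in estimated:
            report[term] = "estimated"   # produced by a deploy-side estimator
        else:
            report[term] = "missing"     # no hardware source yet
    return report


def unresolved_terms(report: Dict[str, str]) -> List[str]:
    """Terms that need an estimator or removal from the actor input."""
    return [term for term, status in report.items() if status == "missing"]


# Assumed example: the IMU and encoders are available, base velocity is not.
actor_terms = ["base_ang_vel", "projected_gravity", "body_lin_vel", "joint_pos"]
report = audit_observation_spec(
    actor_terms,
    measured=frozenset({"base_ang_vel", "projected_gravity", "joint_pos"}),
)
blocking = unresolved_terms(report)
# blocking == ["body_lin_vel"]
```

An empty `blocking` list means every actor input has a hardware source. Otherwise each listed term needs either an estimator or a task owner variant that drops it.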
The most common failure modes¶
Observation drift. Sensor pre-processing differs between sim and deploy runtime (units, frame, filter cutoffs). Log the first deploy-side observation window and compare it with a sim rollout built from the same owner YAML.
Action latency. Some task configs expose one-step delayed action execution through control_config.simulate_action_latency. Measure the deploy loop and make the training owner match that contract before a hardware run. See Latency Budget.
Friction / damping mismatch. Especially for in-hand manipulation. Sweep friction in DR; cross-check via Aligning Contact and Friction Between Backends.
Reset transients. Sim resets to a stable pose; deployment starts from a controller state. The safety layer must reject malformed observations and unsafe actions before they reach the motor driver.
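The gate described in the last failure mode can be sketched as a pure function that sits between the policy and the motor driver. This is an assumed, minimal design rather than UniLab's safety layer: `observation_is_valid` and `gate_action` are hypothetical names, the magnitude limit and per-joint bounds are placeholders you would take from your own robot, and a real safety layer would typically add rate limits and a watchdog as well.

```python
import math
from typing import List, Sequence, Tuple


def observation_is_valid(obs: Sequence[float],
                         expected_dim: int,
                         abs_limit: float) -> bool:
    """Reject observations with the wrong size, NaN/inf, or implausible magnitude."""
    if len(obs) != expected_dim:
        return False
    return all(math.isfinite(v) and abs(v) <= abs_limit for v in obs)


def gate_action(obs: Sequence[float],
                action: Sequence[float],
                expected_obs_dim: int,
                obs_abs_limit: float,
                lower: Sequence[float],
                upper: Sequence[float],
                fallback: Sequence[float]) -> Tuple[List[float], bool]:
    """Return (command, used_fallback) for one control step."""
    # A malformed observation means the action cannot be trusted at all.
    if not observation_is_valid(obs, expected_obs_dim, obs_abs_limit):
        return list(fallback), True
    # A wrong-sized or non-finite action is also replaced by the fallback.
    if len(action) != len(lower) or not all(math.isfinite(a) for a in action):
        return list(fallback), True
    # Otherwise clamp each component to its per-joint range.
    clamped = [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]
    return clamped, False
```

Returning the `used_fallback` flag lets the deploy loop log every rejection, which makes reset transients visible in the deploy-side logs.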
Per-robot quick links¶
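G1 Whole-Body Motion Tracking on Hardware: humanoid motion tracking deployment, joint clamp ranges, IMU alignment.
Go2 / Go2W Locomotion Deployment: joystick + rough terrain policies on Go2 and Go2W.
Allegro / Sharpa In-Hand Manipulation Deployment: dexterous cube reorientation, tactile-free deployment, grasp generator.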