
Sim-to-Real Overview

This page is the map of UniLab’s sim-to-real workflow. Every subsequent page in this section drills into one stage.

What “sim-to-real” means in UniLab

A deployable UniLab policy is the exported policy plus the exact observation and action contracts used by the selected task owner. The G1 WBT helper path materializes this as policy.onnx, deploy_config.yaml, and a motion binary; other robots need an equivalent hardware-side runtime that:

Reads sensors → assembles the same observation vector the policy saw in simulation.
Runs policy.onnx through a runtime that supports the exported graph.
Maps the action vector to the same actuator interface used by the env’s SimBackend.
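Those three requirements fit into a single control step. The following is a minimal, stdlib-only sketch and is not UniLab's deploy runtime. The observation layout, the term names, and the `default_pos + action_scale * action` mapping are hypothetical assumptions (a common joint-position-target convention), so substitute the layout and action mapping from your own task owner. On hardware, `policy` would wrap an ONNX Runtime session running `policy.onnx`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# HYPOTHETICAL observation layout: ordered (term name, dimension) pairs.
# The order and sizes must match what the policy saw in simulation.
OBS_LAYOUT: List[Tuple[str, int]] = [
    ("projected_gravity", 3),
    ("joint_pos_rel", 3),
    ("last_action", 3),
]


def assemble_obs(sensors: Dict[str, Sequence[float]],
                 layout: List[Tuple[str, int]] = OBS_LAYOUT) -> List[float]:
    """Concatenate pre-processed sensor terms in the simulation order."""
    obs: List[float] = []
    for name, dim in layout:
        term = list(sensors[name])
        if len(term) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(term)}")
        obs.extend(term)
    return obs


def action_to_targets(action: Sequence[float], default_pos: Sequence[float],
                      action_scale: float) -> List[float]:
    """ASSUMED mapping: joint position target = default pose + scale * action."""
    if len(action) != len(default_pos):
        raise ValueError(f"action has {len(action)} dims, robot has {len(default_pos)} joints")
    return [d + action_scale * a for d, a in zip(default_pos, action)]


def control_step(sensors: Dict[str, Sequence[float]],
                 policy: Callable[[List[float]], List[float]],
                 default_pos: Sequence[float],
                 action_scale: float) -> Tuple[List[float], List[float]]:
    """One loop iteration: sensors -> observation -> policy -> actuator targets."""
    obs = assemble_obs(sensors)
    action = policy(obs)  # on hardware: wraps an ONNX Runtime session for policy.onnx
    return action, action_to_targets(action, default_pos, action_scale)
```

Both helpers raise on a size mismatch rather than silently truncating. This matters because Python's `zip` would otherwise hide an action vector that is shorter than the joint list.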

If any of those three things drifts between sim and deployment, debug the contract before changing reward terms or hardware tuning.
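Debugging the contract is easiest when both sides can report their observation layout as an ordered list of (term, dimension) pairs. The helper below is a hypothetical sketch, not a UniLab API. It assumes you can extract such a list from the training owner YAML and from the deploy runtime. It reports reordered, renamed, or resized terms, which a plain vector-length check would miss when two equal-sized terms swap places.

```python
from typing import List, Sequence, Tuple

Layout = Sequence[Tuple[str, int]]  # HYPOTHETICAL format: ordered (term name, dimension) pairs


def diff_obs_contract(sim_layout: Layout, deploy_layout: Layout) -> List[str]:
    """List every mismatch between the sim and deploy observation layouts.

    An empty list means the two sides agree on term order, names, and sizes.
    """
    problems: List[str] = []
    if len(sim_layout) != len(deploy_layout):
        problems.append(f"term count: sim={len(sim_layout)} deploy={len(deploy_layout)}")
    # Compare slot by slot. Swapping two equal-sized terms keeps the total
    # vector length identical, so a length check alone would not catch it.
    for i, (sim_term, dep_term) in enumerate(zip(sim_layout, deploy_layout)):
        if tuple(sim_term) != tuple(dep_term):
            problems.append(f"slot {i}: sim={tuple(sim_term)} deploy={tuple(dep_term)}")
    sim_total = sum(dim for _, dim in sim_layout)
    dep_total = sum(dim for _, dim in deploy_layout)
    if sim_total != dep_total:
        problems.append(f"obs vector length: sim={sim_total} deploy={dep_total}")
    return problems
```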

End-to-end pipeline

    flowchart LR
        A[Train in UniLab] --> B[Curriculum + DR]
        B --> C[Validate in alt backend]
        C --> D[Export ONNX]
        D --> E[Latency / lag injection]
        E --> F[Safety layer]
        F --> G[Hardware bringup]
        G --> H[Closed-loop run]
        H -. iterate .-> B

Stage	UniLab artefact	Page
Train	Task owner YAML + training script	CLI Reference
Curriculum + DR	`unilab.dr` + task-side providers	Domain Randomization for Real-World Transfer
Cross-backend sanity	`--task <task> --sim <other_backend>`	Backend Swap
ONNX export	Training playback scripts + deploy helpers	ONNX Runtime
Latency / obs lag	Task config flags and deploy-side logs	Latency Budget
Safety layer	Hardware-side clamp / fallback	Hardware Safety Layers
Robot bringup	Robot-specific guides	G1 Whole-Body Motion Tracking on Hardware, Go2 / Go2W Locomotion Deployment, Allegro / Sharpa In-Hand Manipulation Deployment

What you should have before starting

Pre-flight checklist

A converged training run with stable reward AND a stable success criterion (motion tracking error, drop count, etc.).
The same policy passes evaluation in both MuJoCo and Motrix when both support the task. If it does not, the env likely has a backend-dependent reward leak; see Reward Parity Across Backends.
Domain randomization ranges large enough that reward varies smoothly as you sweep DR strength. A policy that is brittle in sim will be brittle on hardware.
No backend feature leakage in the env — verify via the developer guide’s Backend Capability Contract.
An observation spec you can implement on hardware. If your policy reads body_lin_vel, you need a deploy-side estimator or a task owner variant that removes that signal from the actor input.
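The last checklist item can be checked mechanically before any hardware time is spent. The sketch below assumes you have the actor's observation term names from the task owner, plus the set of signals your robot actually exposes. The term names other than `body_lin_vel` are hypothetical, and so is the `estimators` mapping, which stands in for whatever deploy-side estimator you plan to run.

```python
from typing import Dict, Iterable, List, Optional, Set


def missing_hardware_signals(actor_terms: Iterable[str],
                             hardware_signals: Set[str],
                             estimators: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Return actor observation terms that the robot can neither measure nor estimate.

    `estimators` (HYPOTHETICAL) maps a term to the raw signals a deploy-side
    estimator would need in order to reconstruct it.
    """
    estimators = estimators or {}
    missing: List[str] = []
    for term in actor_terms:
        if term in hardware_signals:
            continue  # measured directly
        needed = estimators.get(term)
        if needed is not None and set(needed) <= hardware_signals:
            continue  # reconstructable by a deploy-side estimator
        missing.append(term)
    return missing
```

A non-empty result means you need either an estimator or a task owner variant that drops those terms from the actor input.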

The most common failure modes

Observation drift. Sensor pre-processing (units, frames, filter cutoffs) differs between sim and the deploy runtime. Log the first deploy-side observation window and compare it with a sim rollout built from the same owner YAML.
Action latency. Some task configs expose one-step delayed action execution through control_config.simulate_action_latency. Measure the actual delay in the deploy loop and make the training owner match it before a hardware run. See Latency Budget.
Friction / damping mismatch. Especially for in-hand manipulation. Sweep friction in DR; cross-check via Aligning Contact and Friction Between Backends.
Reset transients. Sim resets to a stable pose; deployment starts from whatever state the robot and its controller are in. The safety layer must reject malformed observations and unsafe actions before they reach the motor driver.
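Two of these failure modes lend themselves to small offline checks. The first helper compares a logged deploy-side observation window with a sim window, one dimension at a time. It flags any dimension whose deploy mean sits many sim standard deviations away, which is how a units or frame mismatch (for example, degrees versus radians) typically shows up. The second helper is a minimal stand-in for a safety layer. It rejects non-finite observations or actions and clamps everything else to per-joint limits. Both are hypothetical sketches. The z-score threshold, the window format (a list of observation vectors), and the fallback command are assumptions, not UniLab or motor-driver APIs, and a real safety layer typically does more.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def observation_drift(sim_window: Sequence[Sequence[float]],
                      deploy_window: Sequence[Sequence[float]],
                      names: Sequence[str],
                      z_threshold: float = 3.0) -> List[Tuple[str, float]]:
    """Flag obs dims whose deploy-side mean is far outside the sim distribution.

    Each window is a list of observation vectors (one per control step).
    The z-score threshold is an ASSUMED heuristic, not a UniLab default.
    """
    flagged: List[Tuple[str, float]] = []
    for j, name in enumerate(names):
        sim_col = [row[j] for row in sim_window]
        dep_col = [row[j] for row in deploy_window]
        mu, sigma = fmean(sim_col), pstdev(sim_col)
        z = abs(fmean(dep_col) - mu) / max(sigma, 1e-6)  # guard against constant sim dims
        if z > z_threshold:
            flagged.append((name, z))
    return flagged


def safety_gate(obs: Sequence[float], action: Sequence[float],
                low: Sequence[float], high: Sequence[float],
                fallback: Sequence[float]) -> Tuple[List[float], bool]:
    """Minimal safety-layer sketch: reject malformed inputs, clamp the rest.

    Returns (command, accepted). On rejection the HYPOTHETICAL `fallback`
    command (e.g. a hold-pose target) is returned instead of the policy action.
    """
    malformed = (
        not all(math.isfinite(x) for x in obs)
        or not all(math.isfinite(a) for a in action)
        or not (len(action) == len(low) == len(high))
    )
    if malformed:
        return list(fallback), False
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)], True
```

For example, consider a sim joint position column that hovers around 1.0 while the deploy log reports about 57.3 (the same angle in degrees). That column is flagged, and a dimension that only jitters within its sim spread is not.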

Per-robot quick links

🤖 G1 whole-body

Humanoid motion tracking deployment, joint clamp ranges, IMU alignment.

G1 Whole-Body Motion Tracking on Hardware

🐕 Go2 locomotion

Joystick + rough terrain policies on Go2 and Go2W.

Go2 / Go2W Locomotion Deployment

✋ Allegro in-hand

Dexterous cube reorientation, tactile-free deployment, grasp generator.

Allegro / Sharpa In-Hand Manipulation Deployment