Sim-to-Real Overview

This page is the map of UniLab’s sim-to-real workflow. Every subsequent page in this section drills into one stage.

What “sim-to-real” means in UniLab

A deployable UniLab policy is the exported policy plus the exact observation and action contracts used by the selected task owner. The G1 WBT helper path materializes this as policy.onnx, deploy_config.yaml, and a motion binary; other robots need an equivalent hardware-side runtime that:

  1. Reads sensors → assembles the same observation vector the policy saw in simulation.

  2. Runs policy.onnx through a runtime that supports the exported graph.

  3. Maps the action vector to the same actuator interface used by the env’s SimBackend.

If any of those three steps drifts between sim and deployment, debug the contract before changing rewards or hardware tuning.
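The three steps above can be sketched as a single control-loop step. This is a minimal illustration, not UniLab's runtime: DeployRuntime and all of its field names are hypothetical, and the policy is modelled as a plain callable that would in practice wrap whatever runtime loads policy.onnx.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class DeployRuntime:
    """Hypothetical hardware-side runtime; every field name is illustrative."""

    read_sensors: Callable[[], Dict[str, List[float]]]   # raw readings keyed by term name
    obs_terms: Sequence[str]                              # term order used in simulation
    policy: Callable[[List[float]], Sequence[float]]      # e.g. a wrapper around an ONNX session
    write_actuators: Callable[[List[float]], None]        # same actuator interface as in sim

    def step(self) -> List[float]:
        raw = self.read_sensors()
        # 1. Assemble the observation in the exact term order used in simulation.
        obs = [value for name in self.obs_terms for value in raw[name]]
        # 2. Run the exported policy on that observation.
        action = list(self.policy(obs))
        # 3. Send the action through the actuator interface.
        self.write_actuators(action)
        return action


# Stub wiring: fake sensors, a toy policy, and a list that records commands.
sent = []
runtime = DeployRuntime(
    read_sensors=lambda: {"joint_pos": [0.1, 0.2], "gravity": [0.0, 0.0, -1.0]},
    obs_terms=["gravity", "joint_pos"],
    policy=lambda obs: [0.5 * v for v in obs[-2:]],
    write_actuators=sent.append,
)
action = runtime.step()
```

Keeping the term order in one explicit list makes the observation contract something you can diff against the sim-side configuration, rather than something implied by the order of sensor reads.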

End-to-end pipeline

flowchart LR
    A[Train in UniLab] --> B[Curriculum + DR]
    B --> C[Validate in alt backend]
    C --> D[Export ONNX]
    D --> E[Latency / lag injection]
    E --> F[Safety layer]
    F --> G[Hardware bringup]
    G --> H[Closed-loop run]
    H -. iterate .-> B

| Stage | UniLab artefact | Page |
| --- | --- | --- |
| Train | Task owner YAML + training script | CLI Reference |
| Curriculum + DR | unilab.dr + task-side providers | Domain Randomization for Real-World Transfer |
| Cross-backend sanity | --task <task> --sim <other_backend> | Backend Swap |
| ONNX export | Training playback scripts + deploy helpers | ONNX Runtime |
| Latency / obs lag | Task config flags and deploy-side logs | Latency Budget |
| Safety layer | Hardware-side clamp / fallback | Hardware Safety Layers |
| Robot bringup | Robot-specific guides | G1 Whole-Body Motion Tracking on Hardware, Go2 / Go2W Locomotion Deployment, Allegro / Sharpa In-Hand Manipulation Deployment |

What you should have before starting

Pre-flight checklist

  1. A converged training run with stable reward AND a stable success criterion (motion tracking error, drop count, etc.).

  2. The same policy passes evaluation in both MuJoCo and Motrix when both support the task. If it passes in one backend and fails in the other, you have a backend-dependent reward leak; see Reward Parity Across Backends.

  3. Domain randomization ranges large enough that reward varies smoothly when you sweep DR strength — a brittle policy in sim is a brittle policy on hardware.

  4. No backend feature leakage in the env — verify via the developer guide’s Backend Capability Contract.

  5. An observation spec you can implement on hardware. If your policy reads body_lin_vel, you need a deploy-side estimator or a task owner variant that removes that signal from the actor input.
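Item 5 of the checklist can be checked mechanically before any hardware session. The sketch below assumes you can list the actor's observation terms and the signals your hardware can provide. The function name and every term other than body_lin_vel are hypothetical.

```python
def unimplemented_terms(actor_terms, hardware_sources):
    """Return actor observation terms that have no deploy-side source, in order."""
    return [term for term in actor_terms if term not in hardware_sources]


# Hypothetical actor input and the hardware signals that could back it.
actor_terms = ["body_lin_vel", "body_ang_vel", "joint_pos"]
hardware_sources = {
    "body_ang_vel": "IMU gyro",
    "joint_pos": "joint encoders",
}

missing = unimplemented_terms(actor_terms, hardware_sources)
print(missing)  # ['body_lin_vel']
```

Each term reported here needs either a deploy-side estimator or a task owner variant that drops it from the actor input.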

The most common failure modes

  • Observation drift. Sensor pre-processing differs between sim and deploy runtime (units, frame, filter cutoffs). Log the first deploy-side observation window and compare it with a sim rollout built from the same owner YAML.

  • Action latency. Some task configs expose one-step delayed action execution through control_config.simulate_action_latency. Measure the deploy loop and make the training owner match that contract before a hardware run. See Latency Budget.

  • Friction / damping mismatch. Especially for in-hand manipulation. Sweep friction in DR; cross-check via Aligning Contact and Friction Between Backends.

  • Reset transients. Sim resets to a stable pose; deployment starts from a controller state. The safety layer must reject malformed observations and unsafe actions before they reach the motor driver.
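Two of the items above lend themselves to small deploy-side checks: comparing a logged observation window against a sim rollout, and gating actions before they reach the motor driver. The following is a sketch under stated assumptions, not a UniLab API. The function names, the fallback action, and the example numbers are all hypothetical.

```python
import math


def channel_drift(sim_window, deploy_window):
    """Mean absolute difference per observation channel over two logged windows.

    Each window is a list of equal-length observation vectors; only the
    overlapping time steps are compared.
    """
    steps = min(len(sim_window), len(deploy_window))
    dims = len(sim_window[0])
    return [
        sum(abs(sim_window[t][i] - deploy_window[t][i]) for t in range(steps)) / steps
        for i in range(dims)
    ]


def gate_action(obs, action, low, high, fallback):
    """Return the action clamped to [low, high], or the fallback if any
    observation or action value is non-finite."""
    if not all(math.isfinite(v) for v in list(obs) + list(action)):
        return list(fallback)
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


# Hypothetical logs: the second channel was recorded in g on hardware
# but in m/s^2 in simulation, which shows up as a large per-channel drift.
sim_log = [[0.0, 9.81], [0.1, 9.81]]
deploy_log = [[0.0, 1.0], [0.1, 1.0]]
drift = channel_drift(sim_log, deploy_log)

rejected = gate_action([0.0, float("nan")], [0.2], [-1.0], [1.0], [0.0])
clamped = gate_action([0.0], [1.7, -0.3], [-1.0, -1.0], [1.0, 1.0], [0.0, 0.0])
```

A channel whose drift is far larger than the others usually points to a unit or frame mismatch rather than sensor noise. The gate only covers non-finite values and range limits, so a real safety layer would likely add rate limits and robot-specific fallbacks.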