This page covers the checked-in Allegro and Sharpa in-hand manipulation paths.
Select backends with --task and --sim; do not override
training.sim_backend alone. The owner YAMLs remain the internal evidence for
which combinations are configured.
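As a rough illustration of how the owner YAMLs can serve as evidence, the sketch below checks whether a given (algorithm, task, backend) combination has a checked-in config. It assumes the conf/<algo>/task/<task>/<sim>.yaml layout seen in the owner evidence paths on this page. The helper names owner_yaml and is_configured are hypothetical and are not part of the repository.

```python
from pathlib import Path


def owner_yaml(algo: str, task: str, sim: str, root: Path = Path("conf")) -> Path:
    """Build the owner YAML path for a combination.

    The layout conf/<algo>/task/<task>/<sim>.yaml is an assumption inferred
    from the owner evidence paths listed on this page.
    """
    return root / algo / "task" / task / f"{sim}.yaml"


def is_configured(algo: str, task: str, sim: str, root: Path = Path("conf")) -> bool:
    """Return True if the owner YAML for this combination exists on disk."""
    return owner_yaml(algo, task, sim, root).is_file()


if __name__ == "__main__":
    # Run from the repository root so that the relative conf/ path resolves.
    for combo in [("ppo", "allegro_inhand", "mujoco"), ("ppo", "sharpa_inhand", "mujoco")]:
        print(combo, "->", owner_yaml(*combo), "exists:", is_configured(*combo))
```

A file's presence only shows that the combination is configured; it says nothing about whether that path is capability-equivalent to another backend.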
Allegro rotation uses the registered env AllegroInhandRotation. The rotation
owner is allegro_inhand, and grasp-cache generation uses
allegro_inhand_grasp.
Owner evidence:
conf/ppo/task/allegro_inhand/mujoco.yaml
conf/ppo/task/allegro_inhand/motrix.yaml
conf/ppo/task/allegro_inhand_grasp/mujoco.yaml
conf/ppo/task/allegro_inhand_grasp/motrix.yaml
conf/appo/task/allegro_inhand/mujoco.yaml
conf/appo/task/allegro_inhand/motrix.yaml
The typical flow is two stages: first generate a grasp cache, then train the
rotation policy.
Sharpa rotation uses the registered env SharpaInhandRotation. The currently
checked-in training paths are MuJoCo owner paths.
Owner evidence:
conf/ppo/task/sharpa_inhand/mujoco.yaml
conf/ppo/task/sharpa_inhand/mujoco_hora.yaml
conf/ppo/task/sharpa_inhand_grasp/mujoco.yaml
conf/appo/task/sharpa_inhand/mujoco.yaml
conf/appo/task/sharpa_inhand/mujoco_hora.yaml
conf/hora_distill/task/sharpa_inhand/mujoco.yaml
The full HORA path is three stages:
1. Generate the grasp cache.
2. Train the teacher policy.
3. Distill a student policy when needed.
The full HORA teacher/student path is MuJoCo-owner-primary. The Motrix path
currently covers only phase-1 PPO rotation and grasp-cache collection; it is not
a full HORA capability-equivalent path.
The default caches are hosted on Hugging Face (unilabsim/unilab-caches) and are
downloaded automatically into src/unilab/assets/caches/ on first training, so
no manual step is needed.
To collect caches for custom scales that are not on HF, or to regenerate them
locally, run the grasp task once per scale (cache files are named
<prefix>_<scale>.npy). Generated files land under src/unilab/assets/caches/,
the same location the HF downloads use, so subsequent training resolves them
automatically without further configuration. Regeneration is slow.
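Before starting a slow regeneration, it can help to see which scales already have a cache file. The sketch below applies the <prefix>_<scale>.npy naming rule to the cache directory named above. The helper names are hypothetical, the prefix in the usage example is a placeholder rather than a real cache prefix, and scales are passed as strings so that their exact formatting in file names is not guessed.

```python
from pathlib import Path

# Cache location stated on this page (relative to the repository root).
CACHE_DIR = Path("src/unilab/assets/caches")


def cache_path(prefix: str, scale: str, cache_dir: Path = CACHE_DIR) -> Path:
    """Build <prefix>_<scale>.npy under the cache directory.

    The scale stays a string so this sketch does not assume how the real
    tooling formats it (for example "0.8" versus "0.80").
    """
    return cache_dir / f"{prefix}_{scale}.npy"


def missing_scales(prefix: str, scales: list[str], cache_dir: Path = CACHE_DIR) -> list[str]:
    """Return the scales whose cache file is not present on disk."""
    return [s for s in scales if not cache_path(prefix, s, cache_dir).is_file()]


if __name__ == "__main__":
    # "my_grasp_prefix" is a placeholder; substitute the prefix your task uses.
    print("missing:", missing_scales("my_grasp_prefix", ["0.8", "1.0", "1.2"]))
```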
The helper script collects each scale sequentially:
bash scripts/sharpa_collect_grasps.sh 0.8 1.0 1.2
Equivalent per-scale invocation (repeat for [1.0], [1.2], …):
uv run train --algo ppo --task sharpa_inhand_grasp --sim mujoco 'env.domain_rand.scale_list=[0.8]' training.no_play=true
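The sequential per-scale collection can also be sketched in Python. This is not the helper script's actual implementation, which is not shown on this page; it only rebuilds the per-scale command given above and runs (or, by default, prints) one invocation per scale. The function names grasp_command and collect are hypothetical.

```python
import shlex
import subprocess


def grasp_command(scale: str) -> list[str]:
    """Build the per-scale grasp-collection command shown on this page."""
    return [
        "uv", "run", "train",
        "--algo", "ppo",
        "--task", "sharpa_inhand_grasp",
        "--sim", "mujoco",
        f"env.domain_rand.scale_list=[{scale}]",
        "training.no_play=true",
    ]


def collect(scales: list[str], dry_run: bool = True) -> None:
    """Run one collection per scale, in order; only print when dry_run is set."""
    for scale in scales:
        cmd = grasp_command(scale)
        if dry_run:
            # shlex.join quotes the bracketed override for safe copy-paste.
            print(shlex.join(cmd))
        else:
            # check=True stops the loop at the first failing scale.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    collect(["0.8", "1.0", "1.2"])
```

Passing dry_run=False executes the commands, so run that from the repository root in an environment where uv is available.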
Motrix can also collect a grasp cache (phase-1 scope only):
Student distillation is configured by
conf/hora_distill/task/sharpa_inhand/mujoco.yaml and implemented by
scripts/train_hora_distill.py; the top-level CLI does not currently expose a
separate HORA distillation route (it is not in the CLI SUPPORTED_ALGOS). To
distill from an APPO teacher, set teacher.algo_family=appo in that low-level
config.
Common log directories:
logs/hora_ppo/SharpaInhandRotation/
logs/hora_appo/SharpaInhandRotation/
logs/hora_distill/SharpaInhandRotation/
The scale / grasp-cache / DR boundary is sensitive here; see
Domain Randomization for the lifecycle rules.
For the category-level task page, see Manipulation.