MLX PPO
MLX PPO uses the PPO task-owner tree but swaps the training runtime to the MLX
implementation. The entry script is scripts/train_mlx_ppo.py, the config is
conf/ppo/config_mlx.yaml, and the implementation lives under
src/unilab/algos/mlx/ppo/.
Quick Start
uv run train --algo mlx_ppo --task go2_joystick_flat --sim mujoco
uv run train --algo mlx_ppo --task go2_joystick_flat --sim motrix training.no_play=true
Notes
conf/ppo/config_mlx.yaml sets training.device=mlx.
The mlx dependency is enabled by the sys_platform == 'darwin' marker in pyproject.toml, so it is only installed on macOS.
MLX compose coverage is tracked separately in the generated support matrix: Backend Support Matrix.
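Because the mlx dependency is gated on the darwin platform marker, it can help to confirm that the package is importable before launching a run. The following is a minimal preflight sketch, not part of the repository: the function name mlx_runtime_available and its parameters are hypothetical, and it uses only the Python standard library.

```python
import importlib.util
import sys


def mlx_runtime_available(platform=None, find_spec=importlib.util.find_spec):
    """Return (ok, reason) describing whether the MLX runtime looks usable.

    Hypothetical helper for illustration only. `platform` and `find_spec`
    are injectable so the check can be exercised on any machine.
    """
    # The PEP 508 marker `sys_platform` corresponds to Python's sys.platform.
    platform = sys.platform if platform is None else platform
    if platform != "darwin":
        return False, f"mlx is only installed on darwin; current platform is {platform!r}"
    # find_spec locates the package without importing it.
    if find_spec("mlx") is None:
        return False, "platform is darwin but the mlx package is not importable"
    return True, "mlx is importable"


if __name__ == "__main__":
    ok, reason = mlx_runtime_available()
    print(f"MLX PPO preflight: {'ok' if ok else 'unavailable'} ({reason})")
```

If the check reports that mlx is unavailable, the run would be expected to fail when the MLX runtime is loaded, so falling back to the default training path is the safer choice.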
Use torch PPO first when you need the default training path; use MLX PPO when you are intentionally exercising the MLX runtime.