unilab.algos.torch.him_ppo.runner.HIMOnPolicyRunner
- class unilab.algos.torch.him_ppo.runner.HIMOnPolicyRunner
Bases: object
On-policy training runner for HIM-PPO.
Provides the same external interface as rsl_rl’s OnPolicyRunner so that train_him_ppo.py can share most of its logic with train_rsl_rl.py.
Methods
__init__(env, train_cfg[, log_dir, device])
export_policy_to_jit(path[, filename]): Export the end-to-end HIM-PPO policy (estimator + actor) via TorchScript trace.
export_policy_to_onnx(path[, filename]): Export the end-to-end HIM-PPO policy (estimator + actor) to ONNX.
get_inference_policy([device])
learn(num_learning_iterations[, ...])
load(path)
save(path)
- export_policy_to_onnx(path, filename='policy.onnx')
Export the end-to-end HIM-PPO policy (estimator + actor) to ONNX.
Input: obs_history (1, H_a * num_one_step_obs), the flattened actor obs history
Output: actions (1, num_actions)
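The input and output shapes can be illustrated with a minimal sketch. The sizes used here (H_a = 6, num_one_step_obs = 45, num_actions = 12), the step ordering inside the flattened history, and the `stand_in_policy` layer are all assumptions made for illustration; they are not taken from the unilab source, and the linear layer only stands in for the exported estimator + actor.

```python
import torch
from torch import nn

# Illustrative sizes only; the real values come from the environment and train_cfg.
H_a = 6                 # assumed actor history length
num_one_step_obs = 45   # assumed size of one observation step
num_actions = 12        # assumed action dimension

# Hypothetical per-step observations, one row per time step.
# Whether the newest step comes first or last is not stated in the docs.
steps = torch.zeros(H_a, num_one_step_obs)

# Flatten to the documented input shape (1, H_a * num_one_step_obs).
obs_history = steps.reshape(1, -1)

# Stand-in for the exported policy: maps the flattened history to actions.
stand_in_policy = nn.Linear(H_a * num_one_step_obs, num_actions)
with torch.no_grad():
    actions = stand_in_policy(obs_history)

print(tuple(obs_history.shape), tuple(actions.shape))
```

A deployment script would feed an array of the same flattened shape to the exported ONNX model, using whatever input name the export assigns.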
- export_policy_to_jit(path, filename='policy.pt')
Export the end-to-end HIM-PPO policy (estimator + actor) via TorchScript trace.
Wraps estimator + actor as a plain module (without HIMActorCritic’s properties) so that torch.jit.trace can introspect it without hitting the distribution assert.
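The wrapping idea can be sketched as follows. This is not the unilab implementation: `PolicyExportWrapper`, the layer sizes, and the way the estimator output is concatenated with the history before the actor are hypothetical choices made so that the example is self-contained. Only `torch.jit.trace` and the plain `nn.Module` pattern are standard PyTorch.

```python
import torch
from torch import nn


class PolicyExportWrapper(nn.Module):
    """Hypothetical plain module holding only the estimator and the actor.

    It exposes a single tensor-in, tensor-out forward pass, so tracing never
    touches distribution-related properties of the training-time class.
    """

    def __init__(self, estimator: nn.Module, actor: nn.Module):
        super().__init__()
        self.estimator = estimator
        self.actor = actor

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        # Assumed data flow: estimator output is appended to the history.
        latent = self.estimator(obs_history)
        return self.actor(torch.cat((obs_history, latent), dim=-1))


torch.manual_seed(0)
history_dim, latent_dim, num_actions = 270, 19, 12  # illustrative sizes

# Stand-in networks; the real ones come from the trained HIM-PPO policy.
estimator = nn.Sequential(nn.Linear(history_dim, 64), nn.ELU(), nn.Linear(64, latent_dim))
actor = nn.Sequential(nn.Linear(history_dim + latent_dim, 64), nn.ELU(), nn.Linear(64, num_actions))

wrapper = PolicyExportWrapper(estimator, actor).eval()
example_input = torch.zeros(1, history_dim)
traced = torch.jit.trace(wrapper, example_input)

# The traced module should reproduce the eager module on a fresh input.
test_input = torch.randn(1, history_dim)
with torch.no_grad():
    traced_out = traced(test_input)
    max_diff = (traced_out - wrapper(test_input)).abs().max().item()
```

The traced module can then be written to disk with `traced.save(...)` and reloaded with `torch.jit.load(...)` without importing the training code.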