unilab.algos.torch.him_ppo.runner

Classes

HIMOnPolicyRunner

On-policy training runner for HIM-PPO.

class unilab.algos.torch.him_ppo.runner.HIMOnPolicyRunner[source]

Bases: object

On-policy training runner for HIM-PPO.

Provides the same external interface as rsl_rl’s OnPolicyRunner so that train_him_ppo.py can share most of its logic with train_rsl_rl.py.
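
The shared-interface idea can be sketched with a structural type. The sketch below is only an illustration: `RunnerLike`, `train_and_checkpoint`, and `RecordingRunner` are hypothetical names, not part of unilab or rsl_rl. Only the method names and signatures are taken from this page. The stand-in runner records calls so the example runs without either library installed.

```python
from typing import Any, Callable, List, Optional, Protocol


class RunnerLike(Protocol):
    """Hypothetical protocol listing the runner methods documented on this page."""

    def learn(self, num_learning_iterations: int, init_at_random_ep_len: bool = True) -> None: ...
    def save(self, path: str) -> None: ...
    def load(self, path: str) -> None: ...
    def get_inference_policy(self, device: Optional[str] = None) -> Callable[..., Any]: ...


def train_and_checkpoint(runner: RunnerLike, iterations: int, ckpt_path: str) -> Callable[..., Any]:
    """Hypothetical shared helper: train, save a checkpoint, return the policy."""
    runner.learn(num_learning_iterations=iterations, init_at_random_ep_len=True)
    runner.save(ckpt_path)
    return runner.get_inference_policy(device="cpu")


class RecordingRunner:
    """Stand-in runner that only records calls (no training happens)."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def learn(self, num_learning_iterations: int, init_at_random_ep_len: bool = True) -> None:
        self.calls.append(f"learn({num_learning_iterations})")

    def save(self, path: str) -> None:
        self.calls.append(f"save({path})")

    def load(self, path: str) -> None:
        self.calls.append(f"load({path})")

    def get_inference_policy(self, device: Optional[str] = None) -> Callable[..., Any]:
        self.calls.append(f"get_inference_policy({device})")
        return lambda obs: [0.0]  # dummy policy: always returns a single zero action


runner = RecordingRunner()
policy = train_and_checkpoint(runner, iterations=3, ckpt_path="model_3.pt")
```

Because the helper depends only on the method signatures, any runner exposing them could in principle be passed in unchanged. That is presumably what lets the two training scripts share logic.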

__init__(env, train_cfg, log_dir=None, device='cpu')[source]
learn(num_learning_iterations, init_at_random_ep_len=True)[source]
Parameters:
  • num_learning_iterations (int)

  • init_at_random_ep_len (bool)

Return type:

None

save(path)[source]
Parameters:

path (str)

Return type:

None

load(path)[source]
Parameters:

path (str)

Return type:

None

get_inference_policy(device=None)[source]
Parameters:

device (str | None)

Return type:

Callable[..., Any]

export_policy_to_onnx(path, filename='policy.onnx')[source]

Export the end-to-end HIM-PPO policy (estimator + actor) to ONNX.

Input: obs_history, shape (1, H_a * num_one_step_obs), the flattened actor observation history.

Output: actions, shape (1, num_actions).
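
As a rough illustration of the input layout, the following sketch flattens a history of per-step observation frames into a single row of width `H_a * num_one_step_obs`. The sizes (`H_A = 6`, `NUM_ONE_STEP_OBS = 45`) and the helper `flatten_history` are assumptions made for the example. The frame ordering used by the actual runner is not stated here, so the order below is arbitrary.

```python
from typing import List

# Illustrative sizes only; the real values come from the environment and config.
H_A = 6                 # assumed actor history length
NUM_ONE_STEP_OBS = 45   # assumed size of a single observation frame


def flatten_history(frames: List[List[float]]) -> List[List[float]]:
    """Concatenate H_A frames into one row of shape (1, H_A * NUM_ONE_STEP_OBS)."""
    if len(frames) != H_A or any(len(f) != NUM_ONE_STEP_OBS for f in frames):
        raise ValueError("expected H_A frames of NUM_ONE_STEP_OBS values each")
    # Frame order is an assumption; check how the runner orders its history.
    return [[value for frame in frames for value in frame]]


# Frame i is filled with the value i so the layout is easy to inspect.
frames = [[float(i)] * NUM_ONE_STEP_OBS for i in range(H_A)]
obs_history = flatten_history(frames)
batch, width = len(obs_history), len(obs_history[0])
```

With these assumed sizes the exported model would expect an input of shape (1, 270).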

Return type:

None

export_policy_to_jit(path, filename='policy.pt')[source]

Export the end-to-end HIM-PPO policy (estimator + actor) via TorchScript trace.

Wraps estimator + actor as a plain module (without HIMActorCritic’s properties) so that torch.jit.trace can introspect it without hitting the distribution assert.
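
A minimal sketch of the wrapping approach, assuming PyTorch is installed. `ExportWrapper`, the layer sizes, and the way the estimator output is combined with the current observation are all assumptions made for illustration. The actual wiring inside the runner may differ.

```python
import torch
import torch.nn as nn

# Illustrative sizes only, not taken from the unilab configuration.
NUM_ONE_STEP_OBS, H_A, LATENT_DIM, NUM_ACTIONS = 45, 6, 19, 12


class ExportWrapper(nn.Module):
    """Hypothetical plain module holding only the estimator and the actor."""

    def __init__(self, estimator: nn.Module, actor: nn.Module) -> None:
        super().__init__()
        self.estimator = estimator
        self.actor = actor

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        latent = self.estimator(obs_history)
        # Assumption: the first frame of the flattened history is the current observation.
        current_obs = obs_history[:, :NUM_ONE_STEP_OBS]
        return self.actor(torch.cat((current_obs, latent), dim=-1))


# Stand-in networks; the real estimator and actor come from the trained policy.
estimator = nn.Linear(H_A * NUM_ONE_STEP_OBS, LATENT_DIM)
actor = nn.Linear(NUM_ONE_STEP_OBS + LATENT_DIM, NUM_ACTIONS)
wrapper = ExportWrapper(estimator, actor).eval()

example = torch.zeros(1, H_A * NUM_ONE_STEP_OBS)
test_input = torch.ones(1, H_A * NUM_ONE_STEP_OBS)
with torch.no_grad():
    traced = torch.jit.trace(wrapper, example)  # records the forward pass on `example`
    actions = traced(test_input)
    eager = wrapper(test_input)
```

The traced module could then be written to disk with `traced.save(...)`. Since the wrapper exposes nothing beyond `forward`, tracing only touches the two sub-networks.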

Return type:

None