unilab.algos.torch.him_ppo.runner.HIMOnPolicyRunner

class unilab.algos.torch.him_ppo.runner.HIMOnPolicyRunner

Bases: object

On-policy training runner for HIM-PPO.

Provides the same external interface as rsl_rl’s OnPolicyRunner so that train_him_ppo.py can share most of its logic with train_rsl_rl.py.

Parameters:

env (Any)
train_cfg (dict[str, Any])
log_dir (str | None)
device (str)
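Because the runner exposes the same methods as rsl_rl’s OnPolicyRunner, a training script can be written against the interface rather than a concrete class. The following is a minimal sketch of that idea, not the contents of train_him_ppo.py: `RunnerLike`, `run_training`, and `DummyRunner` are hypothetical names introduced for illustration, and only the method names and signatures are taken from this page.

```python
from typing import Any, Callable, Optional, Protocol


class RunnerLike(Protocol):
    """Hypothetical structural type listing the methods documented on this page."""

    def learn(self, num_learning_iterations: int, init_at_random_ep_len: bool = True) -> None: ...
    def save(self, path: str) -> None: ...
    def load(self, path: str) -> None: ...
    def get_inference_policy(self, device: Optional[str] = None) -> Callable[..., Any]: ...


def run_training(
    runner: RunnerLike,
    iterations: int,
    checkpoint: str,
    resume_from: Optional[str] = None,
) -> Callable[..., Any]:
    """Hypothetical shared driver: optionally resume, train, save, return the policy."""
    if resume_from is not None:
        runner.load(resume_from)
    runner.learn(num_learning_iterations=iterations, init_at_random_ep_len=True)
    runner.save(checkpoint)
    return runner.get_inference_policy(device="cpu")


class DummyRunner:
    """Stand-in used only to exercise the driver; it records the calls it receives."""

    def __init__(self) -> None:
        self.calls = []

    def learn(self, num_learning_iterations: int, init_at_random_ep_len: bool = True) -> None:
        self.calls.append(f"learn:{num_learning_iterations}")

    def save(self, path: str) -> None:
        self.calls.append(f"save:{path}")

    def load(self, path: str) -> None:
        self.calls.append(f"load:{path}")

    def get_inference_policy(self, device: Optional[str] = None) -> Callable[..., Any]:
        self.calls.append(f"policy:{device}")
        return lambda obs: obs  # identity "policy", for illustration only
```

Any object with these four methods, including a real runner instance, could be passed to `run_training` in place of `DummyRunner`.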

Methods

`__init__`(env, train_cfg[, log_dir, device])
`export_policy_to_jit`(path[, filename])	Export the end-to-end HIM-PPO policy (estimator + actor) via TorchScript trace.
`export_policy_to_onnx`(path[, filename])	Export the end-to-end HIM-PPO policy (estimator + actor) to ONNX.
`get_inference_policy`([device])
`learn`(num_learning_iterations[, ...])
`load`(path)
`save`(path)

__init__(env, train_cfg, log_dir=None, device='cpu')

Parameters:

env (Any)
train_cfg (dict[str, Any])
log_dir (str | None)
device (str)

learn(num_learning_iterations, init_at_random_ep_len=True)

Parameters:

num_learning_iterations (int)
init_at_random_ep_len (bool)

Return type:

None

save(path)

Parameters:

path (str)

Return type:

None

load(path)

Parameters:

path (str)

Return type:

None

get_inference_policy(device=None)

Parameters:

device (str | None)

Return type:

Callable[..., Any]

export_policy_to_onnx(path, filename='policy.onnx')

Export the end-to-end HIM-PPO policy (estimator + actor) to ONNX.

Input: obs_history, shape (1, H_a * num_one_step_obs), the flattened actor observation history.

Output: actions, shape (1, num_actions).

Parameters:

path (str)
filename (str)

Return type:

None
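A caller of the exported model has to assemble the flattened history input itself. The sketch below shows one way to keep a rolling buffer and produce a batch of shape (1, H_a * num_one_step_obs). The sizes, the helper name `push_and_flatten`, and the oldest-first ordering are all assumptions made for illustration; the ordering in particular should be checked against the training code before use.

```python
from collections import deque

H_A = 3               # assumed history length (illustrative value)
NUM_ONE_STEP_OBS = 4  # assumed per-step observation size (illustrative value)

# Rolling buffer holding the last H_A observations, zero-initialised.
history = deque([[0.0] * NUM_ONE_STEP_OBS for _ in range(H_A)], maxlen=H_A)


def push_and_flatten(obs):
    """Append one observation and return the flattened history with a batch dimension of 1."""
    if len(obs) != NUM_ONE_STEP_OBS:
        raise ValueError(f"expected {NUM_ONE_STEP_OBS} values, got {len(obs)}")
    history.append(list(obs))  # maxlen drops the oldest step automatically
    # Oldest-first concatenation: an assumption, not confirmed by this page.
    flat = [value for step in history for value in step]
    return [flat]  # nested list with shape (1, H_A * NUM_ONE_STEP_OBS)
```

The nested list can then be converted to whatever float32 array type the ONNX runtime in use expects.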

export_policy_to_jit(path, filename='policy.pt')

Export the end-to-end HIM-PPO policy (estimator + actor) via TorchScript trace.

Wraps the estimator and actor in a plain module (without HIMActorCritic’s properties) so that torch.jit.trace can introspect it without triggering the distribution assert.

Parameters:

path (str)
filename (str)

Return type:

None
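The wrapping approach described for this method can be sketched as follows. This is an illustration under stated assumptions, not the runner's actual code: `PlainPolicy`, the layer sizes, and the way the estimator output is concatenated with the history before the actor are all hypothetical. Only `torch.jit.trace` and the general idea of a property-free wrapper come from the description above.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real values come from the training config.
H_A, NUM_ONE_STEP_OBS, LATENT_DIM, NUM_ACTIONS = 3, 4, 5, 2


class PlainPolicy(nn.Module):
    """Hypothetical wrapper exposing only forward(), with no extra properties."""

    def __init__(self, estimator: nn.Module, actor: nn.Module) -> None:
        super().__init__()
        self.estimator = estimator
        self.actor = actor

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        latent = self.estimator(obs_history)
        # Assumed wiring: the actor sees the history plus the estimator output.
        return self.actor(torch.cat((obs_history, latent), dim=-1))


torch.manual_seed(0)
in_dim = H_A * NUM_ONE_STEP_OBS
estimator = nn.Sequential(nn.Linear(in_dim, 16), nn.ELU(), nn.Linear(16, LATENT_DIM))
actor = nn.Sequential(nn.Linear(in_dim + LATENT_DIM, 16), nn.ELU(), nn.Linear(16, NUM_ACTIONS))

policy = PlainPolicy(estimator, actor).eval()
example = torch.zeros(1, in_dim)  # same shape as the documented ONNX input
with torch.no_grad():
    traced = torch.jit.trace(policy, example)
# traced.save("policy.pt") would write the TorchScript file to disk.
```

Because tracing records the operations executed for the example input, the wrapper's forward pass should avoid data-dependent Python control flow.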