unilab.algos.torch.him_ppo.runner

Classes

HIMOnPolicyRunner

On-policy training runner for HIM-PPO.

class unilab.algos.torch.him_ppo.runner.HIMOnPolicyRunner

Bases: object

On-policy training runner for HIM-PPO.

Provides the same external interface as rsl_rl’s OnPolicyRunner so that train_him_ppo.py can share most of its logic with train_rsl_rl.py.

Parameters:

env (Any)
train_cfg (dict[str, Any])
log_dir (str | None)
device (str)
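
Because the runner exposes the same method names as rsl_rl's OnPolicyRunner, a training script can be written against the interface rather than a concrete class. The following is a minimal sketch of that idea, not the actual contents of train_him_ppo.py: RunnerLike, train_and_checkpoint, and RecordingRunner are hypothetical names introduced here for illustration, and the stub stands in for a real runner so the snippet runs without unilab installed.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol


class RunnerLike(Protocol):
    """Hypothetical protocol listing the methods documented on this page."""

    def learn(self, num_learning_iterations: int, init_at_random_ep_len: bool = True) -> None: ...
    def save(self, path: str) -> None: ...
    def load(self, path: str) -> None: ...
    def get_inference_policy(self, device: str | None = None) -> Callable[..., Any]: ...


def train_and_checkpoint(
    runner: RunnerLike,
    num_iterations: int,
    checkpoint_path: str,
    resume_from: str | None = None,
) -> Callable[..., Any]:
    """Shared training flow that works with any runner exposing this interface."""
    if resume_from is not None:
        runner.load(resume_from)  # optionally resume from an earlier checkpoint
    runner.learn(num_learning_iterations=num_iterations, init_at_random_ep_len=True)
    runner.save(checkpoint_path)
    return runner.get_inference_policy(device="cpu")


class RecordingRunner:
    """Stand-in runner (NOT HIMOnPolicyRunner) that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def learn(self, num_learning_iterations: int, init_at_random_ep_len: bool = True) -> None:
        self.calls.append(("learn", num_learning_iterations, init_at_random_ep_len))

    def save(self, path: str) -> None:
        self.calls.append(("save", path))

    def load(self, path: str) -> None:
        self.calls.append(("load", path))

    def get_inference_policy(self, device: str | None = None) -> Callable[..., Any]:
        self.calls.append(("get_inference_policy", device))
        return lambda obs: obs  # identity "policy", for demonstration only


stub = RecordingRunner()
policy = train_and_checkpoint(stub, num_iterations=10, checkpoint_path="model_10.pt")
```

With a real environment and config, the stub would be replaced by HIMOnPolicyRunner(env, train_cfg, log_dir, device); the shared function itself would not need to change.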

__init__(env, train_cfg, log_dir=None, device='cpu')

Parameters:

env (Any)
train_cfg (dict[str, Any])
log_dir (str | None)
device (str)

learn(num_learning_iterations, init_at_random_ep_len=True)

Parameters:

num_learning_iterations (int)
init_at_random_ep_len (bool)

Return type:

None

save(path)

Parameters: path (str)
Return type: None

load(path)

Parameters: path (str)
Return type: None

get_inference_policy(device=None)

Parameters: device (str | None)
Return type: Callable[..., Any]

export_policy_to_onnx(path, filename='policy.onnx')

Export the end-to-end HIM-PPO policy (estimator + actor) to ONNX.

Input: obs_history (1, H_a * num_one_step_obs), the flattened actor observation history

Output: actions (1, num_actions)

Parameters:

path (str)
filename (str)

Return type:

None
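
To make the exported input shape concrete, here is a small pure-Python sketch of a rolling buffer that produces a flattened history of shape (1, H_a * num_one_step_obs). The ObsHistory class is hypothetical and not part of unilab. Two details are assumptions rather than documented behavior: the newest frame is placed first, and the buffer starts zero-filled. Check the training code for the actual frame order before feeding an exported model.

```python
from collections import deque


class ObsHistory:
    """Hypothetical rolling buffer of the last `history_len` one-step observations."""

    def __init__(self, history_len: int, num_one_step_obs: int) -> None:
        self.history_len = history_len            # H_a in the docstring above
        self.num_one_step_obs = num_one_step_obs
        # ASSUMPTION: the history starts zero-filled.
        self._frames = deque(
            ([0.0] * num_one_step_obs for _ in range(history_len)),
            maxlen=history_len,
        )

    def push(self, obs: list[float]) -> None:
        if len(obs) != self.num_one_step_obs:
            raise ValueError(f"expected {self.num_one_step_obs} values, got {len(obs)}")
        # ASSUMPTION: newest frame first. With maxlen set, appendleft drops
        # the oldest frame from the right end once the buffer is full.
        self._frames.appendleft(list(obs))

    def flat(self) -> list[list[float]]:
        """Return the history as a nested list of shape (1, H_a * num_one_step_obs)."""
        return [[value for frame in self._frames for value in frame]]


history = ObsHistory(history_len=3, num_one_step_obs=2)
history.push([1.0, 2.0])
history.push([3.0, 4.0])
flat = history.flat()  # one row holding 3 * 2 = 6 values
```

The nested list can be converted to a float32 array of the same shape before being passed to an inference runtime as obs_history.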

export_policy_to_jit(path, filename='policy.pt')

Export the end-to-end HIM-PPO policy (estimator + actor) via TorchScript trace.

Wraps the estimator and actor in a plain module (without HIMActorCritic’s properties) so that torch.jit.trace can introspect it without triggering the distribution assertion.

Parameters:

path (str)
filename (str)

Return type:

None
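
The wrapping approach described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: PolicyExportWrapper is a hypothetical name, the layer sizes are made up, and the way the estimator output is combined with the current observation (concatenating the newest frame with the latent) is assumed for the example. Only torch.nn modules and torch.jit.trace are real APIs here.

```python
import torch
import torch.nn as nn


class PolicyExportWrapper(nn.Module):
    """Plain module holding only the estimator and the actor (hypothetical)."""

    def __init__(self, estimator: nn.Module, actor: nn.Module, num_one_step_obs: int) -> None:
        super().__init__()
        self.estimator = estimator
        self.actor = actor
        self.num_one_step_obs = num_one_step_obs

    def forward(self, obs_history: torch.Tensor) -> torch.Tensor:
        latent = self.estimator(obs_history)
        # ASSUMPTION: the newest frame occupies the first num_one_step_obs entries.
        current_obs = obs_history[:, : self.num_one_step_obs]
        # ASSUMPTION: the actor consumes the current observation plus the latent.
        return self.actor(torch.cat((current_obs, latent), dim=-1))


# Hypothetical dimensions, chosen only so the example runs.
history_len, num_one_step_obs, latent_dim, num_actions = 6, 45, 19, 12

estimator = nn.Sequential(
    nn.Linear(history_len * num_one_step_obs, 64), nn.ELU(), nn.Linear(64, latent_dim)
)
actor = nn.Sequential(
    nn.Linear(num_one_step_obs + latent_dim, 64), nn.ELU(), nn.Linear(64, num_actions)
)

wrapper = PolicyExportWrapper(estimator, actor, num_one_step_obs).eval()

# Tracing records the ops executed for this example input, so the wrapper
# must be free of data-dependent control flow and property side effects.
example_input = torch.zeros(1, history_len * num_one_step_obs)
traced = torch.jit.trace(wrapper, example_input)
```

The traced module can then be written to disk with traced.save(...) and reloaded with torch.jit.load(...) without the original Python classes.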