unilab.algos.mlx.ppo.runner.MLXPPOAgent

class unilab.algos.mlx.ppo.runner.MLXPPOAgent

High-level PPO wrapper that keeps the training script lightweight.

Methods

`__init__`(cfg, obs_dim, action_dim, learning_rate)
`act`(obs)
`current_action_std`(action_shape)
`load_trainer_state`(trainer_state_path)
`load_weights`(path)
`mean_noise_std`()
`normalize_rewards`(rewards)
`policy_mean`(obs)
`save_checkpoint`(model_path, ...)
`update`(buffer, last_obs)
`update_normalization`(obs)
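
The methods above suggest a rollout-then-update loop. The following is a minimal sketch of how a training script might call them in order, not the library's actual training script. The method and argument names are taken from the listing; everything else is an assumption: the return types, the call order, the list-based buffer, the placeholder file paths, and the hypothetical `RecordingAgent` and `toy_env_step` stand-ins, which exist only so the sketch runs without `unilab` installed.

```python
from typing import Any, List, Protocol


class AgentLike(Protocol):
    """Method and argument names copied from the listing; return types are assumed."""

    def act(self, obs: Any) -> Any: ...
    def update_normalization(self, obs: Any) -> None: ...
    def normalize_rewards(self, rewards: Any) -> Any: ...
    def update(self, buffer: Any, last_obs: Any) -> Any: ...
    def save_checkpoint(
        self, model_path: str, trainer_state_path: str, iteration: int
    ) -> None: ...


class RecordingAgent:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def act(self, obs):
        self.calls.append("act")
        return [0.0]  # placeholder action

    def update_normalization(self, obs):
        self.calls.append("update_normalization")

    def normalize_rewards(self, rewards):
        self.calls.append("normalize_rewards")
        return list(rewards)  # identity; the real scaling is not documented here

    def update(self, buffer, last_obs):
        self.calls.append("update")

    def save_checkpoint(self, model_path, trainer_state_path, iteration):
        # Records the call only; nothing is written to disk.
        self.calls.append(f"save_checkpoint:{iteration}")


def toy_env_step(obs, action):
    """Hypothetical environment step: returns (next_obs, reward)."""
    return [obs[0] + 1.0], 1.0


def train(agent: AgentLike, num_iterations: int, rollout_len: int) -> None:
    obs = [0.0]
    for iteration in range(num_iterations):
        buffer = []  # assumed format: a plain list of (obs, action, reward)
        for _ in range(rollout_len):
            agent.update_normalization(obs)
            action = agent.act(obs)
            next_obs, reward = toy_env_step(obs, action)
            buffer.append((obs, action, reward))
            obs = next_obs
        # Rescale the collected rewards, then run one update on the rollout.
        rewards = agent.normalize_rewards([r for _, _, r in buffer])
        buffer = [(o, a, r) for (o, a, _), r in zip(buffer, rewards)]
        agent.update(buffer, obs)
        # Placeholder paths; the real file formats are not stated on this page.
        agent.save_checkpoint("model.bin", "trainer_state.bin", iteration)


agent = RecordingAgent()
train(agent, num_iterations=2, rollout_len=3)
```

With the real class, `RecordingAgent()` would be replaced by an `MLXPPOAgent` built from `cfg`, `obs_dim`, `action_dim`, and `learning_rate`, and the buffer would be whatever rollout structure `update` actually expects.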

__init__(cfg, obs_dim, action_dim, learning_rate)

update_normalization(obs)

policy_mean(obs)

normalize_rewards(rewards)

current_action_std(action_shape)

mean_noise_std()

update(buffer, last_obs)

load_weights(path)

save_checkpoint(model_path, trainer_state_path, iteration)

Return type: None

load_trainer_state(trainer_state_path)