unilab.algos.mlx.common
MLX RL base modules.
This package contains framework-level building blocks that are reused by algorithm implementations (e.g. PPO).
- class unilab.algos.mlx.common.MLP
Bases: Module
Simple feed-forward MLP with configurable activations.
- Parameters:
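The constructor parameters of `MLP` are not listed above. The following plain-Python sketch shows what a feed-forward MLP with configurable activations computes: alternating affine layers and element-wise activations, with a separate activation on the output layer. The class `TinyMLP`, the `ACTIVATIONS` table, the parameter names `sizes`, `activation`, and `output_activation`, and the uniform initialization are all hypothetical. The real class subclasses `Module` and operates on MLX arrays.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical activation registry (names are illustrative, not the unilab API).
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "elu": lambda x: x if x > 0 else math.exp(x) - 1.0,
    "identity": lambda x: x,
}


class TinyMLP:
    """Plain-Python MLP: affine -> activation -> ... -> affine -> output activation."""

    def __init__(self, sizes: Sequence[int], activation: str = "elu",
                 output_activation: str = "identity", seed: int = 0) -> None:
        rng = random.Random(seed)
        self.act = ACTIVATIONS[activation]
        self.out_act = ACTIVATIONS[output_activation]
        # One (weight matrix, bias vector) pair per consecutive pair of sizes.
        self.layers: List[Tuple[List[List[float]], List[float]]] = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            bound = 1.0 / math.sqrt(n_in)
            w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
            self.layers.append((w, [0.0] * n_out))

    def __call__(self, x: List[float]) -> List[float]:
        last = len(self.layers) - 1
        for i, (w, b) in enumerate(self.layers):
            # Affine map: y_k = sum_j w[k][j] * x[j] + b[k]
            x = [sum(wkj * xj for wkj, xj in zip(row, x)) + bk for row, bk in zip(w, b)]
            # Hidden layers use `activation`; the final layer uses `output_activation`.
            f = self.out_act if i == last else self.act
            x = [f(v) for v in x]
        return x
```

Keeping the output activation separate (identity by default) is a common choice for policy and value heads, whose outputs should not be clipped by a hidden-layer nonlinearity.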
- class unilab.algos.mlx.common.EmpiricalNormalization
Bases: object
Normalize features using a running mean/std over the batch axis.
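One common way to maintain running statistics over the batch axis is to merge each batch's mean and variance into the running moments with the standard pairwise (parallel) variance-combination formula. The sketch below assumes that approach. `RunningNormalizer`, `update`, `normalize`, and `eps` are hypothetical names, and plain Python lists stand in for MLX arrays.

```python
import math
from typing import List


class RunningNormalizer:
    """Per-feature running mean/variance, updated one batch at a time (hypothetical)."""

    def __init__(self, dim: int, eps: float = 1e-8) -> None:
        self.mean = [0.0] * dim
        self.var = [1.0] * dim  # ignored until the first update (count == 0)
        self.count = 0
        self.eps = eps

    def update(self, batch: List[List[float]]) -> None:
        n = len(batch)
        if n == 0:
            return
        total = self.count + n
        for j in range(len(self.mean)):
            col = [row[j] for row in batch]
            b_mean = sum(col) / n
            b_var = sum((v - b_mean) ** 2 for v in col) / n
            delta = b_mean - self.mean[j]
            # Combine sums of squared deviations of the old data and the new batch.
            m2 = self.var[j] * self.count + b_var * n + delta ** 2 * self.count * n / total
            self.mean[j] += delta * n / total
            self.var[j] = m2 / total
        self.count = total

    def normalize(self, x: List[float]) -> List[float]:
        return [(v - m) / math.sqrt(s + self.eps) for v, m, s in zip(x, self.mean, self.var)]
```

Feeding the batches `[[1.0], [3.0]]` and then `[[5.0], [7.0]]` yields mean 4.0 and population variance 5.0, the same as computing over all four values at once.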
- class unilab.algos.mlx.common.EmpiricalDiscountedVariationNormalization
Bases: object
Reward normalization using the running std of discounted returns.
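Reward normalization of this kind typically tracks a per-environment discounted return, `R = gamma * R + r`, and divides each reward by the running standard deviation of `R`. No mean is subtracted, so reward signs are preserved. The sketch below assumes that scheme. `DiscountedReturnScaler`, `gamma`, and `eps` are hypothetical. It also does not reset `R` at episode boundaries, and the documentation above does not say whether the unilab class does.

```python
import math
from typing import List


class DiscountedReturnScaler:
    """Scale rewards by the running std of discounted returns (hypothetical sketch)."""

    def __init__(self, num_envs: int, gamma: float = 0.99, eps: float = 1e-8) -> None:
        self.gamma = gamma
        self.eps = eps
        self.returns = [0.0] * num_envs  # running discounted return per environment
        # Welford accumulators over every observed return value.
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0

    def _observe(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        return math.sqrt(self.m2 / self.count) if self.count > 0 else 0.0

    def __call__(self, rewards: List[float]) -> List[float]:
        for i, r in enumerate(rewards):
            self.returns[i] = self.gamma * self.returns[i] + r
            self._observe(self.returns[i])
        s = self.std()
        # Scale only (no mean subtraction); pass rewards through until std is positive.
        return [r / (s + self.eps) for r in rewards] if s > 0 else list(rewards)
```

Normalizing by the spread of returns rather than of raw rewards accounts for how the discount factor accumulates reward over time, which is the quantity the value function has to fit.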
- class unilab.algos.mlx.common.RolloutBuffer
Bases: object
On-policy rollout storage for vectorized environments.
- Parameters:
- add(obs, actions, log_probs, action_mean, action_std, rewards, dones, values)
- Parameters:
  - obs (array)
  - actions (array)
  - log_probs (array)
  - action_mean (array)
  - action_std (array)
  - rewards (array)
  - dones (array)
  - values (array)
- Return type:
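The `add` signature suggests that each call stores one time step of data for every environment. The sketch below assumes a fixed-capacity buffer with `num_steps` slots. `SimpleRolloutBuffer`, `num_steps`, `num_envs`, `is_full`, and `clear` are hypothetical, and per-step values are stored as Python lists rather than preallocated MLX arrays.

```python
from typing import Any, Dict, List, Sequence

# Field order mirrors the documented add() signature.
FIELDS = ("obs", "actions", "log_probs", "action_mean", "action_std",
          "rewards", "dones", "values")


class SimpleRolloutBuffer:
    """Fixed-capacity, step-major rollout storage (hypothetical sketch)."""

    def __init__(self, num_steps: int, num_envs: int) -> None:
        self.num_steps = num_steps
        self.num_envs = num_envs
        self.step = 0
        self.data: Dict[str, List[Any]] = {k: [None] * num_steps for k in FIELDS}

    def add(self, obs: Sequence, actions: Sequence, log_probs: Sequence,
            action_mean: Sequence, action_std: Sequence, rewards: Sequence,
            dones: Sequence, values: Sequence) -> None:
        if self.step >= self.num_steps:
            raise RuntimeError("buffer full; call clear() after the policy update")
        args = (obs, actions, log_probs, action_mean, action_std, rewards, dones, values)
        for name, value in zip(FIELDS, args):
            # Every field holds exactly one entry per environment for this step.
            if len(value) != self.num_envs:
                raise ValueError(f"{name} must have one entry per environment")
            self.data[name][self.step] = value
        self.step += 1

    def is_full(self) -> bool:
        return self.step == self.num_steps

    def clear(self) -> None:
        # On-policy data is discarded after each update, so only the cursor is reset.
        self.step = 0
```

Storing `action_mean` and `action_std` alongside `log_probs` lets an algorithm such as PPO recompute old-policy quantities, for example a KL estimate, without rerunning the old policy.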
- unilab.algos.mlx.common.diag_gaussian_log_prob(actions, mean, log_std)
Log-probability under a diagonal Gaussian.
- Parameters:
  - actions (array)
  - mean (array)
  - log_std (array)
- Return type:
array
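For a diagonal Gaussian with `sigma_i = exp(log_std_i)`, the log-density is `log p(a) = sum_i [ -0.5 * ((a_i - mean_i) / sigma_i)**2 - log_std_i - 0.5 * log(2*pi) ]`. The reference sketch below evaluates that formula for a single action vector. It assumes the unilab function sums over the action dimension, which the documentation above does not state, and `diag_gaussian_log_prob_ref` is a hypothetical name.

```python
import math
from typing import Sequence


def diag_gaussian_log_prob_ref(actions: Sequence[float], mean: Sequence[float],
                               log_std: Sequence[float]) -> float:
    """Log-density of one action vector under N(mean, diag(exp(log_std))**2)."""
    total = 0.0
    for a, mu, ls in zip(actions, mean, log_std):
        z = (a - mu) / math.exp(ls)  # standardized deviation per dimension
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total
```

Parameterizing by `log_std` rather than `std` keeps the standard deviation positive without constraints and avoids dividing by a learned quantity that could reach zero.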
- unilab.algos.mlx.common.diag_gaussian_entropy(log_std)
Entropy of a diagonal Gaussian.
- Parameters:
  - log_std (array)
- Return type:
array
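The entropy of a diagonal Gaussian does not depend on the mean, which is consistent with the function taking only `log_std`. Its closed form is `H = sum_i (log_std_i + 0.5 * (1 + log(2*pi)))`. The sketch below evaluates that closed form for one vector of log standard deviations. It assumes, as above, a sum over the action dimension, and `diag_gaussian_entropy_ref` is a hypothetical name.

```python
import math
from typing import Sequence


def diag_gaussian_entropy_ref(log_std: Sequence[float]) -> float:
    """Differential entropy of N(mu, diag(exp(log_std))**2); independent of mu."""
    c = 0.5 * (1.0 + math.log(2.0 * math.pi))  # per-dimension constant
    return sum(ls + c for ls in log_std)
```

Because the entropy is linear in `log_std`, an entropy bonus in the loss pushes every log standard deviation up with a constant gradient, which is one reason exploration is often controlled through the bonus coefficient.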
Modules
- Activation helpers for MLX models.
- Distribution utilities for RL policies.
- MLP module used by MLX RL algorithms.
- Running-stat normalization utilities for MLX RL.
- Rollout buffer for on-policy algorithms.