unilab.algos.mlx.common.rollout_storage

Rollout buffer for on-policy algorithms.

Classes

RolloutBuffer    On-policy rollout storage for vectorized environments.

class unilab.algos.mlx.common.rollout_storage.RolloutBuffer

Bases: object

On-policy rollout storage for vectorized environments.

Parameters:
  • num_steps (int)

  • num_envs (int)

  • obs_dim (int)

  • action_dim (int)

  • gamma (float)

  • lam (float)

  • dtype (Any | None), default: None

add(obs, actions, log_probs, action_mean, action_std, rewards, dones, values)
Parameters:
  • obs (array)

  • actions (array)

  • log_probs (array)

  • action_mean (array)

  • action_std (array)

  • rewards (array)

  • dones (array)

  • values (array)

Return type:

None
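
This page does not describe how `add` stores its arguments. As a rough sketch only, assuming the buffer preallocates `num_steps` slots and each call writes one time step for all environments at an internal cursor (a common pattern for on-policy storage, not confirmed here), the behaviour might look like the following. `TinyRollout` and its `step` attribute are hypothetical names, only two of the eight fields are kept, and plain Python lists stand in for MLX arrays.

```python
class TinyRollout:
    """Hypothetical stand-in for a step-indexed rollout store (not the unilab class)."""

    def __init__(self, num_steps, num_envs):
        self.num_steps = num_steps
        self.num_envs = num_envs
        self.clear()

    def clear(self):
        # Reset the write cursor and re-create empty [num_steps][num_envs] storage.
        self.step = 0
        self.rewards = [[0.0] * self.num_envs for _ in range(self.num_steps)]
        self.dones = [[False] * self.num_envs for _ in range(self.num_steps)]

    def add(self, rewards, dones):
        # Assumed behaviour: one call stores one time step for every environment.
        if self.step >= self.num_steps:
            raise RuntimeError("rollout is full; call clear() before adding more steps")
        if len(rewards) != self.num_envs or len(dones) != self.num_envs:
            raise ValueError("expected one entry per environment")
        self.rewards[self.step] = list(rewards)
        self.dones[self.step] = list(dones)
        self.step += 1


buf = TinyRollout(num_steps=2, num_envs=3)
buf.add([1.0, 0.5, 0.0], [False, False, True])
buf.add([0.0, 1.0, 1.0], [False, True, False])
```

Under these assumptions a third `add` call would raise until `clear()` is called, which matches the usual collect, update, clear cycle of on-policy training.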

compute_returns_and_advantages(last_values)
Parameters:
  • last_values (array)

Return type:

None
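
The constructor's `gamma` and `lam` parameters suggest that this method uses Generalized Advantage Estimation (GAE), with `last_values` as the bootstrap value after the final stored step. The page does not confirm this, so the following is a minimal sketch of standard GAE under that assumption, not the library's implementation. The function name `gae` and the nested-list layout `[num_steps][num_envs]` are illustrative choices, and `dones[t]` is assumed to mean that the episode ended at step `t`.

```python
def gae(rewards, values, dones, last_values, gamma, lam):
    """Return (advantages, returns) as [num_steps][num_envs] nested lists."""
    num_steps = len(rewards)
    num_envs = len(last_values)
    advantages = [[0.0] * num_envs for _ in range(num_steps)]
    returns = [[0.0] * num_envs for _ in range(num_steps)]
    for e in range(num_envs):
        running = 0.0                  # discounted sum of TD errors so far
        next_value = last_values[e]    # bootstrap value after the last step
        for t in reversed(range(num_steps)):
            # Assumption: a done flag at step t blocks bootstrapping past it.
            not_done = 0.0 if dones[t][e] else 1.0
            delta = rewards[t][e] + gamma * next_value * not_done - values[t][e]
            running = delta + gamma * lam * not_done * running
            advantages[t][e] = running
            returns[t][e] = running + values[t][e]
            next_value = values[t][e]
    return advantages, returns


# One environment, two steps, episode ends at the second step.
adv, ret = gae(
    rewards=[[1.0], [1.0]],
    values=[[0.5], [0.5]],
    dones=[[False], [True]],
    last_values=[2.0],
    gamma=0.5,
    lam=1.0,
)
# adv == [[1.0], [0.5]] and ret == [[1.5], [1.0]]
```

With `lam=1.0` the returns reduce to discounted reward sums (here 1 + 0.5 * 1 = 1.5 for the first step), and the bootstrap value 2.0 is ignored because the episode terminated.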

mini_batch_generator(num_mini_batches, num_epochs)
Parameters:
  • num_mini_batches (int)

  • num_epochs (int)

Return type:

Generator[Dict[str, array], None, None]
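
The return type indicates that each yielded item is a dictionary of arrays, but the keys and the sampling scheme are not documented here. A plausible sketch, assuming the rollout is flattened over steps and environments, reshuffled once per epoch, and split into `num_mini_batches` equal slices, is shown below. The function `mini_batches`, the `seed` argument, and the field names `obs` and `returns` are hypothetical, and Python lists stand in for MLX arrays.

```python
import random


def mini_batches(data, num_mini_batches, num_epochs, seed=0):
    """Yield dicts of shuffled, equally sized slices of `data`.

    `data` maps field names to equal-length lists that are already
    flattened over steps and environments.
    """
    batch_size = len(next(iter(data.values())))
    mb_size = batch_size // num_mini_batches
    if mb_size == 0:
        raise ValueError("num_mini_batches exceeds the number of samples")
    rng = random.Random(seed)
    for _ in range(num_epochs):
        indices = list(range(batch_size))
        rng.shuffle(indices)  # new permutation every epoch
        # Leftover samples are dropped when batch_size is not divisible.
        for start in range(0, mb_size * num_mini_batches, mb_size):
            idx = indices[start:start + mb_size]
            yield {name: [column[i] for i in idx] for name, column in data.items()}


data = {"obs": list(range(8)), "returns": [10 * i for i in range(8)]}
batches = list(mini_batches(data, num_mini_batches=4, num_epochs=2))
```

In this sketch the generator yields `num_mini_batches * num_epochs` dictionaries in total, and every field in a mini-batch is indexed with the same permutation so that rows stay aligned.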

clear()
Return type:

None

__init__(num_steps, num_envs, obs_dim, action_dim, gamma, lam, dtype=None)
Parameters: