Top-level modules
Small leaf modules that don’t fit elsewhere.
unilab.cli
Thin package CLI for routing to existing UniLab training entrypoints.
- class unilab.cli.Route
Bases: object
Route(script_name: 'str', config_group: 'str', owner_task: 'str', generated_overrides: 'tuple[str, ...]')
- Parameters:
script_name (str), config_group (str), owner_task (str), generated_overrides (tuple[str, ...])
- unilab.cli.build_command(*, mode, algo, task, sim, overrides, profile=None, load_run=None, render_mode=None, root=None)
unilab.demo
Demo entrypoint: fetch checkpoint from HF, then launch interactive playback.
- class unilab.demo.DemoSpec
Bases: object
DemoSpec(algo: 'str', task: 'str', sim: 'str', entry: 'str')
unilab.structured_configs
Typed dataclass configs for all training algorithms.
Replaces ml_collections.ConfigDict factory functions. Use OmegaConf / Hydra to compose these at runtime.
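The configs in this module are plain dataclasses. Nested groups such as `algo_params` are built with `default_factory`, which is what the `<factory>` defaults in the signatures denote. In practice OmegaConf / Hydra composes them and type-checks each value. The stdlib-only sketch below illustrates the same idea by applying Hydra-style dotted `key=value` overrides to a nested dataclass. `Config`, `AlgoParams`, and `apply_overrides` are hypothetical stand-ins that mirror a few `SACConfig` fields. They are not UniLab or OmegaConf APIs.

```python
from dataclasses import dataclass, field, fields, is_dataclass, replace
from typing import Any, List


# Hypothetical mirrors of a few SACAlgoParams / SACConfig fields; the real
# classes live in unilab.structured_configs and have many more fields.
@dataclass
class AlgoParams:
    alpha_lr: float = 0.0003
    use_compile: bool = True


@dataclass
class Config:
    seed: int = 1
    num_envs: int = 4096
    gamma: float = 0.97
    algo_params: AlgoParams = field(default_factory=AlgoParams)


def _coerce(raw: str, current: Any) -> Any:
    """Parse a string override using the type of the value it replaces."""
    if isinstance(current, bool):  # test bool first: bool is a subclass of int
        lowered = raw.lower()
        if lowered in ("true", "1"):
            return True
        if lowered in ("false", "0"):
            return False
        raise ValueError(f"not a bool: {raw!r}")
    if isinstance(current, (int, float)):
        return type(current)(raw)
    return raw  # strings (and anything else) are kept verbatim


def _set_path(obj: Any, path: List[str], raw: str) -> Any:
    """Return a copy of `obj` with the field at `path` replaced."""
    name = path[0]
    if not is_dataclass(obj) or name not in {f.name for f in fields(obj)}:
        raise KeyError(f"unknown config key: {name!r}")
    current = getattr(obj, name)
    if len(path) > 1:
        new_value = _set_path(current, path[1:], raw)  # recurse into a nested group
    else:
        new_value = _coerce(raw, current)
    return replace(obj, **{name: new_value})


def apply_overrides(cfg: Any, overrides: List[str]) -> Any:
    """Apply dotted `key=value` overrides without mutating `cfg`."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        cfg = _set_path(cfg, key.split("."), raw)
    return cfg


base = Config()
cfg = apply_overrides(base, ["num_envs=1024", "algo_params.alpha_lr=1e-3"])
print(cfg.num_envs, cfg.algo_params.alpha_lr)  # 1024 0.001
print(base.num_envs)                           # 4096 (original untouched)
```

Returning copies keeps the defaults immutable, which mirrors how a composed config is usually treated as read-only once training starts. Unknown keys raise instead of silently creating new fields, which is close in spirit to a struct-mode OmegaConf config.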
- class unilab.structured_configs.SACAlgoParams
Bases: object
SACAlgoParams(alpha_lr: 'float' = 0.0003, alpha_init: 'float' = 0.01, target_entropy_ratio: 'float' = 0.0, max_grad_norm: 'float' = 0.0, amp_dtype: 'str' = 'auto', use_compile: 'bool' = True)
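- Parameters:
alpha_lr (float), alpha_init (float), target_entropy_ratio (float), max_grad_norm (float), amp_dtype (str), use_compile (bool)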
- class unilab.structured_configs.SACConfig
Bases: BaseConfig
SACConfig(algo: 'str' = 'sac', algo_log_name: 'str' = 'fast_sac', runtime_impl: 'Optional[str]' = None, runtime_resolver: 'Optional[str]' = None, seed: 'int' = 1, num_envs: 'int' = 4096, batch_size: 'int' = 8192, replay_buffer_n: 'int' = 512, updates_per_step: 'int' = 4, learning_starts: 'int' = 1, policy_frequency: 'int' = 4, env_steps_per_sync: 'int' = 1, max_iterations: 'int' = 500, save_interval: 'int' = 500, gamma: 'float' = 0.97, tau: 'float' = 0.125, actor_lr: 'float' = 0.0003, critic_lr: 'float' = 0.0003, actor_hidden_dim: 'int' = 512, critic_hidden_dim: 'int' = 768, num_atoms: 'int' = 101, obs_normalization: 'bool' = True, use_layer_norm: 'bool' = True, use_symmetry: 'bool' = False, actor: 'dict[str, Any]' = <factory>, algo_params: 'SACAlgoParams' = <factory>)
- Parameters:
algo (str), algo_log_name (str), runtime_impl (Optional[str]), runtime_resolver (Optional[str]), seed (int), num_envs (int), batch_size (int), replay_buffer_n (int), updates_per_step (int), learning_starts (int), policy_frequency (int), env_steps_per_sync (int), max_iterations (int), save_interval (int), gamma (float), tau (float), actor_lr (float), critic_lr (float), actor_hidden_dim (int), critic_hidden_dim (int), num_atoms (int), obs_normalization (bool), use_layer_norm (bool), use_symmetry (bool), actor (dict[str, Any]), algo_params (SACAlgoParams)
- algo_params: SACAlgoParams
- __init__(algo='sac', algo_log_name='fast_sac', runtime_impl=None, runtime_resolver=None, seed=1, num_envs=4096, batch_size=8192, replay_buffer_n=512, updates_per_step=4, learning_starts=1, policy_frequency=4, env_steps_per_sync=1, max_iterations=500, save_interval=500, gamma=0.97, tau=0.125, actor_lr=0.0003, critic_lr=0.0003, actor_hidden_dim=512, critic_hidden_dim=768, num_atoms=101, obs_normalization=True, use_layer_norm=True, use_symmetry=False, actor=<factory>, algo_params=<factory>)
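`tau` appears on every off-policy config: 0.125 for `SACConfig`, 0.1 for `TD3Config`, and 0.01 for `FlashSACConfig`. In SAC/TD3-style learners it conventionally sets the rate of the soft (Polyak) target-network update. Assuming it has that meaning here, the sketch below shows the rule on plain lists of weights. `soft_update` is a hypothetical helper; a real learner applies the same rule tensor by tensor. After n updates toward a fixed online value, the target has closed a fraction 1 - (1 - tau)^n of the gap, so with tau = 0.125 it closes roughly half of the gap in five updates.

```python
from typing import List


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    if len(target) != len(online):
        raise ValueError("target and online must have the same length")
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]


# Track one target weight toward a fixed online value of 1.0 with tau = 0.125.
weights = [0.0]
for _ in range(5):
    weights = soft_update(weights, [1.0], tau=0.125)
print(round(weights[0], 3))  # 0.487, i.e. 1 - 0.875**5
```

With tau = 1.0 the rule degenerates to a hard copy of the online weights.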
- class unilab.structured_configs.TD3AlgoParams
Bases: object
TD3AlgoParams(weight_decay: 'float' = 0.1, v_min: 'float' = -10.0, v_max: 'float' = 10.0, init_scale: 'float' = 0.01, log_std_min: 'float' = -0.9, log_std_max: 'float' = 0.0, policy_noise: 'float' = 0.2, noise_clip: 'float' = 0.5, use_cdq: 'bool' = True)
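- Parameters:
weight_decay (float), v_min (float), v_max (float), init_scale (float), log_std_min (float), log_std_max (float), policy_noise (float), noise_clip (float), use_cdq (bool)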
- __init__(weight_decay=0.1, v_min=-10.0, v_max=10.0, init_scale=0.01, log_std_min=-0.9, log_std_max=0.0, policy_noise=0.2, noise_clip=0.5, use_cdq=True)
- class unilab.structured_configs.TD3Config
Bases: BaseConfig
TD3Config(algo: 'str' = 'td3', algo_log_name: 'str' = 'fast_td3', seed: 'int' = 1, num_envs: 'int' = 4096, batch_size: 'int' = 8192, replay_buffer_n: 'int' = 1000, updates_per_step: 'int' = 4, learning_starts: 'int' = 1, policy_frequency: 'int' = 2, env_steps_per_sync: 'int' = 1, max_iterations: 'int' = 5000, save_interval: 'int' = 500, gamma: 'float' = 0.97, tau: 'float' = 0.1, actor_lr: 'float' = 0.0003, critic_lr: 'float' = 0.0003, actor_hidden_dim: 'int' = 256, critic_hidden_dim: 'int' = 512, num_atoms: 'int' = 101, obs_normalization: 'bool' = True, use_layer_norm: 'bool' = False, algo_params: 'TD3AlgoParams' = <factory>)
- Parameters:
algo (str), algo_log_name (str), seed (int), num_envs (int), batch_size (int), replay_buffer_n (int), updates_per_step (int), learning_starts (int), policy_frequency (int), env_steps_per_sync (int), max_iterations (int), save_interval (int), gamma (float), tau (float), actor_lr (float), critic_lr (float), actor_hidden_dim (int), critic_hidden_dim (int), num_atoms (int), obs_normalization (bool), use_layer_norm (bool), algo_params (TD3AlgoParams)
- algo_params: TD3AlgoParams
- __init__(algo='td3', algo_log_name='fast_td3', seed=1, num_envs=4096, batch_size=8192, replay_buffer_n=1000, updates_per_step=4, learning_starts=1, policy_frequency=2, env_steps_per_sync=1, max_iterations=5000, save_interval=500, gamma=0.97, tau=0.1, actor_lr=0.0003, critic_lr=0.0003, actor_hidden_dim=256, critic_hidden_dim=512, num_atoms=101, obs_normalization=True, use_layer_norm=False, algo_params=<factory>)
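`TD3Config.num_atoms`, together with `TD3AlgoParams.v_min` / `v_max`, reads like the parameters of a categorical (C51-style) distributional critic, the kind FastTD3 uses. In such a critic, the return distribution lives on `num_atoms` evenly spaced values between `v_min` and `v_max`. Under that assumption, the defaults give a spacing of (10 - (-10)) / (101 - 1) = 0.2. `FlashSACAlgoParams.critic_min_v` / `critic_max_v` (-5, 5) would give 0.1 with the same 101 atoms.

The sketch builds the support and projects a scalar target onto it by splitting its mass between the two nearest atoms. A full C51 update projects every atom of the shifted target distribution the same way. Both helper names are hypothetical.

```python
import math
from typing import List


def categorical_support(v_min: float, v_max: float, num_atoms: int) -> List[float]:
    """Evenly spaced atoms z_0 = v_min, ..., z_{N-1} = v_max."""
    delta = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta for i in range(num_atoms)]


def project_scalar(target: float, v_min: float, v_max: float, num_atoms: int) -> List[float]:
    """Split a point mass at `target` between its two nearest atoms (C51 projection)."""
    delta = (v_max - v_min) / (num_atoms - 1)
    clamped = min(max(target, v_min), v_max)            # mass outside the range piles on the ends
    b = min((clamped - v_min) / delta, num_atoms - 1)   # fractional atom index, guarded against rounding
    lower, upper = math.floor(b), math.ceil(b)
    probs = [0.0] * num_atoms
    if lower == upper:                                  # landed exactly on an atom
        probs[lower] = 1.0
    else:                                               # linear interpolation weights
        probs[lower] = upper - b
        probs[upper] = b - lower
    return probs


support = categorical_support(-10.0, 10.0, 101)        # TD3 defaults: spacing 0.2
probs = project_scalar(0.3, -10.0, 10.0, 101)
expected = sum(p * z for p, z in zip(probs, support))   # recovers the target, about 0.3
```

The projection preserves the mean whenever the target lies inside [v_min, v_max]. Targets outside that range are clipped, which is why the value range has to cover the scale of the (possibly normalized) returns.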
- class unilab.structured_configs.FlashSACAlgoParams
Bases: object
FlashSACAlgoParams(normalize_reward: 'bool' = True, normalized_g_max: 'float' = 5.0, actor_num_blocks: 'int' = 2, critic_num_blocks: 'int' = 2, actor_bc_alpha: 'float' = 0.0, actor_noise_zeta_mu: 'float' = 2.0, actor_noise_zeta_max: 'int' = 16, critic_min_v: 'float' = -5.0, critic_max_v: 'float' = 5.0, temp_initial_value: 'float' = 0.01, temp_target_sigma: 'float' = 0.15, temp_target_entropy: 'float | None' = None, learning_rate_init: 'float' = 0.0003, learning_rate_peak: 'float' = 0.0003, learning_rate_end: 'float' = 0.00015, learning_rate_warmup_steps: 'int' = 0, learning_rate_decay_steps: 'int' = 500000, n_step: 'int' = 1, amp_dtype: 'str' = 'auto', use_compile: 'bool' = True)
- Parameters:
normalize_reward (bool), normalized_g_max (float), actor_num_blocks (int), critic_num_blocks (int), actor_bc_alpha (float), actor_noise_zeta_mu (float), actor_noise_zeta_max (int), critic_min_v (float), critic_max_v (float), temp_initial_value (float), temp_target_sigma (float), temp_target_entropy (float | None), learning_rate_init (float), learning_rate_peak (float), learning_rate_end (float), learning_rate_warmup_steps (int), learning_rate_decay_steps (int), n_step (int), amp_dtype (str), use_compile (bool)
- __init__(normalize_reward=True, normalized_g_max=5.0, actor_num_blocks=2, critic_num_blocks=2, actor_bc_alpha=0.0, actor_noise_zeta_mu=2.0, actor_noise_zeta_max=16, critic_min_v=-5.0, critic_max_v=5.0, temp_initial_value=0.01, temp_target_sigma=0.15, temp_target_entropy=None, learning_rate_init=0.0003, learning_rate_peak=0.0003, learning_rate_end=0.00015, learning_rate_warmup_steps=0, learning_rate_decay_steps=500000, n_step=1, amp_dtype='auto', use_compile=True)
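The `learning_rate_*` fields of `FlashSACAlgoParams` appear to describe a warmup-then-decay schedule. The rate ramps from `learning_rate_init` to `learning_rate_peak` over `learning_rate_warmup_steps`, then falls to `learning_rate_end` over `learning_rate_decay_steps`. The curve shape and the unit of a "step" are not documented here. The sketch assumes linear ramps counted in gradient updates, and `flashsac_lr` is a hypothetical helper. With the defaults there is no warmup: the rate falls linearly from 3e-4 to 1.5e-4 over 500,000 steps and then holds.

```python
def flashsac_lr(step: int, *, init: float = 3e-4, peak: float = 3e-4, end: float = 1.5e-4,
                warmup_steps: int = 0, decay_steps: int = 500_000) -> float:
    """Learning rate at `step`: linear warmup, linear decay, then hold (assumed shape)."""
    if step < 0 or decay_steps <= 0:
        raise ValueError("step must be >= 0 and decay_steps > 0")
    if step < warmup_steps:
        return init + (peak - init) * step / warmup_steps   # ramp init -> peak
    frac = min((step - warmup_steps) / decay_steps, 1.0)    # progress through the decay
    return peak + (end - peak) * frac                       # ramp peak -> end, then hold


for s in (0, 250_000, 500_000, 1_000_000):
    print(s, flashsac_lr(s))
```

If UniLab uses a cosine or exponential decay instead, only the last line of the function changes. The warmup/decay bookkeeping stays the same.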
- class unilab.structured_configs.FlashSACConfig
Bases: BaseConfig
FlashSACConfig(algo: 'str' = 'flashsac', algo_log_name: 'str' = 'flash_sac', seed: 'int' = 1, num_envs: 'int' = 1024, batch_size: 'int' = 2048, replay_buffer_n: 'int' = 512, updates_per_step: 'int' = 2, learning_starts: 'int' = 98, policy_frequency: 'int' = 2, env_steps_per_sync: 'int' = 1, max_iterations: 'int' = 5000, save_interval: 'int' = 1000, gamma: 'float' = 0.97, tau: 'float' = 0.01, actor_lr: 'float' = 0.0003, critic_lr: 'float' = 0.0003, actor_hidden_dim: 'int' = 128, critic_hidden_dim: 'int' = 256, num_atoms: 'int' = 101, obs_normalization: 'bool' = False, use_layer_norm: 'bool' = False, algo_params: 'FlashSACAlgoParams' = <factory>)
- Parameters:
algo (str), algo_log_name (str), seed (int), num_envs (int), batch_size (int), replay_buffer_n (int), updates_per_step (int), learning_starts (int), policy_frequency (int), env_steps_per_sync (int), max_iterations (int), save_interval (int), gamma (float), tau (float), actor_lr (float), critic_lr (float), actor_hidden_dim (int), critic_hidden_dim (int), num_atoms (int), obs_normalization (bool), use_layer_norm (bool), algo_params (FlashSACAlgoParams)
- algo_params: FlashSACAlgoParams
- __init__(algo='flashsac', algo_log_name='flash_sac', seed=1, num_envs=1024, batch_size=2048, replay_buffer_n=512, updates_per_step=2, learning_starts=98, policy_frequency=2, env_steps_per_sync=1, max_iterations=5000, save_interval=1000, gamma=0.97, tau=0.01, actor_lr=0.0003, critic_lr=0.0003, actor_hidden_dim=128, critic_hidden_dim=256, num_atoms=101, obs_normalization=False, use_layer_norm=False, algo_params=<factory>)
- class unilab.structured_configs.APPOAlgorithmConfig
Bases: object
APPOAlgorithmConfig(num_learning_epochs: 'int' = 5, num_mini_batches: 'int' = 4, clip_param: 'float' = 0.2, gamma: 'float' = 0.99, lam: 'float' = 0.95, value_loss_coef: 'float' = 1.0, entropy_coef: 'float' = 0.01, learning_rate: 'float' = 0.001, max_grad_norm: 'float' = 1.0, use_clipped_value_loss: 'bool' = True, schedule: 'str' = 'adaptive', desired_kl: 'float' = 0.01, adaptive_kl_factor: 'float' = 1.2, adaptive_lr_factor: 'float' = 1.1, optimizer: 'str' = 'adam', tau: 'float' = 1.0, target_update_freq: 'int' = 1, vtrace_clip_rho: 'float' = 1.0, vtrace_clip_c: 'float' = 1.0, enable_compile: 'bool' = True)
- Parameters:
num_learning_epochs (int), num_mini_batches (int), clip_param (float), gamma (float), lam (float), value_loss_coef (float), entropy_coef (float), learning_rate (float), max_grad_norm (float), use_clipped_value_loss (bool), schedule (str), desired_kl (float), adaptive_kl_factor (float), adaptive_lr_factor (float), optimizer (str), tau (float), target_update_freq (int), vtrace_clip_rho (float), vtrace_clip_c (float), enable_compile (bool)
- __init__(num_learning_epochs=5, num_mini_batches=4, clip_param=0.2, gamma=0.99, lam=0.95, value_loss_coef=1.0, entropy_coef=0.01, learning_rate=0.001, max_grad_norm=1.0, use_clipped_value_loss=True, schedule='adaptive', desired_kl=0.01, adaptive_kl_factor=1.2, adaptive_lr_factor=1.1, optimizer='adam', tau=1.0, target_update_freq=1, vtrace_clip_rho=1.0, vtrace_clip_c=1.0, enable_compile=True)
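`vtrace_clip_rho` and `vtrace_clip_c` match the two truncation levels of V-trace from IMPALA. ρ̄ clips the importance weight applied to each TD error, and c̄ clips the trace coefficients that carry corrections backward in time. Assuming APPO uses them that way, the sketch below computes V-trace targets and policy-gradient advantages for a single trajectory in pure Python. It also assumes `discounts[t]` equals `gamma * (1 - done[t])`. The function name and argument layout are illustrative, not UniLab's batched implementation.

```python
import math
from typing import List, Tuple


def vtrace(rewards: List[float], values: List[float], bootstrap_value: float,
           log_rhos: List[float], discounts: List[float],
           clip_rho: float = 1.0, clip_c: float = 1.0) -> Tuple[List[float], List[float]]:
    """V-trace targets vs and policy-gradient advantages for one trajectory.

    log_rhos[t] = log pi(a_t|x_t) - log mu(a_t|x_t) (target vs. behaviour policy);
    discounts[t] = gamma * (1 - done_t).
    """
    T = len(rewards)
    rhos = [math.exp(lr) for lr in log_rhos]
    clipped_rhos = [min(clip_rho, r) for r in rhos]  # rho_bar truncation
    cs = [min(clip_c, r) for r in rhos]              # c_bar truncation ("traces")
    next_values = values[1:] + [bootstrap_value]
    deltas = [clipped_rhos[t] * (rewards[t] + discounts[t] * next_values[t] - values[t])
              for t in range(T)]

    # Backward recursion: vs_t - V(x_t) = delta_t + discount_t * c_t * (vs_{t+1} - V(x_{t+1})).
    corrections = [0.0] * T
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + discounts[t] * cs[t] * acc
        corrections[t] = acc
    vs = [v + c for v, c in zip(values, corrections)]

    # Advantages for the policy-gradient term bootstrap from vs_{t+1}.
    next_vs = vs[1:] + [bootstrap_value]
    advantages = [clipped_rhos[t] * (rewards[t] + discounts[t] * next_vs[t] - values[t])
                  for t in range(T)]
    return vs, advantages


# On-policy data (log_rhos = 0) with the default clip levels of 1.0
# reduces V-trace to discounted n-step returns.
vs, adv = vtrace([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, [0.0, 0.0, 0.0], [0.9, 0.9, 0.9])
```

For the call above the targets are 1.0, 1.9, and 2.71 read backward from the last step. Raising the importance ratios above 1.0 changes nothing because of the clipping, while driving them toward 0 pulls the targets back to the critic's own values.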
- class unilab.structured_configs.APPODistributionConfig
Bases: object
APPODistributionConfig(class_name: 'str' = 'rsl_rl.modules.distribution.GaussianDistribution', init_std: 'float' = 1.0, std_type: 'str' = 'scalar')
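Several configs in this module name their implementation by string. `APPODistributionConfig.class_name` uses a dotted path (`'rsl_rl.modules.distribution.GaussianDistribution'`). `PPOAlgorithmConfig.class_name` uses the `module:attribute` form (`'unilab.algos.torch.rsl_rl_ppo:FinalObservationAwarePPO'`). How UniLab or rsl_rl turns these strings into classes is not documented here. `resolve_class_name` is a hypothetical stdlib sketch that accepts both spellings. Bare names such as `PPOPolicyConfig`'s default `'ActorCritic'` are presumably looked up by the runner itself, and this sketch rejects them.

```python
import importlib
from typing import Any


def resolve_class_name(path: str) -> Any:
    """Import the object named by 'pkg.module:Attr' or 'pkg.module.Attr'."""
    if ":" in path:
        module_name, _, attr_path = path.partition(":")   # explicit module/attribute split
    else:
        module_name, _, attr_path = path.rpartition(".")  # last dotted segment is the attribute
    if not module_name or not attr_path:
        raise ValueError(f"expected a fully qualified name, got {path!r}")
    obj: Any = importlib.import_module(module_name)
    for part in attr_path.split("."):                     # allow nested attributes after ':'
        obj = getattr(obj, part)
    return obj


# Stdlib stand-ins for the two spellings used by the configs in this module.
print(resolve_class_name("collections:OrderedDict").__name__)        # OrderedDict
print(resolve_class_name("json.decoder.JSONDecodeError").__name__)   # JSONDecodeError
```

The colon form is unambiguous about where the module path ends. The dotted form relies on the convention that only the last segment is an attribute, so it cannot express a class nested inside another class.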
- class unilab.structured_configs.APPOActorConfig
Bases: object
APPOActorConfig(class_name: 'str' = 'rsl_rl.models.MLPModel', hidden_dims: 'list' = <factory>, activation: 'str' = 'elu', distribution_cfg: 'APPODistributionConfig' = <factory>)
- Parameters:
class_name (str), hidden_dims (list), activation (str), distribution_cfg (APPODistributionConfig)
- distribution_cfg: APPODistributionConfig
- __init__(class_name='rsl_rl.models.MLPModel', hidden_dims=<factory>, activation='elu', distribution_cfg=<factory>)
- class unilab.structured_configs.APPOCriticConfig
Bases: object
APPOCriticConfig(class_name: 'str' = 'rsl_rl.models.MLPModel', hidden_dims: 'list' = <factory>, activation: 'str' = 'elu')
- class unilab.structured_configs.APPOConfig
Bases: BaseConfig
APPOConfig(algo: 'str' = 'appo', algo_log_name: 'str' = 'appo', seed: 'int' = 1, num_envs: 'int' = 2048, steps_per_env: 'int' = 24, max_iterations: 'int' = 150, save_interval: 'int' = 50, obs_groups: 'dict' = <factory>, actor: 'APPOActorConfig' = <factory>, critic: 'APPOCriticConfig' = <factory>, algorithm: 'APPOAlgorithmConfig' = <factory>)
- Parameters:
algo (str), algo_log_name (str), seed (int), num_envs (int), steps_per_env (int), max_iterations (int), save_interval (int), obs_groups (dict), actor (APPOActorConfig), critic (APPOCriticConfig), algorithm (APPOAlgorithmConfig)
- actor: APPOActorConfig
- critic: APPOCriticConfig
- algorithm: APPOAlgorithmConfig
- __init__(algo='appo', algo_log_name='appo', seed=1, num_envs=2048, steps_per_env=24, max_iterations=150, save_interval=50, obs_groups=<factory>, actor=<factory>, critic=<factory>, algorithm=<factory>)
- class unilab.structured_configs.PPOPolicyConfig
Bases: object
PPOPolicyConfig(init_noise_std: 'float' = 1.0, actor_hidden_dims: 'list' = <factory>, critic_hidden_dims: 'list' = <factory>, activation: 'str' = 'elu', class_name: 'str' = 'ActorCritic')
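- Parameters:
init_noise_std (float), actor_hidden_dims (list), critic_hidden_dims (list), activation (str), class_name (str)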
- class unilab.structured_configs.PPOAlgorithmConfig
Bases: object
PPOAlgorithmConfig(class_name: 'str' = 'unilab.algos.torch.rsl_rl_ppo:FinalObservationAwarePPO', value_loss_coef: 'float' = 1.0, use_clipped_value_loss: 'bool' = True, clip_param: 'float' = 0.2, entropy_coef: 'float' = 0.01, num_learning_epochs: 'int' = 5, num_mini_batches: 'int' = 4, learning_rate: 'float' = 0.001, schedule: 'str' = 'adaptive', gamma: 'float' = 0.99, lam: 'float' = 0.95, desired_kl: 'float' = 0.01, target_kl_stop: 'Optional[float]' = None, max_grad_norm: 'float' = 1.0, adaptive_kl_beta: 'float' = 0.9, adaptive_lr_growth: 'float' = 1.1, adaptive_lr_decay: 'float' = 1.2, adaptive_lr_update_interval: 'int' = 5, metrics_interval: 'int' = 8, finite_check_interval: 'int' = 8, enable_compile: 'bool' = True, warmup_strict_iters: 'int' = 10, warmup_metrics_interval: 'int' = 2, warmup_finite_check_interval: 'int' = 2, disable_finite_checks: 'bool' = True)
- Parameters:
class_name (str), value_loss_coef (float), use_clipped_value_loss (bool), clip_param (float), entropy_coef (float), num_learning_epochs (int), num_mini_batches (int), learning_rate (float), schedule (str), gamma (float), lam (float), desired_kl (float), target_kl_stop (Optional[float]), max_grad_norm (float), adaptive_kl_beta (float), adaptive_lr_growth (float), adaptive_lr_decay (float), adaptive_lr_update_interval (int), metrics_interval (int), finite_check_interval (int), enable_compile (bool), warmup_strict_iters (int), warmup_metrics_interval (int), warmup_finite_check_interval (int), disable_finite_checks (bool)
- __init__(class_name='unilab.algos.torch.rsl_rl_ppo:FinalObservationAwarePPO', value_loss_coef=1.0, use_clipped_value_loss=True, clip_param=0.2, entropy_coef=0.01, num_learning_epochs=5, num_mini_batches=4, learning_rate=0.001, schedule='adaptive', gamma=0.99, lam=0.95, desired_kl=0.01, target_kl_stop=None, max_grad_norm=1.0, adaptive_kl_beta=0.9, adaptive_lr_growth=1.1, adaptive_lr_decay=1.2, adaptive_lr_update_interval=5, metrics_interval=8, finite_check_interval=8, enable_compile=True, warmup_strict_iters=10, warmup_metrics_interval=2, warmup_finite_check_interval=2, disable_finite_checks=True)
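With `schedule='adaptive'`, the learning rate follows the measured KL divergence between the old and new policy. Judging by its module path, `FinalObservationAwarePPO` builds on rsl_rl's PPO. That PPO divides the rate by 1.5 when the KL exceeds `2 * desired_kl` and multiplies it by 1.5 when the KL falls below `desired_kl / 2`, with the result clamped to [1e-5, 1e-2]. The extra fields here suggest a smoothed, configurable variant of that rule. The sketch below assumes three things:

- `adaptive_kl_beta` is the weight of an exponential moving average of the KL.
- `adaptive_lr_growth` / `adaptive_lr_decay` replace the 1.5 factors.
- The rule fires every `adaptive_lr_update_interval` calls.

`AdaptiveLR` is hypothetical and only illustrates that reading of the field names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AdaptiveLR:
    """Hypothetical KL-driven learning-rate controller (one reading of the PPO fields)."""
    lr: float = 0.001
    desired_kl: float = 0.01
    kl_beta: float = 0.9            # assumed: EMA weight on the previous KL estimate
    growth: float = 1.1             # assumed role of adaptive_lr_growth
    decay: float = 1.2              # assumed role of adaptive_lr_decay
    update_interval: int = 5        # assumed role of adaptive_lr_update_interval
    min_lr: float = 1e-5            # clamp borrowed from rsl_rl's PPO
    max_lr: float = 1e-2
    _kl_ema: Optional[float] = field(default=None, init=False, repr=False)
    _calls: int = field(default=0, init=False, repr=False)

    def step(self, kl: float) -> float:
        """Feed one KL measurement; return the (possibly updated) learning rate."""
        if self._kl_ema is None:
            self._kl_ema = kl
        else:
            self._kl_ema = self.kl_beta * self._kl_ema + (1.0 - self.kl_beta) * kl
        self._calls += 1
        if self._calls % self.update_interval == 0:
            if self._kl_ema > 2.0 * self.desired_kl:         # policy moving too fast
                self.lr = max(self.min_lr, self.lr / self.decay)
            elif 0.0 < self._kl_ema < 0.5 * self.desired_kl:  # policy moving too slowly
                self.lr = min(self.max_lr, self.lr * self.growth)
        return self.lr


sched = AdaptiveLR()
for _ in range(5):
    sched.step(0.05)   # KL well above 2 * desired_kl, so the fifth call lowers the rate
```

Smoothing the KL and acting only every few calls trades reaction speed for fewer oscillations. Either way, the controller only nudges the rate; the clipped surrogate objective still bounds each individual update.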
- class unilab.structured_configs.PPOConfig
Bases: BaseConfig
PPOConfig(algo: 'str' = 'ppo', algo_log_name: 'str' = 'rsl_rl_ppo', seed: 'int' = 1, num_envs: 'int' = 4096, num_steps_per_env: 'int' = 24, max_iterations: 'int' = 101, save_interval: 'int' = 100, empirical_normalization: 'bool' = False, runner_class_name: 'str' = 'OnPolicyRunner', obs_groups: 'dict' = <factory>, experiment_name: 'str' = 'test', run_name: 'str' = '', resume: 'bool' = False, load_run: 'str' = '-1', checkpoint: 'int' = -1, resume_path: 'Optional[str]' = None, policy: 'PPOPolicyConfig' = <factory>, algorithm: 'PPOAlgorithmConfig' = <factory>)
- Parameters:
algo (str), algo_log_name (str), seed (int), num_envs (int), num_steps_per_env (int), max_iterations (int), save_interval (int), empirical_normalization (bool), runner_class_name (str), obs_groups (dict), experiment_name (str), run_name (str), resume (bool), load_run (str), checkpoint (int), resume_path (Optional[str]), policy (PPOPolicyConfig), algorithm (PPOAlgorithmConfig)
- policy: PPOPolicyConfig
- algorithm: PPOAlgorithmConfig
- __init__(algo='ppo', algo_log_name='rsl_rl_ppo', seed=1, num_envs=4096, num_steps_per_env=24, max_iterations=101, save_interval=100, empirical_normalization=False, runner_class_name='OnPolicyRunner', obs_groups=<factory>, experiment_name='test', run_name='', resume=False, load_run='-1', checkpoint=-1, resume_path=None, policy=<factory>, algorithm=<factory>)
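Under rsl_rl's on-policy runner semantics, which this sketch assumes carry over, each iteration does three things. It collects `num_steps_per_env` steps from each of `num_envs` environments, splits that batch into `num_mini_batches` minibatches, and sweeps it `num_learning_epochs` times. The hypothetical helper below works out the resulting sizes from the default `PPOConfig` and `PPOAlgorithmConfig` values. The even-split check is a choice of this sketch, not a documented UniLab requirement.

```python
from typing import Dict


def ppo_batch_sizes(num_envs: int, num_steps_per_env: int, num_mini_batches: int,
                    num_learning_epochs: int, max_iterations: int) -> Dict[str, int]:
    """Rollout and minibatch sizes implied by an rsl_rl-style PPO config."""
    batch = num_envs * num_steps_per_env          # transitions collected per iteration
    if batch % num_mini_batches:
        raise ValueError("rollout batch does not split evenly into minibatches")
    return {
        "transitions_per_iteration": batch,
        "minibatch_size": batch // num_mini_batches,
        "gradient_steps_per_iteration": num_mini_batches * num_learning_epochs,
        "total_transitions": batch * max_iterations,
    }


# PPOConfig / PPOAlgorithmConfig defaults.
print(ppo_batch_sizes(4096, 24, 4, 5, 101))
# {'transitions_per_iteration': 98304, 'minibatch_size': 24576,
#  'gradient_steps_per_iteration': 20, 'total_transitions': 9928704}
```

The `APPOConfig` defaults are `num_envs=2048`, `steps_per_env=24`, and 150 iterations, with the same minibatch and epoch settings. Assuming APPO batches its rollouts the same way, the arithmetic gives 49,152 transitions per iteration and 7,372,800 in total.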
unilab.dtype_config
Global dtype configuration for environments.