unilab.algos.torch.offpolicy.worker
Off-policy collector for SAC and TD3.
Collects (obs, action, reward, next_obs, done) transitions using the current actor policy. Runs in a subprocess; writes to ReplayBuffer.
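The transition tuple and the buffer it feeds can be pictured with a minimal single-process sketch. `Transition` and `SimpleReplayBuffer` are hypothetical stand-ins, not unilab's `ReplayBuffer`, whose interface is not documented in this section. This version is a fixed-capacity FIFO buffer with uniform sampling.

```python
import random
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    """One (obs, action, reward, next_obs, done) tuple, as described above."""
    obs: list
    action: list
    reward: float
    next_obs: list
    done: bool


class SimpleReplayBuffer:
    """Hypothetical fixed-capacity buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity: int):
        # A bounded deque drops its oldest entry when a new one exceeds maxlen.
        self._storage = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def __len__(self) -> int:
        return len(self._storage)

    def sample(self, batch_size: int, rng=random):
        # Uniform sampling, without replacement within a single batch.
        return rng.sample(list(self._storage), batch_size)
```

A production buffer shared between processes would more likely preallocate arrays than store Python objects. The sketch only shows the data flow.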
Functions
| off_policy_collector_fn | Entry point for the off-policy collector subprocess. |
| resolve_collector_actor_dims | Resolve actor dims for the collector. |
| resolve_offpolicy_actor_priv_info | Resolve optional collector-side actor context for privileged off-policy actors. |
| sample_offpolicy_actions | Sample collector actions using the algorithm's exploration policy. |
- unilab.algos.torch.offpolicy.worker.resolve_collector_actor_dims(env, obs_dim=None, action_dim=None)
Resolve actor dims for the collector.
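Explicit obs_dim and action_dim passed from the parent process take precedence over dims derived from env. This ensures the learner and the collector build identically shaped actors, even on env paths with heavy config overrides.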
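The precedence rule can be sketched as follows. The sketch assumes a Gym-style env exposing `observation_space.shape` and `action_space.shape`. Those attribute names are an assumption, and the real function may read dimensions differently. `resolve_dims_sketch` is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Optional, Tuple


def resolve_dims_sketch(env, obs_dim: Optional[int] = None,
                        action_dim: Optional[int] = None) -> Tuple[int, int]:
    """Hypothetical sketch: explicit dims win; otherwise read them from the env."""
    # Assumption: Gym-style spaces; shape[-1] works for (dim,) and (num_envs, dim).
    if obs_dim is None:
        obs_dim = int(env.observation_space.shape[-1])
    if action_dim is None:
        action_dim = int(env.action_space.shape[-1])
    return obs_dim, action_dim


# Toy env standing in for one whose reported spaces may differ from the learner's.
env = SimpleNamespace(
    observation_space=SimpleNamespace(shape=(48,)),
    action_space=SimpleNamespace(shape=(12,)),
)
print(resolve_dims_sketch(env))              # (48, 12)
print(resolve_dims_sketch(env, obs_dim=51))  # (51, 12)
```

When the parent passes both dims, the collector no longer depends on how the env reports its spaces.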
- unilab.algos.torch.offpolicy.worker.sample_offpolicy_actions(actor, algo_type, obs_torch, prev_dones_torch, priv_info_torch=None)
Sample collector actions using the algorithm’s exploration policy.
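The sketch below illustrates the two standard exploration schemes in NumPy. SAC samples from a tanh-squashed Gaussian, while TD3 adds Gaussian noise to a deterministic action and clips the result to the action bounds. This is a generic illustration, not unilab's code. The real function takes a torch actor plus `prev_dones_torch` and `priv_info_torch`, which are not modelled here. `exploration_actions_sketch`, `log_std`, and `noise_std` are hypothetical names.

```python
import numpy as np


def exploration_actions_sketch(algo_type, mean, log_std=None, noise_std=0.1,
                               low=-1.0, high=1.0, rng=None):
    """Hypothetical sketch of SAC vs. TD3 exploration for a batch of actions."""
    rng = np.random.default_rng() if rng is None else rng
    if algo_type == "sac":
        if log_std is None:
            raise ValueError("SAC sampling needs log_std")
        # Reparameterized Gaussian sample, then tanh squashing into [-1, 1].
        pre_tanh = mean + np.exp(log_std) * rng.standard_normal(mean.shape)
        squashed = np.tanh(pre_tanh)
        # Rescale from [-1, 1] to [low, high].
        return low + 0.5 * (squashed + 1.0) * (high - low)
    if algo_type == "td3":
        # Deterministic action plus Gaussian exploration noise, clipped to bounds.
        noisy = mean + noise_std * rng.standard_normal(mean.shape)
        return np.clip(noisy, low, high)
    raise ValueError(f"unknown algo_type: {algo_type!r}")
```

SAC explores through its learned stochastic policy, so its exploration scale adapts during training. TD3's noise scale is a fixed hyperparameter.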
- unilab.algos.torch.offpolicy.worker.resolve_offpolicy_actor_priv_info(*, algo_type, obs_np, critic_np, info)
Resolve optional collector-side actor context for privileged off-policy actors.
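The keyword-only signature and the word "optional" point to a pattern: return extra actor input only for algorithm variants that need it, and `None` otherwise. The sketch below illustrates that pattern under loud assumptions. The variant names in `PRIVILEGED_ALGOS`, the use of `critic_np` as the privileged context, and the `"priv_info"` key in `info` are all hypothetical, not unilab's actual rules.

```python
from typing import Any, Mapping, Optional

import numpy as np

# Hypothetical: algorithm variants whose actor consumes privileged context.
PRIVILEGED_ALGOS = {"sac_priv", "td3_priv"}


def resolve_priv_info_sketch(*, algo_type: str, obs_np: np.ndarray,
                             critic_np: Optional[np.ndarray],
                             info: Mapping[str, Any]) -> Optional[np.ndarray]:
    """Return actor-side privileged context, or None for plain actors (sketch)."""
    if algo_type not in PRIVILEGED_ALGOS:
        return None  # Plain SAC/TD3 actors see only obs_np.
    # Assumption: an explicit entry in `info` takes precedence...
    priv = info.get("priv_info")
    if priv is None:
        # ...otherwise fall back to the critic's (privileged) observation.
        priv = critic_np
    if priv is None:
        raise ValueError(f"{algo_type} actor requires privileged context")
    priv = np.asarray(priv, dtype=np.float32)
    # One row per environment, so it batches alongside obs_np.
    return priv.reshape(obs_np.shape[0], -1)
```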
- unilab.algos.torch.offpolicy.worker.off_policy_collector_fn(stop_event, env_name, num_envs, replay_buffer, weight_sync_name, weight_param_shapes, algo_type='sac', actor_hidden_dim=512, use_layer_norm=True, learning_starts=0, metrics_queue=None, weight_sync_lock=None, sync_collection=False, collection_ready_queue=None, trainer_done_queue=None, env_steps_per_sync=1, obs_normalization=False, shared_obs_normalizer_stats=None, sim_backend='mujoco', env_cfg_override=None, obs_dim=None, action_dim=None, actor_kwargs=None, seed=None, trace_enabled=False, trace_thread_time=False, collector_pack_request_queue=None, collector_pack_ready_queue=None, collector_pack_shared_slots=None, **kwargs)
Entry point for the off-policy collector subprocess.
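The general shape of an off-policy collector can be sketched as a loop that runs until `stop_event` is set. On each step it acts randomly during a warm-up phase and otherwise queries the actor. It then pushes transitions to the buffer and refreshes actor weights every `env_steps_per_sync` steps, as the parameter names suggest. This is a generic single-process sketch, not the body of `off_policy_collector_fn`. `ToyEnv`, `collector_loop_sketch`, `pull_weights`, and `max_steps` are hypothetical, and `learning_starts` is counted here in vectorized steps (an assumption). Observation normalization, tracing, metrics, and the synchronous-collection handshake are omitted.

```python
import random
import threading
from typing import Callable, List, Optional


class ToyEnv:
    """Hypothetical vectorized 1-D env: state drifts by the action; done past |1|."""

    def __init__(self, num_envs: int):
        self.state = [0.0] * num_envs

    def observe(self) -> List[float]:
        return list(self.state)

    def step(self, actions):
        next_obs, rewards, dones = [], [], []
        for i, a in enumerate(actions):
            self.state[i] += a
            done = abs(self.state[i]) > 1.0
            next_obs.append(self.state[i])  # terminal obs kept for the transition
            rewards.append(-abs(self.state[i]))
            dones.append(done)
            if done:
                self.state[i] = 0.0  # auto-reset this sub-env
        return next_obs, rewards, dones


def collector_loop_sketch(stop_event: threading.Event, env: ToyEnv, replay_buffer: list,
                          policy: Callable[[float], float], learning_starts: int = 0,
                          env_steps_per_sync: int = 1,
                          pull_weights: Optional[Callable[[], None]] = None,
                          max_steps: Optional[int] = None) -> int:
    """Collect (obs, action, reward, next_obs, done) tuples until stopped."""
    step = 0
    obs = env.observe()
    while not stop_event.is_set() and (max_steps is None or step < max_steps):
        if step < learning_starts:
            # Warm-up: uniform random actions fill the buffer before training.
            actions = [random.uniform(-1.0, 1.0) for _ in obs]
        else:
            actions = [policy(o) for o in obs]
        next_obs, rewards, dones = env.step(actions)
        for transition in zip(obs, actions, rewards, next_obs, dones):
            replay_buffer.append(transition)
        obs = env.observe()  # post-reset observations for the next step
        step += 1
        if pull_weights is not None and step % env_steps_per_sync == 0:
            pull_weights()  # e.g. copy the learner's latest actor parameters
    return step
```

In a multi-process setup, `stop_event` would typically be a `multiprocessing.Event` set by the parent. It exposes the same `is_set()` method as `threading.Event`.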
Error handling is provided by _collector_entry_wrapper in async_runner.py.
- Parameters: