vllm.v1.worker.dp_utils ¶
_get_device_and_group ¶
_get_device_and_group(parallel_config: ParallelConfig)
Source code in vllm/v1/worker/dp_utils.py
_post_process_dp_padding ¶
Source code in vllm/v1/worker/dp_utils.py
_post_process_ubatch ¶
Source code in vllm/v1/worker/dp_utils.py
_run_ar ¶
_run_ar(
should_ubatch: bool,
should_dp_pad: bool,
orig_num_tokens_per_ubatch: int,
padded_num_tokens_per_ubatch: int,
parallel_config: ParallelConfig,
) -> Tensor
Source code in vllm/v1/worker/dp_utils.py
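The signature above suggests that `_run_ar` packs each rank's coordination signals (the two flags plus the unpadded and padded per-ubatch token counts) into a tensor and combines them across the DP group, returning the result. The snippet below is a minimal sketch of that general pattern only; the tensor layout, reduction op, and single-process "gloo" group used here are assumptions made so the example runs standalone, not the actual implementation.

```python
import os

import torch
import torch.distributed as dist

# Minimal sketch of an all-reduce-based coordination step, in the spirit of
# _run_ar. Layout and reduction op are assumptions; a single-process "gloo"
# group is used only so the example runs without a real DP deployment.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29555")
dist.init_process_group("gloo", rank=0, world_size=1)

should_ubatch, should_dp_pad = True, True
orig_num_tokens_per_ubatch, padded_num_tokens_per_ubatch = 96, 128

world_size = dist.get_world_size()
rank = dist.get_rank()

# One row per DP rank; each rank fills only its own row, then an all-reduce
# (summing rows that are zero on other ranks) leaves every rank with the full
# table of everyone's signals -- effectively an all-gather.
signals = torch.zeros((world_size, 4), dtype=torch.int32)
signals[rank] = torch.tensor(
    [int(should_ubatch), int(should_dp_pad),
     orig_num_tokens_per_ubatch, padded_num_tokens_per_ubatch],
    dtype=torch.int32,
)
dist.all_reduce(signals)  # default op is SUM
print(signals)

dist.destroy_process_group()
```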
_synchronize_dp_ranks ¶
_synchronize_dp_ranks(
num_tokens_unpadded: int,
num_tokens_padded: int,
should_attempt_ubatching: bool,
should_attempt_dp_padding: bool,
parallel_config: ParallelConfig,
) -> tuple[bool, Tensor | None]
- Decides if each DP rank is going to microbatch. Either all ranks run with microbatching or none of them do.
- Determines the total number of tokens that each rank will run. When running microbatched, or if should_attempt_dp_padding is True, all ranks will be padded out so that they run with the same number of tokens (illustrated in the sketch below).
Returns: tuple[bool, Tensor | None]
| Name | Type | Description |
|---|---|---|
| should_ubatch | bool | Are all DP ranks going to microbatch |
| num_tokens_after_padding | Tensor \| None | A tensor containing the total number of tokens per-microbatch for each DP rank, including any DP padding. |
Source code in vllm/v1/worker/dp_utils.py
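The all-or-nothing microbatching decision and the pad-to-the-largest-rank behavior described above can be illustrated without a real DP group. The sketch below is a plain-Python simulation of those semantics only; the helper name `simulate_dp_sync`, the per-rank input tuples, and the min/max reductions are illustrative assumptions, while the actual function exchanges these values via an all-reduce over the DP process group and returns a tensor rather than a list.

```python
# Illustrative simulation of the documented semantics of _synchronize_dp_ranks.
# NOT the real implementation: the real code communicates over the DP group.

def simulate_dp_sync(
    per_rank: list[tuple[int, int, bool]],  # (unpadded, padded, wants_ubatch) per DP rank
    should_attempt_dp_padding: bool,
) -> tuple[bool, list[int] | None]:
    # Microbatching is all-or-nothing: every rank must want it.
    should_ubatch = all(wants for _, _, wants in per_rank)

    if not (should_ubatch or should_attempt_dp_padding):
        # No cross-rank padding requested; ranks keep their own token counts.
        return False, None

    # Pad every rank up to the largest padded count so all ranks run the
    # same number of tokens.
    max_tokens = max(padded for _, padded, _ in per_rank)
    return should_ubatch, [max_tokens] * len(per_rank)


if __name__ == "__main__":
    ranks = [(96, 128, True), (40, 64, True), (120, 128, False)]
    print(simulate_dp_sync(ranks, should_attempt_dp_padding=True))
    # -> (False, [128, 128, 128]): one rank declined, so no rank microbatches,
    #    but all ranks are still padded to 128 tokens.
```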
coordinate_batch_across_dp ¶
coordinate_batch_across_dp(
num_tokens_unpadded: int,
allow_microbatching: bool,
allow_dp_padding: bool,
parallel_config: ParallelConfig,
num_tokens_padded: int | None = None,
uniform_decode: bool | None = None,
num_scheduled_tokens_per_request: ndarray | None = None,
) -> tuple[bool, Tensor | None]
Coordinates amongst all DP ranks to determine if and how the full batch should be split into microbatches.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| num_tokens_unpadded | int | Number of tokens without accounting for padding | required |
| allow_microbatching | bool | If microbatching should be attempted | required |
| allow_dp_padding | bool | If all DP ranks should be padded up to the same value | required |
| parallel_config | ParallelConfig | The parallel config | required |
| num_tokens_padded | int \| None | Number of tokens including any non-DP padding (CUDA graphs, TP, etc.) | None |
| uniform_decode | bool \| None | Only used if allow_microbatching is True. True if the batch only contains single-token decodes | None |
| num_scheduled_tokens_per_request | ndarray \| None | Only used if allow_microbatching is True. The number of tokens per request. | None |
Returns: tuple[bool, Tensor | None]
| Name | Type | Description |
|---|---|---|
| ubatch_slices | bool | If this is set then all DP ranks have agreed to microbatch |
| num_tokens_after_padding | Tensor \| None | A tensor containing the total number of tokens per-microbatch for each DP rank, including padding. Will be padded up to the max value across all DP ranks when allow_dp_padding is True. |
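For orientation, here is a hedged sketch of how a caller might consume the two return values documented above. The tensor is mocked so the snippet runs without a DP process group, and `dp_rank` as well as the two-way microbatch split are illustrative assumptions rather than details taken from the source.

```python
import torch

# Sketch of consuming coordinate_batch_across_dp's return values.
dp_rank = 1  # hypothetical index of this rank in the DP group

# Mocked results: all ranks agreed to microbatch, and every rank was padded
# to 64 tokens per microbatch (one entry per DP rank).
should_ubatch = True
num_tokens_after_padding = torch.tensor([64, 64, 64, 64], dtype=torch.int32)

if num_tokens_after_padding is None:
    # Ranks were not padded to a common size; run the local token count as-is.
    print("no cross-rank padding")
else:
    per_ubatch = int(num_tokens_after_padding[dp_rank])
    if should_ubatch:
        # All ranks agreed to microbatch: run `per_ubatch` padded tokens in
        # each microbatch (assuming a two-way split).
        print(f"2 microbatches x {per_ubatch} tokens")
    else:
        print(f"1 batch x {per_ubatch} tokens")
```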