vllm.multimodal.utils
__getattr__
__getattr__(name: str)
Source code in vllm/multimodal/utils.py
argsort_mm_positions
argsort_mm_positions(
    mm_positions: MultiModalPlaceholderDict,
) -> list[tuple[str, int]]
Given a MultiModalPlaceholderDict, output a sequence of keys to sort the dictionary by offset (starting index in the input sequence) in ascending order.
Returns:
| Type | Description |
|---|---|
| list[tuple[str, int]] | A list of (modality, idx) pairs, which can be used to access an item by mm_positions[modality][idx]. |
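The ordering described above can be sketched in plain Python. This is an illustration only, not vLLM's implementation: `Placeholder` and `argsort_by_offset` are hypothetical names, and the sketch assumes each placeholder exposes an integer `offset`.

```python
from dataclasses import dataclass


@dataclass
class Placeholder:
    """Hypothetical stand-in for a placeholder range; only `offset` is used."""
    offset: int
    length: int


def argsort_by_offset(
    mm_positions: dict[str, list[Placeholder]],
) -> list[tuple[str, int]]:
    # Flatten to (modality, index, offset) triples, then sort on the offset.
    flat = [
        (modality, idx, item.offset)
        for modality, items in mm_positions.items()
        for idx, item in enumerate(items)
    ]
    flat.sort(key=lambda entry: entry[2])
    return [(modality, idx) for modality, idx, _ in flat]


positions = {
    "image": [Placeholder(offset=0, length=4), Placeholder(offset=12, length=4)],
    "audio": [Placeholder(offset=6, length=3)],
}
order = argsort_by_offset(positions)
# order == [("image", 0), ("audio", 0), ("image", 1)]
```

Each returned pair can index back into the dictionary, for example `positions[modality][idx]`, so items from different modalities can be visited in the order they appear in the input sequence.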
encode_audio_base64
Encode audio as base64.
encode_audio_url
Encode audio as a data URL.
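As a rough illustration of what a base64 data URL for audio looks like, the sketch below uses only the standard library. It does not call vLLM: `wav_bytes` and `to_data_url` are hypothetical helpers, and the WAV container and `audio/wav` MIME type are assumptions that may differ from what the library emits.

```python
import base64
import io
import wave


def wav_bytes(samples: bytes, sample_rate: int) -> bytes:
    """Wrap 16-bit mono PCM samples in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(samples)
    return buffer.getvalue()


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Base64-encode the payload and prefix it with a data URL header."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


silence = b"\x00\x00" * 160  # 160 samples: 10 ms of silence at 16 kHz
audio_url = to_data_url(wav_bytes(silence, 16_000), "audio/wav")
```

The base64 variant corresponds to the part after the comma; the data URL adds the `data:<mime>;base64,` prefix so the payload can be passed wherever a URL is expected.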
encode_image_base64
Encode a Pillow image as base64.
By default, the image is converted into RGB format before being encoded.
encode_image_url
Encode a Pillow image as a data URL.
By default, the image is converted into RGB format before being encoded.
encode_video_base64
encode_video_url
fetch_audio
fetch_audio(
    audio_url: str,
    audio_io_kwargs: dict[str, Any] | None = None,
) -> tuple[ndarray, int | float]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio_url | str | URL of the audio file to fetch. | required |
| audio_io_kwargs | dict[str, Any] \| None | Additional kwargs passed to handle audio IO. | None |
Warning
This method has direct access to local files and is only intended to be called by user code. Never call this from the online server!
fetch_image
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_url | str | URL of the image file to fetch. | required |
| image_io_kwargs | dict[str, Any] \| None | Additional kwargs passed to handle image IO. | None |
Warning
This method has direct access to local files and is only intended to be called by user code. Never call this from the online server!
fetch_video
fetch_video(
    video_url: str,
    video_io_kwargs: dict[str, Any] | None = None,
) -> tuple[NDArray, dict[str, Any]]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| video_url | str | URL of the video file to fetch. | required |
| video_io_kwargs | dict[str, Any] \| None | Additional kwargs passed to handle video IO. | None |
Warning
This method has direct access to local files and is only intended to be called by user code. Never call this from the online server!
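One way to read the warnings on the fetch helpers is that a server handling untrusted input should screen URLs before anything touches the local filesystem. The sketch below shows such a check with the standard library only. `REMOTE_SCHEMES` and `has_allowed_scheme` are hypothetical and not part of vLLM, and the allow-list itself is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical allow-list for a server-side caller; not part of vLLM.
REMOTE_SCHEMES = {"http", "https", "data"}


def has_allowed_scheme(url: str) -> bool:
    """Return True only when the URL's scheme is on the allow-list."""
    return urlparse(url).scheme.lower() in REMOTE_SCHEMES


candidates = [
    "https://example.com/clip.mp4",
    "data:video/mp4;base64,AAAA",
    "file:///etc/passwd",
    "/etc/passwd",
]
accepted = [url for url in candidates if has_allowed_scheme(url)]
```

A scheme check like this rejects `file://` URLs and bare paths, but it is only a first filter; it does not, for instance, stop an `http` URL from pointing at an internal host.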
group_mm_kwargs_by_modality
group_mm_kwargs_by_modality(
    mm_kwargs: list[tuple[str, MultiModalKwargsItem]],
    *,
    device: Device = None,
    pin_memory: bool = False,
) -> Generator[
    tuple[str, int, BatchedTensorInputs], None, None
]
Group consecutive MultiModalKwargsItems from mm_kwargs with the same modality together into the same MultiModalKwargs instance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mm_kwargs | list[tuple[str, MultiModalKwargsItem]] | List of (modality, MultiModalKwargsItem) pairs. | required |
| device | Device | The device to place the grouped tensors on. | None |
| pin_memory | bool | Whether to pin memory for faster host-to-device transfer. | False |
Yields:
| Type | Description |
|---|---|
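| tuple[str, int, BatchedTensorInputs] | A tuple (modality, num_items, grouped_kwargs): the modality name, the number of items in the group, and the batched tensor inputs for that group. |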
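The "consecutive" part of the grouping is easy to miss, so here is a minimal sketch of that behavior using `itertools.groupby`. It is not the library's implementation: it skips tensor batching, `device`, and `pin_memory` entirely, `group_consecutive_by_modality` is a hypothetical name, and treating the yielded integer as the item count is an assumption based on the return type.

```python
from collections.abc import Iterator
from itertools import groupby
from typing import Any


def group_consecutive_by_modality(
    mm_kwargs: list[tuple[str, Any]],
) -> Iterator[tuple[str, int, list[Any]]]:
    # groupby only merges adjacent entries, so a modality that reappears
    # later starts a new group instead of joining the earlier one.
    for modality, run in groupby(mm_kwargs, key=lambda pair: pair[0]):
        items = [item for _, item in run]
        yield modality, len(items), items


# Strings stand in for MultiModalKwargsItem objects in this sketch.
example = [
    ("image", "img0"),
    ("image", "img1"),
    ("audio", "aud0"),
    ("image", "img2"),
]
groups = list(group_consecutive_by_modality(example))
```

In this example the trailing image forms its own group after the audio item, which preserves the original order of the inputs rather than merging every image into one batch.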