vllm.model_executor.models.kimi_k25 ¶
Kimi-K2.5 Model Implementation for vLLM.
Kimi-K2.5 extends Kimi-K2 with vision support.
This module defines:
- KimiK25ProcessingInfo / KimiK25MultiModalProcessor: processing logic
- KimiK25ForConditionalGeneration: main model class
KimiK25DummyInputsBuilder ¶
Bases: BaseDummyInputsBuilder[KimiK25ProcessingInfo]
Builds dummy inputs for Kimi-K2.5 model profiling.
Source code in vllm/model_executor/models/kimi_k25.py
__init__ ¶
__init__(info: KimiK25ProcessingInfo) -> None
get_dummy_mm_data ¶
get_dummy_mm_data(
seq_len: int,
mm_counts: Mapping[str, int],
mm_options: Mapping[str, BaseDummyOptions]
| None = None,
) -> MultiModalDataDict
Source code in vllm/model_executor/models/kimi_k25.py
get_dummy_mm_items ¶
Source code in vllm/model_executor/models/kimi_k25.py
KimiK25ForConditionalGeneration ¶
Bases: Module, SupportsMultiModal, SupportsPP
Kimi-K2.5 model for conditional generation.
Supports both image and video-chunk modalities. Video-chunks are temporal segments (typically 4 frames) that are processed with temporal pooling.
Source code in vllm/model_executor/models/kimi_k25.py
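The note above on video-chunks can be made concrete with a small sketch. The helper below is hypothetical (mean pooling over the frame axis, invented shapes); the pooling operation actually used by the model may differ.

```python
import torch

def pool_video_chunk(frame_features: torch.Tensor) -> torch.Tensor:
    """Collapse a video chunk's per-frame features into a single feature set.

    frame_features: [num_frames, num_patches, hidden]  (assumed layout)
    returns:        [num_patches, hidden]
    """
    # Hypothetical choice: average over the temporal (frame) axis.
    return frame_features.mean(dim=0)

chunk = torch.randn(4, 256, 1024)   # one video chunk: 4 frames, 256 patches per frame
pooled = pool_video_chunk(chunk)    # -> torch.Size([256, 1024])
```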
language_model instance-attribute ¶
language_model = init_vllm_registered_model(
vllm_config=vllm_config,
hf_config=text_config,
prefix=maybe_prefix(prefix, "language_model"),
architectures=["DeepseekV2ForCausalLM"],
)
make_empty_intermediate_tensors instance-attribute ¶
weights_mapper class-attribute instance-attribute ¶
weights_mapper = WeightsMapper(
orig_to_new_prefix={
"mm_projector.proj.0": "mm_projector.linear_1",
"mm_projector.proj.2": "mm_projector.linear_2",
}
)
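For orientation, the prefix rule above rewrites checkpoint parameter names before they are loaded into the vLLM module tree. The loop below is a simplified stand-in for the WeightsMapper behavior, not the actual loader code.

```python
orig_to_new_prefix = {
    "mm_projector.proj.0": "mm_projector.linear_1",
    "mm_projector.proj.2": "mm_projector.linear_2",
}

def remap(name: str) -> str:
    # Replace a matching checkpoint prefix with its vLLM counterpart.
    for old, new in orig_to_new_prefix.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name

assert remap("mm_projector.proj.0.weight") == "mm_projector.linear_1.weight"
assert remap("mm_projector.proj.2.bias") == "mm_projector.linear_2.bias"
assert remap("language_model.lm_head.weight") == "language_model.lm_head.weight"
```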
__init__ ¶
__init__(vllm_config: VllmConfig, prefix: str = '') -> None
Source code in vllm/model_executor/models/kimi_k25.py
_parse_and_validate_media_input ¶
_parse_and_validate_media_input(
**kwargs: object,
) -> KimiK25MediaPixelInputs | None
Source code in vllm/model_executor/models/kimi_k25.py
_process_media_input ¶
_process_media_input(
media_input: KimiK25MediaPixelInputs,
) -> list[Tensor]
Source code in vllm/model_executor/models/kimi_k25.py
compute_logits ¶
embed_multimodal ¶
embed_multimodal(**kwargs: object) -> NestedTensors | None
Source code in vllm/model_executor/models/kimi_k25.py
forward ¶
forward(
input_ids: Tensor,
positions: Tensor,
intermediate_tensors: IntermediateTensors | None = None,
inputs_embeds: Tensor | None = None,
**kwargs: object,
) -> IntermediateTensors
Source code in vllm/model_executor/models/kimi_k25.py
get_placeholder_str classmethod ¶
Source code in vllm/model_executor/models/kimi_k25.py
KimiK25MediaPixelInputs ¶
Bases: TensorSchema
Media input schema for the K2-VL model.
Dimensions
- np: Number of patches (flattened from all media items)
- ps: Patch size
- nm: Number of media items
Source code in vllm/model_executor/models/kimi_k25.py
KimiK25MultiModalProcessor ¶
Bases: BaseMultiModalProcessor[KimiK25ProcessingInfo]
Multi-modal processor for Kimi-K2.5.
Handles both image and video-chunk modalities.
Source code in vllm/model_executor/models/kimi_k25.py
_get_mm_fields_config ¶
_get_mm_fields_config(
hf_inputs: BatchFeature,
hf_processor_mm_kwargs: Mapping[str, object],
) -> Mapping[str, MultiModalFieldConfig]
Indicates how to slice the media input into multiple items.
- pixel_values: [N, 3, patch_size, patch_size], all patches collected from the B media items.
- grid_thws: [B, 3], each row [N_t, N_h, N_w] gives the grid size in the time/height/width directions for one item.
Multiplying N_t * N_h * N_w gives the number of patches for each media item, so the patches of one item can be recovered as pixel_values[start:start + N_t * N_h * N_w].
Source code in vllm/model_executor/models/kimi_k25.py
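The slicing rule described above can be sketched as follows; the patch size and grid values are illustrative, not taken from the model configuration.

```python
import torch

patch_size = 14                                 # illustrative patch size
grid_thws = torch.tensor([[1, 16, 16],          # item 0: N_t=1, N_h=16, N_w=16 -> 256 patches
                          [4, 8, 8]])           # item 1: N_t=4, N_h=8,  N_w=8  -> 256 patches
N = int(grid_thws.prod(dim=-1).sum())           # total patches across all media items
pixel_values = torch.randn(N, 3, patch_size, patch_size)

# Recover each item's patches by slicing the flat patch tensor.
items, start = [], 0
for n in grid_thws.prod(dim=-1).tolist():
    items.append(pixel_values[start:start + n])
    start += n
assert [t.shape[0] for t in items] == [256, 256]
```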
_get_prompt_updates ¶
_get_prompt_updates(
mm_items: MultiModalDataItems,
hf_processor_mm_kwargs: Mapping[str, Any],
out_mm_kwargs: MultiModalKwargsItems,
) -> Sequence[PromptUpdate]
Source code in vllm/model_executor/models/kimi_k25.py
KimiK25ProcessingInfo ¶
Bases: BaseProcessingInfo
Processing information for Kimi-K2.5 model.
Provides configuration and utilities for processing both images and video-chunks.
Source code in vllm/model_executor/models/kimi_k25.py
hf_processor instance-attribute ¶
hf_processor = MoonshotKimiVAutoProcessor(
media_processor=media_processor,
tokenizer=get_tokenizer(),
media_token_id=media_token_id,
)
__init__ ¶
__init__(ctx: InputProcessingContext) -> None
Source code in vllm/model_executor/models/kimi_k25.py
get_hf_config ¶
get_hf_processor ¶
MaxImageTokenMeta dataclass ¶
Source code in vllm/model_executor/models/kimi_k25.py
MoonshotKimiVAutoProcessor ¶
Bases: ProcessorMixin
Source code in vllm/model_executor/models/kimi_k25.py
__call__ ¶
__call__(
vision_chunks: list[VisionChunk] | None = None,
*,
text: list[int] | str,
**kwargs,
) -> BatchFeature
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vision_chunks` | `list[VisionChunk] \| None` | List of `VisionChunk` items to be processed. For an image: `VisionChunkImage` with `type='image'`, `image=PIL.Image`. For a video chunk: `VisionChunkVideo` with `type='video_chunk'`, `video_chunk=list[PIL.Image]`. | `None` |
| `text` | `list[int] \| str` | The token ids to be fed to a model (required). | required |
Returns: A `BatchFeature` with the following fields:
- **input_ids** -- list of token ids to be fed to a model.
- **pixel_values** -- Pixel values to be fed to a model. Returned when `vision_chunks` is not `None`.
- **grid_thws** -- list of each media item's 3D grid (time/height/width) in the LLM's patch space. Returned when `vision_chunks` is not `None`.
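As a rough illustration of the returned fields (not a call to the real processor), the snippet below builds a `BatchFeature` with made-up values matching the shapes described above; the per-item grid product equals the number of patches in `pixel_values`.

```python
import torch
from transformers import BatchFeature

features = BatchFeature(data={
    "input_ids": [[100, 101, 102, 103]],           # made-up token ids
    "pixel_values": torch.randn(256, 3, 14, 14),   # patches of a single image
    "grid_thws": torch.tensor([[1, 16, 16]]),      # its 1x16x16 grid -> 256 patches
})
# The grid product matches the number of patches in pixel_values.
assert int(features["grid_thws"].prod(dim=-1).sum()) == features["pixel_values"].shape[0]
```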