vllm.transformers_utils.configs.kimi_k25
Kimi-K2.5 Model Configuration.
This configuration supports the video-chunk as an internal modality type. A video-chunk is the smallest independently processable unit of video.
KimiK25Config
Bases: PretrainedConfig
Kimi-K2.5 model configuration.
Kimi-K2.5 extends Kimi-K2 with vision support using video-chunks. A video-chunk consists of multiple consecutive frames that are processed together with temporal pooling.
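The chunking idea described above can be illustrated with a minimal sketch. It is not the vLLM implementation: the chunk size of 4 frames, the flat per-frame feature vectors, and the use of a plain mean as the "temporal pooling" step are all assumptions made for illustration, and the helper names `split_into_chunks` and `temporal_mean_pool` are hypothetical.

```python
from typing import List

# Hypothetical representation: one frame flattened to a feature vector.
Frame = List[float]


def split_into_chunks(frames: List[Frame], frames_per_chunk: int) -> List[List[Frame]]:
    """Group consecutive frames into video-chunks; the last chunk may be shorter."""
    if frames_per_chunk <= 0:
        raise ValueError("frames_per_chunk must be positive")
    return [
        frames[start:start + frames_per_chunk]
        for start in range(0, len(frames), frames_per_chunk)
    ]


def temporal_mean_pool(chunk: List[Frame]) -> Frame:
    """Average the frames of one chunk element-wise along the time axis.

    Mean pooling is an assumption; the source only says "temporal pooling".
    """
    return [sum(values) / len(chunk) for values in zip(*chunk)]


# Six toy frames with two features each.
frames = [[float(t), float(t) * 2.0] for t in range(6)]

# Assumed chunk size of 4 frames.
chunks = split_into_chunks(frames, frames_per_chunk=4)
pooled = [temporal_mean_pool(chunk) for chunk in chunks]
```

Each chunk is pooled independently of the others, which is what makes it an independently processable unit in this sketch.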
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vision_config | dict \| KimiK25VisionConfig \| None | Configuration for the vision tower and projector. | None |
| text_config | dict \| DeepseekV3Config \| None | Configuration for the text model (DeepseekV3). | None |
| ignore_index | int | The ignore index for the loss function. | -100 |
| media_placeholder_token_id | int | The token ID for media placeholders. | 163605 |
| pad_token_id | int | The token ID for padding. | 0 |
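As a rough illustration of how the three integer parameters are commonly used together, the sketch below locates media placeholder positions in a token sequence and builds loss labels that skip placeholder and padding positions. The masking convention and the helper names `find_media_positions` and `mask_labels` are assumptions for illustration; the source does not describe how vLLM or the model consumes these values.

```python
from typing import List

# Defaults taken from the parameter table above.
IGNORE_INDEX = -100
MEDIA_PLACEHOLDER_TOKEN_ID = 163605
PAD_TOKEN_ID = 0


def find_media_positions(input_ids: List[int]) -> List[int]:
    """Return the indices that hold the media placeholder token."""
    return [i for i, tok in enumerate(input_ids) if tok == MEDIA_PLACEHOLDER_TOKEN_ID]


def mask_labels(input_ids: List[int]) -> List[int]:
    """Replace placeholder and padding tokens with IGNORE_INDEX.

    Assumed convention: positions set to IGNORE_INDEX are skipped by the loss.
    """
    skipped = (MEDIA_PLACEHOLDER_TOKEN_ID, PAD_TOKEN_ID)
    return [IGNORE_INDEX if tok in skipped else tok for tok in input_ids]


# Toy sequence: one text token, two media placeholders, one text token, two pads.
input_ids = [101, 163605, 163605, 57, 0, 0]
media_positions = find_media_positions(input_ids)
labels = mask_labels(input_ids)
```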
Source code in vllm/transformers_utils/configs/kimi_k25.py
media_placeholder_token_id instance-attribute
__init__
__init__(
vision_config: dict | KimiK25VisionConfig | None = None,
text_config: dict | DeepseekV3Config | None = None,
ignore_index: int = -100,
media_placeholder_token_id: int = 163605,
pad_token_id: int = 0,
use_unified_vision_chunk: bool = False,
video_placeholder: str = "<|kimi_k25_video_placeholder|>",
**kwargs,
)
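A minimal sketch of assembling keyword arguments for this constructor follows. The defaults are copied from the signature above; the helper names `make_config_kwargs` and `build_config` are hypothetical. The vLLM import is deferred into `build_config` so the rest of the sketch runs without vLLM installed, and the sketch assumes that a partial `vision_config` dict falls back to the `KimiK25VisionConfig` defaults for omitted keys.

```python
from typing import Any, Dict

# Defaults copied from the documented __init__ signature.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "vision_config": None,
    "text_config": None,
    "ignore_index": -100,
    "media_placeholder_token_id": 163605,
    "pad_token_id": 0,
    "use_unified_vision_chunk": False,
    "video_placeholder": "<|kimi_k25_video_placeholder|>",
}


def make_config_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Return a new kwargs dict: documented defaults updated with overrides."""
    return {**DOCUMENTED_DEFAULTS, **overrides}


def build_config(**overrides: Any):
    """Instantiate KimiK25Config; requires vLLM to be installed."""
    # Imported lazily so the helpers above work without vLLM.
    from vllm.transformers_utils.configs.kimi_k25 import KimiK25Config

    return KimiK25Config(**make_config_kwargs(**overrides))


# Override the vision config with a (partial) dict and enable the unified chunk flag.
kwargs = make_config_kwargs(
    vision_config={"patch_size": 14},
    use_unified_vision_chunk=True,
)
```

With vLLM available, `build_config(use_unified_vision_chunk=True)` would pass the same merged arguments to the real class.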
KimiK25VisionConfig
Bases: PretrainedConfig
__init__
__init__(
patch_size: int = 14,
init_pos_emb_height: int = 64,
init_pos_emb_width: int = 64,
init_pos_emb_time: int = 4,
pos_emb_type: str = "divided_fixed",
num_attention_heads: int = 16,
num_hidden_layers: int = 27,
hidden_size: int = 1152,
intermediate_size: int = 4304,
merge_kernel_size: tuple[int, int] = (2, 2),
video_attn_type: str = "spatial_temporal",
merge_type: str = "sd2_tpool",
mm_projector_type: str = "patchmerger",
mm_hidden_size: int | None = None,
projector_hidden_act: str = "gelu",
projector_ln_eps: float = 1e-05,
**kwargs,
)
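To make the default values more concrete, the sketch below derives a few quantities from them. The interpretations are assumptions based on common vision-transformer conventions, not statements from the source: the per-head dimension is taken as `hidden_size // num_attention_heads`, `merge_kernel_size` is read as the number of patches merged per output token, and the initial position-embedding grid is read as being measured in patches. The function name `derive_vision_shapes` is hypothetical.

```python
from typing import Any, Dict

# A subset of the defaults from the documented __init__ signature.
VISION_DEFAULTS: Dict[str, Any] = {
    "patch_size": 14,
    "init_pos_emb_height": 64,
    "init_pos_emb_width": 64,
    "init_pos_emb_time": 4,
    "num_attention_heads": 16,
    "hidden_size": 1152,
    "merge_kernel_size": (2, 2),
}


def derive_vision_shapes(cfg: Dict[str, Any]) -> Dict[str, int]:
    """Compute illustrative derived quantities under the stated assumptions."""
    if cfg["hidden_size"] % cfg["num_attention_heads"] != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    merge_h, merge_w = cfg["merge_kernel_size"]
    return {
        # Assumed: hidden size is split evenly across attention heads.
        "head_dim": cfg["hidden_size"] // cfg["num_attention_heads"],
        # Assumed: a merge_h x merge_w block of patches becomes one token.
        "patches_per_merged_token": merge_h * merge_w,
        # Assumed: the position-embedding grid is counted in patches.
        "init_grid_height_px": cfg["init_pos_emb_height"] * cfg["patch_size"],
        "init_grid_width_px": cfg["init_pos_emb_width"] * cfg["patch_size"],
    }


shapes = derive_vision_shapes(VISION_DEFAULTS)
```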