vllm.entrypoints.serve.disagg.protocol ¶
GenerateRequest ¶
Bases: BaseModel
Source code in vllm/entrypoints/serve/disagg/protocol.py
cache_salt class-attribute instance-attribute ¶
cache_salt: str | None = Field(
    default=None,
    description="If specified, the prefix cache will be salted with the provided string to prevent an attacker from guessing prompts in multi-user environments. The salt should be random, protected from access by third parties, and long enough to be unpredictable (e.g., 43 characters base64-encoded, corresponding to 256 bits).",
)
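A minimal sketch of generating a salt of the recommended size, using only the Python standard library. The helper name `make_cache_salt` is hypothetical and not part of vLLM; the sketch relies on the fact that 32 random bytes (256 bits) encode to 43 unpadded URL-safe base64 characters.

```python
import secrets


def make_cache_salt(num_bytes: int = 32) -> str:
    """Return a random URL-safe base64 string (hypothetical helper).

    32 bytes = 256 bits of entropy, which encodes to 43 characters
    once the base64 padding is stripped.
    """
    return secrets.token_urlsafe(num_bytes)


salt = make_cache_salt()
print(len(salt))
```

Requests that must not share prefix-cache entries would use different salts; the salt should be stored and transmitted like any other secret.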
features class-attribute instance-attribute ¶
features: str | None = None
The processed multimodal (MM) inputs for the model.
kv_transfer_params class-attribute instance-attribute ¶
kv_transfer_params: dict[str, Any] | None = Field(
    default=None,
    description="KVTransfer parameters used for disaggregated serving.",
)
priority class-attribute instance-attribute ¶
priority: int = Field(
    default=0,
    description="The priority of the request (lower means earlier handling; default: 0). Any priority other than 0 will raise an error if the served model does not use priority scheduling.",
)
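To illustrate the "lower means earlier handling" convention, here is a toy ordering sketch built on a min-heap. It is not vLLM's scheduler; `QueuedRequest`, `arrival`, and `drain_in_order` are hypothetical names introduced only for this example, and breaking ties by arrival order is an assumption.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedRequest:
    # Compared first: a smaller priority value is handled earlier.
    priority: int
    # Assumed tie-breaker: earlier arrivals go first.
    arrival: int
    # Excluded from ordering comparisons.
    request_id: str = field(compare=False)


def drain_in_order(requests):
    """Return request ids in the order a min-heap would pop them."""
    heap = list(requests)
    heapq.heapify(heap)
    return [heapq.heappop(heap).request_id for _ in range(len(heap))]


order = drain_in_order([
    QueuedRequest(priority=0, arrival=0, request_id="default"),
    QueuedRequest(priority=-1, arrival=1, request_id="urgent"),
    QueuedRequest(priority=5, arrival=2, request_id="background"),
])
print(order)
```

In this sketch the request with priority -1 is popped before the default priority 0, and priority 5 comes last.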
request_id class-attribute instance-attribute ¶
request_id: str = Field(
    default_factory=lambda: f"{random_uuid()}",
    description="The request_id related to this request. If the caller does not set it, a random_uuid will be generated. This id is used throughout the inference process and returned in the response.",
)
sampling_params instance-attribute ¶
sampling_params: SamplingParams
The sampling parameters for the model.
GenerateResponse ¶
Bases: BaseModel
Source code in vllm/entrypoints/serve/disagg/protocol.py
kv_transfer_params class-attribute instance-attribute ¶
kv_transfer_params: dict[str, Any] | None = Field(
    default=None,
    description="KVTransfer parameters used for disaggregated serving.",
)
prompt_logprobs class-attribute instance-attribute ¶
request_id class-attribute instance-attribute ¶
request_id: str = Field(
    default_factory=lambda: f"{random_uuid()}",
    description="The request_id related to this request. If the caller does not set it, a random_uuid will be generated. This id is used throughout the inference process and returned in the response.",
)
GenerateResponseChoice ¶
Bases: BaseModel