vllm.model_executor.model_loader.reload.meta ¶
SKIP_TENSORS module-attribute ¶
SKIP_TENSORS: set[str] = {
"_expert_map",
"expert_mask",
"expert_global_to_physical",
"expert_physical_to_global",
"expert_local_to_global",
}
__all__ module-attribute ¶
__all__ = [
"to_meta_tensor",
"materialize_meta_tensor",
"capture_layer_to_meta",
"restore_layer_on_meta",
"materialize_layer",
"get_numel_loaded",
]
MetaCopyCounter ¶
Bases: TorchDispatchMode
Tracks total number of elements modified with copy_.
Useful for tracking weight loading when the underlying weights may be arbitrarily transformed (for example with narrow) before copy_ is called.
Note: Assumes that copy kwargs are not used.
Source code in vllm/model_executor/model_loader/reload/meta.py
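The counting idea described above can be sketched with a small dispatch mode. This is an illustrative sketch only, not the vLLM implementation: the class name `CopyNumelCounter` and the attribute `copied_numel` are hypothetical, and the sketch assumes that `copy_` reaches the dispatcher as `torch.ops.aten.copy_.default` with the destination as the first positional argument.

```python
import torch
from torch.utils._python_dispatch import TorchDispatchMode


class CopyNumelCounter(TorchDispatchMode):
    """Hypothetical counter: sums the destination sizes of all copy_ calls."""

    def __init__(self):
        super().__init__()
        self.copied_numel = 0

    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Assumption: copy_ arrives as aten.copy_.default and args[0] is the
        # destination tensor (possibly a view produced by narrow).
        if func is torch.ops.aten.copy_.default:
            self.copied_numel += args[0].numel()
        return func(*args, **kwargs)


dst = torch.zeros(4, 8)
src = torch.ones(4, 2)

with CopyNumelCounter() as counter:
    # Only a 4x2 slice of the destination is written.
    dst.narrow(1, 0, 2).copy_(src)

print(counter.copied_numel)
```

Because the count is taken from the destination of each `copy_`, it reflects what was actually written even when the loader first slices the parameter.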
capture_layer_to_meta ¶
capture_layer_to_meta(layer: Module) -> LayerTensors
get_numel_loaded ¶
get_numel_loaded(
weight_loader: Callable, args: BoundArguments
) -> tuple[int, object]
Determine how many elements would be loaded by a weight loader call.
:param weight_loader: callable used to load weights
:param args: bound arguments for the weight loader call
:return: a tuple containing the number of elements loaded by the weight loader and the weight loader's return value
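For readers unfamiliar with `BoundArguments`, the sketch below shows how such an object can be built with the standard library's `inspect` module and replayed against a callable. It does not show the element counting itself, and it is not the vLLM implementation: `toy_weight_loader` and `replay` are hypothetical names, and plain lists stand in for tensors.

```python
import inspect
from typing import Callable


def toy_weight_loader(param: list, loaded_weight: list, shard_offset: int = 0) -> str:
    """Hypothetical loader: writes loaded_weight into a slice of param."""
    param[shard_offset:shard_offset + len(loaded_weight)] = loaded_weight
    return "ok"


def replay(weight_loader: Callable, args: inspect.BoundArguments) -> object:
    """Call the loader with previously bound positional and keyword arguments."""
    return weight_loader(*args.args, **args.kwargs)


param = [0.0] * 8

# Bind the call arguments once, without invoking the loader yet.
bound = inspect.signature(toy_weight_loader).bind(param, [1.0, 2.0], shard_offset=4)
bound.apply_defaults()

result = replay(toy_weight_loader, bound)
print(result, param)
```

Binding the arguments up front keeps the call description separate from its execution, so the same call can be inspected or executed later.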
materialize_layer ¶
materialize_layer(layer: Module) -> None
Materialize all meta tensors in a layer to actual tensors.
materialize_meta_tensor ¶
Materialize a meta tensor into an actual tensor on the current device. Should be called within the torch device context for the given rank.
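As a rough illustration of what materialization involves, the sketch below allocates real, uninitialized storage with the same shape, strides, and dtype as a meta tensor. It is a sketch under stated assumptions, not the vLLM function: the helper name `materialize_sketch` is hypothetical, and the target device is passed explicitly (CPU here) instead of being taken from the current device context.

```python
import torch


def materialize_sketch(meta: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    """Hypothetical helper: allocate real storage matching a meta tensor."""
    # empty_strided reproduces shape, strides, and dtype; contents are uninitialized.
    return torch.empty_strided(
        meta.size(), meta.stride(), dtype=meta.dtype, device=device
    )


meta = torch.empty(2, 3, dtype=torch.float16, device="meta")
real = materialize_sketch(meta)

print(real.device, real.shape, real.dtype)
```

The resulting tensor holds arbitrary values until weights are copied into it.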
restore_layer_on_meta ¶
restore_layer_on_meta(
layer: Module, info: LayerReloadingInfo
)
Restore a layer to model format with tensors on the meta device.
to_meta_tensor ¶
Convert a tensor to a meta tensor while preserving class and attributes.
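One way such a conversion could look for an `nn.Parameter` is sketched below. This is an assumption-laden illustration, not the vLLM implementation: the helper name `to_meta_sketch` is hypothetical, it handles only `nn.Parameter`, and the `weight_loader` attribute is attached purely for demonstration.

```python
import torch
from torch import nn


def to_meta_sketch(param: nn.Parameter) -> nn.Parameter:
    """Hypothetical helper: meta copy of a parameter that keeps its attributes."""
    meta = nn.Parameter(
        torch.empty_like(param.data, device="meta"),
        requires_grad=param.requires_grad,
    )
    # Carry over any Python attributes attached to the original parameter.
    for name, value in vars(param).items():
        setattr(meta, name, value)
    return meta


w = nn.Parameter(torch.ones(4, 2))
w.weight_loader = lambda *args, **kwargs: None  # illustrative attribute

meta_w = to_meta_sketch(w)
print(type(meta_w).__name__, meta_w.device, meta_w.shape)
```

The meta copy carries shape, dtype, class, and attached attributes but no data, so it occupies no device memory.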