vllm.model_executor.layers.fused_moe.fused_moe_method_base ¶
FusedMoEMethodBase ¶
Bases: QuantizeMethodBase
Source code in vllm/model_executor/layers/fused_moe/fused_moe_method_base.py
__init__ ¶
__init__(moe: FusedMoEConfig)
apply ¶
apply(
layer: FusedMoE,
x: Tensor,
topk_weights: Tensor,
topk_ids: Tensor,
) -> Tensor | tuple[Tensor, Tensor]
apply_monolithic ¶
apply_monolithic(
layer: FusedMoE, x: Tensor, router_logits: Tensor
) -> Tensor | tuple[Tensor, Tensor]
create_weights abstractmethod ¶
create_weights(
layer: Module,
num_experts: int,
hidden_size: int,
intermediate_size_per_partition: int,
params_dtype: dtype,
**extra_weight_attrs,
)
get_fused_moe_quant_config abstractmethod ¶
get_fused_moe_quant_config(
layer: Module,
) -> FusedMoEQuantConfig | None
maybe_make_prepare_finalize ¶
maybe_make_prepare_finalize(
routing_tables: tuple[Tensor, Tensor, Tensor]
| None = None,
) -> FusedMoEPrepareAndFinalize | None
prepare_dp_allgather_tensor ¶
prepare_dp_allgather_tensor(
layer: FusedMoE,
hidden_states: Tensor,
router_logits: Tensor,
) -> tuple[Tensor, list[Tensor]]
Hook to prepare tensors and extra tensors for DP allgather + EP dispatch.
select_gemm_impl ¶
select_gemm_impl(
prepare_finalize: FusedMoEPrepareAndFinalize,
layer: Module,
) -> FusedMoEPermuteExpertsUnpermute
uses_weight_scale_2_pattern ¶
uses_weight_scale_2_pattern() -> bool
Returns True if this quantization method uses the 'weight_scale_2' pattern for per-tensor weight scales (e.g., FP4 variants), and False otherwise.
Subclasses that use the 'weight_scale_2' pattern instead of the standard 'weight_scale' pattern should override this method.
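The override described above can be illustrated with a minimal sketch. It does not import vLLM: `MethodBaseStandIn`, `Fp4MethodStandIn`, and `per_tensor_scale_suffix` are hypothetical stand-ins invented for this example, and the base-class default of `False` is an assumption inferred from the description, not taken from the source file.

```python
class MethodBaseStandIn:
    """Hypothetical stand-in for FusedMoEMethodBase; models one method only."""

    def uses_weight_scale_2_pattern(self) -> bool:
        # Assumed default: the standard 'weight_scale' pattern.
        return False


class Fp4MethodStandIn(MethodBaseStandIn):
    """Hypothetical FP4-style method that opts into 'weight_scale_2'."""

    def uses_weight_scale_2_pattern(self) -> bool:
        return True


def per_tensor_scale_suffix(method: MethodBaseStandIn) -> str:
    """Hypothetical helper: choose the per-tensor scale name for a method."""
    if method.uses_weight_scale_2_pattern():
        return "weight_scale_2"
    return "weight_scale"


if __name__ == "__main__":
    for method in (MethodBaseStandIn(), Fp4MethodStandIn()):
        print(type(method).__name__, per_tensor_scale_suffix(method))
```

A caller that needs to know which per-tensor scale name to look for could branch on the method's return value in this way; how vLLM itself consumes the flag is not shown on this page.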