[Common] Use specialized unfused MXFP8 cast kernels by default#2958
Oleg-Goncharov wants to merge 11 commits into
Signed-off-by: Oleg Goncharov <ogoncharov@nvidia.com>
Greptile Summary
This PR promotes the specialized unfused MXFP8 cast kernels from opt-in (via
Confidence Score: 5/5
Safe to merge: the specialized kernels are guarded by correct runtime eligibility checks before being invoked, and error detection is strengthened throughout. The behavioral change is narrow and well-defended: the
No files require special attention.
Important Files Changed
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[quantize called] --> B{hasSpec AND\nnot swizzled scales?}
    B -- No --> G[Generic kernel path]
    B -- Yes --> C{scaling_type_has_specialized_support?}
    C -- No --> G
    C -- Yes --> D{scaling_type?}
    D -- ROWWISE\ncols%128==0 AND\ngrid fits --> E[specialized rowwise\ncast-only kernel]
    D -- BIDIMENSIONAL\ngrid fits --> F[specialized bidimensional\ncast-only kernel]
    E --> CUDA_CHECK[NVTE_CHECK_CUDA\ncudaGetLastError]
    F --> CUDA_CHECK
    CUDA_CHECK --> RETURN[return]
    G --> SW{scaling_type?}
    SW -- ROWWISE --> GR[generic ROWWISE kernel]
    SW -- COLWISE --> GC[generic COLWISE kernel]
    SW -- BIDIMENSIONAL --> GB[generic BIDIMENSIONAL kernel]
    GR & GC & GB --> GE[NVTE_CHECK_CUDA\ncudaGetLastError]
```
Reviews (6): Last reviewed commit: "Merge branch 'main' into pr_fast_default..."
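The dispatch in the flowchart can be read as a single decision function. The C++ sketch below mirrors it under stated assumptions: the enum and struct names, the tile sizes, and the fallback to the generic path when the ROWWISE or BIDIMENSIONAL conditions fail are all hypothetical. `grid_fits` assumes that "grid fits" means staying within CUDA's grid-dimension limits (x up to 2^31 - 1, y up to 65535), and `scaling_type_has_specialized_support` here is a stand-in, not the library's function of the same name.

```cpp
#include <cstddef>

// Hypothetical enums mirroring the scaling types and paths in the flowchart.
enum class ScalingType { ROWWISE, COLWISE, BIDIMENSIONAL };
enum class Path { SpecializedRowwise, SpecializedBidimensional, Generic };

// Hypothetical description of one quantize call.
struct QuantizeParams {
  bool has_spec;         // "hasSpec" in the flowchart (exact meaning assumed)
  bool swizzled_scales;  // specialized kernels skip swizzled scale layouts
  ScalingType scaling_type;
  std::size_t rows;
  std::size_t cols;
};

// Assumed tile shape per thread block; the real kernels may differ.
constexpr std::size_t kTileRows = 128;
constexpr std::size_t kTileCols = 128;
// CUDA grid-dimension limits for gridDim.x and gridDim.y.
constexpr std::size_t kMaxGridX = 2147483647;
constexpr std::size_t kMaxGridY = 65535;

// Assumption: "grid fits" = the tiled launch grid stays within CUDA limits.
bool grid_fits(std::size_t rows, std::size_t cols) {
  const std::size_t grid_x = (cols + kTileCols - 1) / kTileCols;
  const std::size_t grid_y = (rows + kTileRows - 1) / kTileRows;
  return grid_x <= kMaxGridX && grid_y <= kMaxGridY;
}

// Stand-in: only ROWWISE and BIDIMENSIONAL have specialized kernels.
bool scaling_type_has_specialized_support(ScalingType t) {
  return t == ScalingType::ROWWISE || t == ScalingType::BIDIMENSIONAL;
}

Path select_path(const QuantizeParams& p) {
  // First gate: spec present and scales not swizzled.
  if (!p.has_spec || p.swizzled_scales) return Path::Generic;
  // Second gate: scaling type must have a specialized kernel.
  if (!scaling_type_has_specialized_support(p.scaling_type)) return Path::Generic;
  switch (p.scaling_type) {
    case ScalingType::ROWWISE:
      if (p.cols % 128 == 0 && grid_fits(p.rows, p.cols)) return Path::SpecializedRowwise;
      break;
    case ScalingType::BIDIMENSIONAL:
      if (grid_fits(p.rows, p.cols)) return Path::SpecializedBidimensional;
      break;
    default:
      break;
  }
  // Assumed fallback when the per-type conditions are not met.
  return Path::Generic;
}
```

Keeping the eligibility logic in one pure function makes the fallback behavior easy to unit-test without a GPU. The actual code also launches the kernel and then checks `cudaGetLastError` through `NVTE_CHECK_CUDA`, which this sketch omits.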
/te-ci
For future work, we could consider adding swizzling support to that kernel, but I'm not sure how much it is really needed.
Description
This PR enables the fast unfused MXFP8 cast kernels by default.
Previously, these kernels were gated behind an environment variable and were not used unless it was explicitly set. This change makes the specialized cast-only path the default for eligible inputs; inputs that fail the eligibility checks still fall back to the generic kernels.
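The behavioral change amounts to flipping a gate. The sketch below contrasts the two policies under stated assumptions: the variable name `HYPOTHETICAL_ENABLE_SPECIALIZED_MXFP8_CAST` is invented for illustration (the PR text does not name the real flag), the check for the value `"1"` is assumed, and the text does not say whether the real variable survives as an opt-out, so the "after" function simply ignores it.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical flag name; the real environment variable is not named here.
constexpr const char* kHypotheticalOptInVar = "HYPOTHETICAL_ENABLE_SPECIALIZED_MXFP8_CAST";

// Before: the specialized path needed both eligibility and an explicit opt-in.
bool use_specialized_before(bool eligible) {
  const char* value = std::getenv(kHypotheticalOptInVar);
  const bool opted_in = value != nullptr && std::strcmp(value, "1") == 0;
  return eligible && opted_in;
}

// After: eligibility alone decides; no environment lookup is needed.
bool use_specialized_after(bool eligible) { return eligible; }
```

In both versions the eligibility checks still apply; only the extra opt-in condition is dropped.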