[JAX] Fallback to old triton ffi for autotuned kernels by jberchtold-nvidia · Pull Request #3077 · NVIDIA/TransformerEngine

jberchtold-nvidia · 2026-06-02T16:16:01Z

Description

Disables the new "triton_kernel_call_ffi" target for autotuned kernels and falls back to the old "triton_kernel_call" FFI, due to CUDA IMA (illegal memory access) issues observed with autotuned kernels on "triton_kernel_call_ffi".

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Fall back to "triton_kernel_call" for autotuned kernels

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

greptile-apps · 2026-06-02T16:17:40Z

Greptile Summary

This PR fixes a CUDA IMA (Illegal Memory Access) issue by preventing autotuned kernels from using the newer triton_kernel_call_ffi custom call target, falling back to the older triton_kernel_call FFI instead.

Introduces a used_autotuned_launch boolean flag in triton_call_lowering; the flag is set to True only when a TritonAutotunedKernelCall is built, and gates the FFI selection so autotuned paths always use the legacy triton_kernel_call target.
Non-autotuned kernels and the pre-existing compatibility-fallback path (old JAX, is_triton_autotuned_alias_safe() returns False) are unaffected and continue to reach triton_kernel_call_ffi when the JAX version requirement is met.
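
The gating described in the two points above can be sketched as follows. This is a minimal illustration, not the code in utils.py: `select_call_target` is a hypothetical helper, and the boolean `jax_meets_min_version` stands in for the result of `jax_version_meet_requirement(TRITON_EXTENSION_CUDA_GRAPH_MIN_JAX_VERSION)`. Only the two target names are taken from the PR.

```python
def select_call_target(used_autotuned_launch: bool, jax_meets_min_version: bool) -> str:
    """Pick the custom call target name (hypothetical helper, for illustration only)."""
    # The new FFI target is chosen only for non-autotuned launches
    # on a JAX version that meets the minimum requirement.
    if not used_autotuned_launch and jax_meets_min_version:
        return "triton_kernel_call_ffi"
    # Autotuned launches, and any launch on an older JAX, use the legacy target.
    return "triton_kernel_call"
```

With this shape, an autotuned launch reaches the legacy target regardless of the JAX version, which is the behavior the summary describes.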

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to routing autotuned kernels away from the new FFI target, and all other dispatch paths are unchanged.

The boolean flag cleanly separates the autotuned and non-autotuned code paths with no risk of misclassification. Non-autotuned kernels and the JAX-version-based compatibility fallback both leave used_autotuned_launch as False, preserving their existing behavior. The only trade-off is that autotuned kernels no longer benefit from CUDA graph support via triton_kernel_call_ffi, but that is the explicit intent of the fix given the CUDA IMA bug.

No files require special attention.

Important Files Changed

Filename	Overview
transformer_engine/jax/triton_extensions/utils.py	Adds `used_autotuned_launch` flag to route autotuned kernels away from `triton_kernel_call_ffi` and onto the legacy `triton_kernel_call` target; logic is correct and all non-autotuned paths are unaffected.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[triton_call_lowering] --> B{isinstance kernel_fn\nAutotuner?}
    B -- No --> C[is_autotuned = False\nused_autotuned_launch = False]
    B -- Yes --> D{is_triton_autotuned\n_alias_safe?}
    D -- No --> E[Compatibility fallback:\nis_autotuned = False\nused_autotuned_launch = False]
    D -- Yes --> F[is_autotuned = True]
    F --> G[Build TritonAutotunedKernelCall\nfor all configs]
    G --> H[used_autotuned_launch = True]
    C --> I{FFI selection}
    E --> I
    H --> I
    I --> J{not used_autotuned_launch\nAND jax_version_meets\nCUDA_GRAPH_MIN?}
    J -- Yes --> K[triton_kernel_call_ffi\nnew FFI - CUDA graph support]
    J -- No --> L[triton_kernel_call\nlegacy FFI - avoids CUDA IMA]
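
The upper half of the flowchart, which decides the two flags before FFI selection, might look roughly like this in Python. It is a sketch under stated assumptions: the `Autotuner` class here is an empty stand-in for the real autotuner type, `LaunchFlags` and `derive_flags` are hypothetical names, and `alias_safe` stands in for the result of `is_triton_autotuned_alias_safe()`.

```python
from dataclasses import dataclass


class Autotuner:
    """Empty stand-in for the real autotuner wrapper type (assumption)."""


@dataclass(frozen=True)
class LaunchFlags:
    is_autotuned: bool
    used_autotuned_launch: bool


def derive_flags(kernel_fn: object, alias_safe: bool) -> LaunchFlags:
    """Mirror the three flowchart paths that feed the FFI selection step."""
    # Path B -> C: a plain kernel is never treated as autotuned.
    if not isinstance(kernel_fn, Autotuner):
        return LaunchFlags(is_autotuned=False, used_autotuned_launch=False)
    # Path D -> E: compatibility fallback when aliasing is not safe.
    if not alias_safe:
        return LaunchFlags(is_autotuned=False, used_autotuned_launch=False)
    # Path F -> G -> H: an autotuned kernel call is built, so the flag is set.
    return LaunchFlags(is_autotuned=True, used_autotuned_launch=True)
```

Only the last path sets `used_autotuned_launch`, so only that path is routed away from the new FFI target.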

Reviews (2): Last reviewed commit: "[pre-commit.ci] auto fixes from pre-comm..."

greptile-apps · 2026-06-02T16:17:44Z

+    if (
+        not used_autotuned_launch
+        and jax_version_meet_requirement(TRITON_EXTENSION_CUDA_GRAPH_MIN_JAX_VERSION)
+    ):


The new branch condition does not carry a comment explaining why autotuned launches must skip the new FFI. Without it, a future reader will only see the mechanism (the flag) but not the root cause (CUDA IMA with triton_kernel_call_ffi on autotuned kernels), making it easy to inadvertently remove the guard when refactoring.

Suggested change

-    if (
-        not used_autotuned_launch
-        and jax_version_meet_requirement(TRITON_EXTENSION_CUDA_GRAPH_MIN_JAX_VERSION)
-    ):
+    # Autotuned kernels must use the older "triton_kernel_call" FFI: the newer
+    # "triton_kernel_call_ffi" path triggers CUDA IMA (Illegal Memory Access)
+    # errors for autotuned kernels and must be bypassed until the upstream issue
+    # is resolved.
+    if (
+        not used_autotuned_launch
+        and jax_version_meet_requirement(TRITON_EXTENSION_CUDA_GRAPH_MIN_JAX_VERSION)
+    ):



jberchtold-nvidia · 2026-06-02T16:19:22Z

/te-ci jax

tdophung

LGTM

Disable new triton ffi for autotuned kernels

3dfd40e

Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>

jberchtold-nvidia requested a review from tdophung June 2, 2026 16:16

greptile-apps Bot reviewed Jun 2, 2026


[pre-commit.ci] auto fixes from pre-commit.com hooks

f375ab1

for more information, see https://pre-commit.ci

tdophung approved these changes Jun 2, 2026


jberchtold-nvidia wants to merge 2 commits into NVIDIA:main from jberchtold-nvidia:jberchtold/disable-new-triton-ffi-for-autotuned-kernels


2 participants
