refactor Flux transformer to use scanned blocks, dynamic checkpointing, and decoupled projections by prishajain1 · Pull Request #417 · AI-Hypercomputer/maxdiffusion

prishajain1 · 2026-06-12T06:20:03Z

Overview

This PR refactors the Flux model architecture in MaxDiffusion to run the double and single transformer blocks as scanned blocks (nn.scan), adds configurable gradient checkpointing (rematerialization) policies, and updates the weights loader so that pretrained checkpoints can be loaded into the scanned (stacked) parameter layout.
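As a rough illustration of how scanning and rematerialization fit together, the sketch below uses plain jax.lax.scan and jax.checkpoint, the JAX primitives that Flax's nn.scan and nn.remat build on, rather than the Flux modules themselves. The toy block, the parameter names (w, b), and the chosen policy are assumptions made for this example and are not taken from the PR.

```python
import jax
import jax.numpy as jnp


def block(x, params):
    """One toy residual block (hypothetical stand-in for a Flux block)."""
    return x + jnp.tanh(x @ params["w"] + params["b"])


def forward_scanned(x, stacked_params, remat_policy=None):
    """Apply every block with a single lax.scan over the leading layer axis."""
    # jax.checkpoint marks the block for rematerialization in the backward
    # pass; the policy decides which intermediates are saved vs. recomputed.
    step = jax.checkpoint(block, policy=remat_policy)

    def body(carry, layer_params):
        # layer_params is one slice along axis 0 of every stacked parameter.
        return step(carry, layer_params), None

    out, _ = jax.lax.scan(body, x, stacked_params)
    return out


def forward_unrolled(x, stacked_params):
    """Reference implementation: a Python loop over the same parameters."""
    num_layers = stacked_params["w"].shape[0]
    for i in range(num_layers):
        layer = jax.tree_util.tree_map(lambda p: p[i], stacked_params)
        x = block(x, layer)
    return x


num_layers, dim = 4, 8
k1, k2 = jax.random.split(jax.random.PRNGKey(0))
stacked_params = {
    "w": 0.1 * jax.random.normal(k1, (num_layers, dim, dim)),
    "b": jnp.zeros((num_layers, dim)),
}
x = jax.random.normal(k2, (2, dim))

# Save matmul results that have no batch dimensions; recompute the rest.
policy = jax.checkpoint_policies.dots_with_no_batch_dims_saveable
y_scan = forward_scanned(x, stacked_params, remat_policy=policy)
y_loop = forward_unrolled(x, stacked_params)
grads = jax.grad(lambda p: forward_scanned(x, p, policy).sum())(stacked_params)
```

Because the scan body is traced once regardless of the number of layers, the compiled program does not grow with depth, and swapping the policy argument changes the memory/recompute trade-off without touching the block code.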

Key Changes

Decoupled Fused Projections: Split the previously fused projection layers, adding the MlpAndOutputBlock wrapper, to eliminate redundant recomputation of attention and projection outputs.
QKV Slicing Refactoring: Refactored the QKV projection slicing logic across the Flux transformer blocks to use jnp.split, which gives cleaner layout constraints.
Scanned Block Architecture: Migrated the Flux Double and Single Transformer Blocks to nn.scan to reduce compiler tracing work and improve step execution speed on TPUs.
Dynamic Gradient Checkpointing: Added FLUX_OPTIMIZED to GradientCheckpointType so that block-specific rematerialization policies can be set in configuration files instead of being hardcoded.
Stacked Weights Loading: Updated the weights loader (util.py) to slice, group, and stack PyTorch checkpoint weights along axis 0, matching the parameter layout that nn.scan layers expect.
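The QKV slicing and stacked weight loading items above can be sketched as follows. This is a minimal illustration and not the code in util.py: the helper names (split_qkv, stack_block_weights), the key pattern "prefix.index.param", and the toy shapes are all assumptions, and details a real loader needs, such as transposing PyTorch linear weights, are left out.

```python
import numpy as np
import jax.numpy as jnp


def split_qkv(qkv):
    """Split a fused QKV projection into three equal parts on the last axis."""
    q, k, v = jnp.split(qkv, 3, axis=-1)
    return q, k, v


def stack_block_weights(state_dict, prefix, num_blocks):
    """Group per-block tensors by parameter name and stack them along axis 0.

    Assumes keys look like "<prefix>.<block index>.<param name>" (hypothetical
    naming used only for this example).
    """
    grouped = {}
    for i in range(num_blocks):  # iterate in index order so axis 0 == layer
        block_prefix = f"{prefix}.{i}."
        for name, tensor in state_dict.items():
            if name.startswith(block_prefix):
                param_name = name[len(block_prefix):]
                grouped.setdefault(param_name, []).append(tensor)
    return {name: jnp.stack(parts, axis=0) for name, parts in grouped.items()}


# Toy checkpoint: 3 blocks, each weight filled with its own block index.
state_dict = {}
for i in range(3):
    state_dict[f"double_blocks.{i}.attn.qkv.weight"] = np.full((6, 2), i, np.float32)
    state_dict[f"double_blocks.{i}.attn.qkv.bias"] = np.full((6,), i, np.float32)

stacked = stack_block_weights(state_dict, "double_blocks", num_blocks=3)
q, k, v = split_qkv(jnp.arange(24).reshape(2, 12))
```

After stacking, each parameter carries a leading axis of length num_blocks, so a scanned layer can consume slice i of every array as the weights of block i.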

github-actions · 2026-06-12T06:20:12Z

e2e testgrid: https://8bcf50593faf4ea38060e236169827e5-dot-us-central1.composer.googleusercontent.com/dags/maxdiffusion_tpu_e2e/grid

github-actions · 2026-06-12T06:32:23Z

🤖 Hi @prishajain1, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-12T06:35:20Z

🤖 I'm sorry @prishajain1, but I was unable to process your request. Please see the logs for more details.

github-actions · 2026-06-13T15:07:18Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-13T17:02:39Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-13T17:09:00Z

🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.

prishajain1 requested a review from entrpn as a code owner June 12, 2026 06:20

prishajain1 marked this pull request as draft June 12, 2026 06:20

prishajain1 force-pushed the prisha/flux_training branch 2 times, most recently from 4696256 to 11ddfef June 12, 2026 06:29

prishajain1 marked this pull request as ready for review June 12, 2026 06:31

prishajain1 added the gemini-review label Jun 12, 2026

prishajain1 removed the gemini-review label Jun 12, 2026

prishajain1 requested a review from Perseus14 June 12, 2026 08:59

Perseus14 requested changes Jun 12, 2026

prishajain1 force-pushed the prisha/flux_training branch 4 times, most recently from 8c8dcec to f58fb9e June 13, 2026 13:39

Flux training: Implement scanned blocks, dynamic gradient checkpointing, and weight loading improvements (b53d7d2)

prishajain1 force-pushed the prisha/flux_training branch from f58fb9e to b53d7d2 June 13, 2026 13:40

prishajain1 requested a review from Perseus14 June 13, 2026 13:42

Perseus14 assigned prishajain1 Jun 13, 2026

Perseus14 added the gemini-review label Jun 13, 2026

Perseus14 added gemini-review and removed gemini-review labels Jun 13, 2026

prishajain1 wants to merge 1 commit into main from prisha/flux_training


2 participants
