[DeepSeek-V4] Implement model integration, decoders, and configuration stack by parambole · Pull Request #4153 · AI-Hypercomputer/maxtext

parambole · 2026-06-11T23:40:54Z

Description

This PR introduces native architectural and routing support for the DeepSeek V4 model in MaxText.

Why & What: DeepSeek V4 introduces non-uniform architectural features that require explicit configuration unrolling. This PR handles them by implementing:

Compressed Attention (CSA/HCA): Bypasses standard MLA instantiation and natively integrates DeepSeek V4's alternating CSA and HCA attention blocks.
Hybrid Routing: Implements DeepSeek's transition from fixed Hash Routing (early layers) to learned Token Routing (later layers) natively within the MoE framework.
Architectural Scanning: Unrolls the 44-layer configuration to properly handle the [0, 0] prefix compression ratios, the perfectly alternating [4, 128] scanned middle layers, and the [4, 0] suffix layers.
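The prefix / scanned-middle / suffix layout described above can be sketched in plain Python. This is an illustration under stated assumptions, not the decoder code in this PR. It assumes the 44 layers split into 2 unrolled prefix layers ([0, 0]), 20 scanned repeats of the [4, 128] pair, and 2 unrolled suffix layers ([4, 0]). The names `build_layer_ratios` and `split_for_scan` are hypothetical.

```python
from typing import List, Tuple

# Assumed layout (hypothetical): 2 prefix layers + 20 x [4, 128] + 2 suffix layers.
PREFIX = [0, 0]
SCAN_PATTERN = [4, 128]
SUFFIX = [4, 0]
NUM_LAYERS = 44


def build_layer_ratios() -> List[int]:
    """Expands prefix + repeated scan pattern + suffix into one ratio per layer."""
    num_middle = NUM_LAYERS - len(PREFIX) - len(SUFFIX)
    if num_middle % len(SCAN_PATTERN) != 0:
        raise ValueError("middle layers must be a whole number of scan patterns")
    repeats = num_middle // len(SCAN_PATTERN)
    return PREFIX + SCAN_PATTERN * repeats + SUFFIX


def split_for_scan(ratios: List[int]) -> Tuple[List[int], int, List[int]]:
    """Splits a flat ratio list into (prefix, scan repeats, suffix).

    Prefix and suffix stay unrolled; only the middle, which must repeat
    SCAN_PATTERN exactly, is eligible for a single scanned block.
    """
    prefix = ratios[: len(PREFIX)]
    suffix = ratios[len(ratios) - len(SUFFIX):]
    middle = ratios[len(PREFIX): len(ratios) - len(SUFFIX)]
    repeats, rem = divmod(len(middle), len(SCAN_PATTERN))
    if rem or middle != SCAN_PATTERN * repeats:
        raise ValueError("middle layers do not strictly alternate the scan pattern")
    return prefix, repeats, suffix


ratios = build_layer_ratios()
prefix, repeats, suffix = split_for_scan(ratios)
print(len(ratios), prefix, repeats, suffix)
```

Validating the flattened list this way only checks the configuration. It says nothing about the order in which a compiled scan actually executes the blocks.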

Tests

Unit Tests: Verified mathematical parity against reference implementations using tests/unit/deepseek_v4_vs_reference_test.py.
E2E Compilation: Successfully compiled the full DeepSeek V4 model for a simulated v5p-512 mesh to verify memory constraints and HLO generation.

Compile Command to Reproduce:

python3 -m maxtext.trainers.pre_train.train_compile src/maxtext/configs/base.yml \
  base_output_directory=/tmp/maxtext_logs \
  run_name=dsv4_v5p512_compile \
  per_device_batch_size=1 \
  enable_checkpointing=false \
  model_name=deepseek4 \
  compile_topology=v5p-512 \
  compile_topology_num_slices=1 \
  ici_fsdp_parallelism=-1 \
  steps=1 \
  max_target_length=4096 \
  async_checkpointing=false \
  tokenizer_type=huggingface \
  tokenizer_path=deepseek-ai/DeepSeek-V3 \
  attention=dot_product \
  dtype=bfloat16 \
  weight_dtype=bfloat16 \
  megablox=False \
  sparse_matmul=False \
  dataset_type=synthetic \
  scan_layers=true

Proof of Compilation:

Memory analysis: CompiledMemoryStats(generated_code_size_in_bytes=260855808, argument_size_in_bytes=18401962496, output_size_in_bytes=18401889280, alias_size_in_bytes=18401880576, temp_size_in_bytes=94892786400, host_generated_code_size_in_bytes=0, host_argument_size_in_bytes=0, host_output_size_in_bytes=0, host_alias_size_in_bytes=0, host_temp_size_in_bytes=0)
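To read these stats in more familiar units, here is a small Python sketch. It assumes the values are per-device byte counts and uses a common rough estimate of peak HBM use: arguments plus outputs, minus outputs that alias donated arguments, plus temporaries. That formula is an approximation chosen for illustration, not XLA's own accounting, and the helper name `approx_peak_bytes` is hypothetical.

```python
# Values copied from the CompiledMemoryStats line above (assumed per-device bytes).
stats = {
    "generated_code_size_in_bytes": 260855808,
    "argument_size_in_bytes": 18401962496,
    "output_size_in_bytes": 18401889280,
    "alias_size_in_bytes": 18401880576,
    "temp_size_in_bytes": 94892786400,
}

GIB = 1024 ** 3


def approx_peak_bytes(s: dict) -> int:
    """Rough estimate: arguments + non-aliased outputs + temporaries.

    Outputs that reuse (alias) donated argument buffers are not counted twice.
    Generated code size is left out of this estimate.
    """
    return (
        s["argument_size_in_bytes"]
        + s["output_size_in_bytes"]
        - s["alias_size_in_bytes"]
        + s["temp_size_in_bytes"]
    )


for name in ("argument_size_in_bytes", "temp_size_in_bytes"):
    print(f"{name}: {stats[name] / GIB:.2f} GiB")
print(f"approx. peak: {approx_peak_bytes(stats) / GIB:.2f} GiB")
```

With these numbers, arguments come to about 17.14 GiB and temporaries to about 88.38 GiB, for roughly 105.51 GiB in total under the stated approximation.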

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-06-11T23:45:17Z

Codecov Report

❌ Patch coverage is 29.03226% with 66 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/models/deepseek4.py	39.28%	34 Missing ⚠️
src/maxtext/layers/decoders.py	4.34%	19 Missing and 3 partials ⚠️
src/maxtext/layers/moe.py	14.28%	6 Missing ⚠️
src/maxtext/layers/attentions.py	0.00%	1 Missing and 1 partial ⚠️
src/maxtext/layers/attention_compressed.py	0.00%	1 Missing ⚠️
src/maxtext/models/deepseek.py	66.66%	0 Missing and 1 partial ⚠️
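The summary line can be cross-checked against the table with a little arithmetic. This sketch assumes Codecov's patch coverage is covered lines divided by covered plus uncovered lines, with partial lines counted as uncovered; the helper `infer_covered` is hypothetical. The table's 61 missing and 5 partial lines add up to the reported 66, and solving for the covered count gives 27 of 93 changed lines (27/93 ≈ 29.03226%).

```python
# (missing, partial) line counts per file, copied from the table above.
ROWS = {
    "src/maxtext/models/deepseek4.py": (34, 0),
    "src/maxtext/layers/decoders.py": (19, 3),
    "src/maxtext/layers/moe.py": (6, 0),
    "src/maxtext/layers/attentions.py": (1, 1),
    "src/maxtext/layers/attention_compressed.py": (1, 0),
    "src/maxtext/models/deepseek.py": (0, 1),
}
REPORTED_PCT = 29.03226

# Partial lines are assumed to count as uncovered.
uncovered = sum(missing + partial for missing, partial in ROWS.values())


def infer_covered(uncovered: int, pct: float, max_lines: int = 10_000) -> int:
    """Smallest covered-line count whose coverage matches pct to 5 decimals."""
    for covered in range(max_lines):
        if abs(100 * covered / (covered + uncovered) - pct) < 5e-6:
            return covered
    raise ValueError("no integer solution found")


covered = infer_covered(uncovered, REPORTED_PCT)
print(uncovered, covered, covered + uncovered)
```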


github-actions · 2026-06-12T20:12:30Z

🤖 Hi @parambole, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-12T20:14:59Z

🤖 I'm sorry @parambole, but I was unable to process your request. Please see the logs for more details.

This commit introduces full support for DeepSeek V4 by integrating its compressed attention mechanisms, MoE routing, and architectural layers. Key changes:
- Add `deepseek4.yml` configuration and `DeepSeek4DecoderLayer` implementation.
- Implement hybrid Hash Routing and Token Routing for MoE layers.
- Add prefix/suffix layer unrolling for non-uniform compression blocks.
- Fix Pydantic validation for base MLP dimensions.
- Bypass MLA instantiation in favor of native CompressedAttention (CSA/HCA).
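The hybrid routing idea can be illustrated in plain Python. This is a hedged sketch, not the MoE code in this PR: the hash (consecutive residues of the token id modulo the expert count), the `num_hash_layers` cutoff, top-k over softmax scores, and all function names are illustrative assumptions. What it shows is the structural difference: hash routing picks experts from the token id alone, with no learned parameters, while token routing picks experts from scores computed on the hidden state.

```python
import math
from typing import List, Sequence, Tuple

Routing = Tuple[List[int], List[float]]  # (expert ids, combine weights)


def hash_route(token_id: int, num_experts: int, k: int) -> Routing:
    """Fixed routing: experts depend only on the token id (illustrative hash)."""
    experts = [(token_id * k + j) % num_experts for j in range(k)]
    return experts, [1.0 / k] * k


def token_route(hidden: Sequence[float], router: Sequence[Sequence[float]], k: int) -> Routing:
    """Learned routing: top-k experts by softmax(router @ hidden), renormalized."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    norm = sum(probs[e] for e in top)
    return top, [probs[e] / norm for e in top]


def route(layer_idx: int, num_hash_layers: int, token_id: int,
          hidden: Sequence[float], router: Sequence[Sequence[float]], k: int) -> Routing:
    """Hypothetical switch: hash routing for early layers, token routing after."""
    if layer_idx < num_hash_layers:
        return hash_route(token_id, len(router), k)
    return token_route(hidden, router, k)


router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]  # 4 experts, hidden size 2
hidden = [2.0, 1.0]
print(route(0, 3, token_id=7, hidden=hidden, router=router, k=2))  # early layer: hash
print(route(5, 3, token_id=7, hidden=hidden, router=router, k=2))  # later layer: learned
```

Because the hash path ignores `hidden` entirely, early-layer expert assignment stays fixed for a given token regardless of context, which is the trade-off the hybrid scheme makes before learned routing takes over.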

entrpn

just one comment, everything else looks good.

RissyRan · 2026-06-14T13:29:35Z

Are you able to do a real run and check the profile to see whether the scan blocks are ordered as expected? A compile test won't catch a runtime error.

parambole force-pushed the dsv4_model_integrate branch 2 times, most recently from 2a19018 to 23adce0 (June 12, 2026 20:00)

parambole marked this pull request as ready for review June 12, 2026 20:09

parambole changed the title from "Add DeepSeek V4 architecture support" to "[DeepSeek-V4] Implement model integration, decoders, and configuration stack" Jun 12, 2026

parambole added the gemini-review label Jun 12, 2026

dipakg-lang reviewed Jun 12, 2026


Comment thread on src/maxtext/configs/models/deepseek4.yml (outdated)

parambole force-pushed the dsv4_model_integrate branch from 23adce0 to 6deaacc (June 12, 2026 21:17)

entrpn reviewed Jun 12, 2026


Comment thread on src/maxtext/configs/models/deepseek4.yml

entrpn reviewed Jun 12, 2026


parambole wants to merge 1 commit into main from dsv4_model_integrate

4 participants
