Add Gemma 4 architecture support to TransformerBridge#1377

Open
punishell wants to merge 1 commit into TransformerLensOrg:dev from punishell:gemma4-support

Conversation

@punishell

Description

Adds TransformerBridge support for Google's Gemma 4 family (released April 2026), which had no support in TransformerLens.

Fixes #1297

A single text-only adapter covers both architectures:

  • Gemma4ForConditionalGeneration — E2B / E4B / 31B / 26B-A4B
  • Gemma4UnifiedForConditionalGeneration — the encoder-free 12B (needs transformers >= 5.10)

Gemma 4 layers are heterogeneous, so the adapter delegates all math to HF and maps variant-specific submodules as optional: KV-shared layers drop the k/v projections, K==V layers drop v_proj, and Per-Layer-Embedding / MoE submodules appear only on some variants. Unlike Gemma 1-3, Gemma4RMSNorm has no (1 + weight) offset.
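For reviewers, the norm difference can be illustrated with a minimal pure-Python sketch. It is not the adapter's code or HF's implementation: the function names `rms_normalize`, `rmsnorm_offset`, and `rmsnorm_plain` are hypothetical, and the "plain" variant simply encodes the claim above that Gemma4RMSNorm scales by `weight` directly.

```python
import math


def rms_normalize(x, eps=1e-6):
    """Rescale x to (approximately) unit root-mean-square.

    This step is shared by both norm variants below.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def rmsnorm_offset(x, weight, eps=1e-6):
    """Gemma 1-3 style: the learned weight is an offset from 1.

    A zero-initialised weight therefore acts as the identity scale.
    """
    return [n * (1.0 + w) for n, w in zip(rms_normalize(x, eps), weight)]


def rmsnorm_plain(x, weight, eps=1e-6):
    """Gemma 4 style as described in this PR (assumption): scale by weight only."""
    return [n * w for n, w in zip(rms_normalize(x, eps), weight)]


x = [1.0, -2.0, 3.0]
zeros = [0.0, 0.0, 0.0]
ones = [1.0, 1.0, 1.0]

# The same stored weight means different things in the two conventions:
# zeros is the identity scale for the offset form but zeroes out the plain form.
offset_with_zeros = rmsnorm_offset(x, zeros)
plain_with_zeros = rmsnorm_plain(x, zeros)
plain_with_ones = rmsnorm_plain(x, ones)
```

The practical consequence is that any weight-processing step that assumes the `(1 + weight)` convention would need to skip the `+ 1` for Gemma 4; otherwise every norm output would be scaled incorrectly.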

Adds DelegatedAttentionBlockBridge (drops the split-QKV fork aliases, mirroring MLABlockBridge) so hook-alias resolution stays clean when attention is delegated wholesale to HF.

google/gemma-4-E2B-it passes verify_models (P1 100%, P2 100%, P4 94.7%).

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

Adds a text-only adapter covering both Gemma4ForConditionalGeneration
(E2B/E4B/31B/26B-A4B) and Gemma4UnifiedForConditionalGeneration (12B),
addressing TransformerLensOrg#1297.

Gemma 4 layers are heterogeneous: KV-shared layers drop k/v projections,
K==V layers drop v_proj, and per-layer-embedding / MoE submodules appear
only on some variants -- all mapped optional and delegated to HF. Unlike
Gemma 1-3, Gemma4RMSNorm has no (1+weight) offset.

Adds DelegatedAttentionBlockBridge (drops the split-QKV fork aliases, as
MLABlockBridge does) so hook-alias resolution stays clean when attention
is delegated wholesale to HF.

google/gemma-4-E2B-it passes verification (P1 100%, P2 100%, P4 94.7%).

- New adapter + four-place registration + gemma4/gemma4_unified model_type mappings
- 10 checkpoints added to the model registry
- Unit + integration tests (logit parity vs HF on all three structural variants)
@jlarson4

Collaborator

@punishell We do have a different contributor actively working on this already. Once his implementation is ready I'll review both and determine which is correct for TransformerLens.

We will also want full multimodal support, not just text-only (see Gemma3ForConditional's architecture adapter for details on how that works).

@punishell

punishell commented Jun 10, 2026 via email

Author

@jlarson4

Collaborator

@punishell Oh that's wonderful! Glad to hear that it's working for you, and thank you for using TransformerLens!
