Skip to content

fix(megatron): handle MambaMixer conv1d refactor in importer/exporter#1730

Open
AAnoosheh wants to merge 1 commit into
mainfrom
aanoosheh/mamba-mixer-conv1d-compat
Open

fix(megatron): handle MambaMixer conv1d refactor in importer/exporter#1730
AAnoosheh wants to merge 1 commit into
mainfrom
aanoosheh/mamba-mixer-conv1d-compat

Conversation

@AAnoosheh

@AAnoosheh AAnoosheh commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • MambaMixer in some Megatron-LM branches replaced the nn.Conv1d submodule (self.conv1d) with raw nn.Parameters (self.conv1d_weight / self.conv1d_bias)
  • Add hasattr(layer.mixer, "conv1d") fallback in both the import path (megatron_importer.py) and export path (unified_export_megatron.py)
  • Add corresponding conv1d_weight / conv1d_bias rules to mcore_nemotron.py for both import and export mappings

Backward compatible: old Megatron-LM versions with nn.Conv1d hit the existing if branch unchanged. The else branch only fires when conv1d is absent.

Test plan

  • Verified quantize step completes end-to-end on Nemotron-3-Nano-30B-A3B with refactored MambaMixer branch
  • Old Megatron-LM versions unaffected (fallback via hasattr)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved Mamba convolution parameter handling during model import and export, including support for alternative mixer layouts.
    • Added explicit Hugging Face ↔ Megatron Core mappings for Nemotron Mamba conv1d_weight and conv1d_bias, ensuring consistent state-dict key generation across import/export.

@AAnoosheh AAnoosheh requested a review from a team as a code owner June 15, 2026 16:17
@AAnoosheh AAnoosheh requested a review from cjluo-nv June 15, 2026 16:17
@coderabbitai

coderabbitai Bot commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds dual-layout support for Mamba layer conv1d parameters in both export (_get_mamba_layer_state_dict) and import (_import_mamba_layer): when layer.mixer.conv1d exists the existing single rule is used; otherwise separate conv1d_weight and conv1d_bias rules are applied. Corresponding HF↔MCore key mappings are added to the Nemotron plugin dictionaries.

Changes

Mamba conv1d dual-layout support

Layer / File(s) Summary
Nemotron HF↔MCore conv1d key mappings
modelopt/torch/export/plugins/mcore_nemotron.py
Adds conv1d_weight and conv1d_bias entries to both nemotron_h_causal_lm_import and nemotron_h_causal_lm_export mapping dictionaries, pointing to backbone.layers.{}.mixer.conv1d.{weight,bias}.
Conditional conv1d import/export logic
modelopt/torch/export/unified_export_megatron.py, modelopt/torch/export/plugins/megatron_importer.py
_get_mamba_layer_state_dict and _import_mamba_layer each add a conditional branch: use the "conv1d" rule when layer.mixer.conv1d is present, otherwise fall back to separate "conv1d_weight" and optional "conv1d_bias" rules.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: handling MambaMixer conv1d refactor in importer/exporter. It is specific, concise, and directly reflects the core modification across multiple files.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns ✅ Passed PR contains no security anti-patterns. Changes are parameter mapping additions and conditional attribute checks for conv1d handling—no torch.load(), numpy.load(), hardcoded trust_remote_code=True,...
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aanoosheh/mamba-mixer-conv1d-compat

Comment @coderabbitai help to get the list of available commands and usage tips.

MambaMixer in some Megatron-LM branches replaced the nn.Conv1d submodule
(self.conv1d) with raw nn.Parameters (self.conv1d_weight / self.conv1d_bias).
Add fallback rules and hasattr branches in the import and export paths so
both the old and new MambaMixer APIs are handled without breaking existing
Megatron-LM versions.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
@AAnoosheh AAnoosheh force-pushed the aanoosheh/mamba-mixer-conv1d-compat branch from c0b2975 to f699894 Compare June 15, 2026 16:18
@AAnoosheh AAnoosheh self-assigned this Jun 15, 2026
@github-actions

Copy link
Copy Markdown
Contributor
PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1730/

Built to branch gh-pages at 2026-06-15 16:24 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.50%. Comparing base (ddc0a8e) to head (f699894).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/export/plugins/megatron_importer.py 0.00% 5 Missing ⚠️
modelopt/torch/export/unified_export_megatron.py 0.00% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1730      +/-   ##
==========================================
- Coverage   77.09%   76.50%   -0.60%     
==========================================
  Files         511      511              
  Lines       56176    56701     +525     
==========================================
+ Hits        43310    43377      +67     
- Misses      12866    13324     +458     
Flag Coverage Δ
examples 41.81% <0.00%> (-0.14%) ⬇️
gpu 57.72% <0.00%> (-0.59%) ⬇️
regression 14.66% <0.00%> (+0.07%) ⬆️
unit 54.39% <0.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant