fix(megatron): handle MambaMixer conv1d refactor in importer/exporter by AAnoosheh · Pull Request #1730 · NVIDIA/Model-Optimizer

AAnoosheh · 2026-06-15T16:17:05Z

Summary

MambaMixer in some Megatron-LM branches replaced the nn.Conv1d submodule (self.conv1d) with raw nn.Parameters (self.conv1d_weight / self.conv1d_bias)
Add hasattr(layer.mixer, "conv1d") fallback in both the import path (megatron_importer.py) and export path (unified_export_megatron.py)
Add corresponding conv1d_weight / conv1d_bias rules to mcore_nemotron.py for both import and export mappings

Backward compatible: old Megatron-LM versions with nn.Conv1d hit the existing if branch unchanged. The else branch only fires when conv1d is absent.

Test plan

Verified quantize step completes end-to-end on Nemotron-3-Nano-30B-A3B with refactored MambaMixer branch
Old Megatron-LM versions unaffected (fallback via hasattr)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved Mamba convolution parameter handling during model import and export, including support for alternative mixer layouts.
- Added explicit Hugging Face ↔ Megatron Core mappings for Nemotron Mamba conv1d_weight and conv1d_bias, ensuring consistent state-dict key generation across import/export.

coderabbitai · 2026-06-15T16:17:22Z

📝 Walkthrough

Walkthrough

Adds dual-layout support for Mamba layer conv1d parameters in both export (_get_mamba_layer_state_dict) and import (_import_mamba_layer): when layer.mixer.conv1d exists the existing single rule is used; otherwise separate conv1d_weight and conv1d_bias rules are applied. Corresponding HF↔MCore key mappings are added to the Nemotron plugin dictionaries.

Changes

Mamba conv1d dual-layout support

Layer / File(s)	Summary
Nemotron HF↔MCore conv1d key mappings `modelopt/torch/export/plugins/mcore_nemotron.py`	Adds `conv1d_weight` and `conv1d_bias` entries to both `nemotron_h_causal_lm_import` and `nemotron_h_causal_lm_export` mapping dictionaries, pointing to `backbone.layers.{}.mixer.conv1d.{weight,bias}`.
Conditional conv1d import/export logic `modelopt/torch/export/unified_export_megatron.py`, `modelopt/torch/export/plugins/megatron_importer.py`	`_get_mamba_layer_state_dict` and `_import_mamba_layer` each add a conditional branch: use the `"conv1d"` rule when `layer.mixer.conv1d` is present, otherwise fall back to separate `"conv1d_weight"` and optional `"conv1d_bias"` rules.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: handling MambaMixer conv1d refactor in importer/exporter. It is specific, concise, and directly reflects the core modification across multiple files.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	PR contains no security anti-patterns. Changes are parameter mapping additions and conditional attribute checks for conv1d handling—no torch.load(), numpy.load(), hardcoded trust_remote_code=True,...
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch aanoosheh/mamba-mixer-conv1d-compat

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

MambaMixer in some Megatron-LM branches replaced the nn.Conv1d submodule (self.conv1d) with raw nn.Parameters (self.conv1d_weight / self.conv1d_bias). Add fallback rules and hasattr branches in the import and export paths so both the old and new MambaMixer APIs are handled without breaking existing Megatron-LM versions. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>

github-actions · 2026-06-15T16:24:40Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1730/
Built to branch `gh-pages` at 2026-06-15 16:24 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-15T16:29:37Z

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.50%. Comparing base (ddc0a8e) to head (f699894).
⚠️ Report is 8 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/export/plugins/megatron_importer.py	0.00%	5 Missing ⚠️
modelopt/torch/export/unified_export_megatron.py	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1730      +/-   ##
==========================================
- Coverage   77.09%   76.50%   -0.60%     
==========================================
  Files         511      511              
  Lines       56176    56701     +525     
==========================================
+ Hits        43310    43377      +67     
- Misses      12866    13324     +458

Flag	Coverage Δ
examples	`41.81% <0.00%> (-0.14%)`	⬇️
gpu	`57.72% <0.00%> (-0.59%)`	⬇️
regression	`14.66% <0.00%> (+0.07%)`	⬆️
unit	`54.39% <0.00%> (+0.04%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

AAnoosheh requested a review from a team as a code owner June 15, 2026 16:17

AAnoosheh requested a review from cjluo-nv June 15, 2026 16:17

AAnoosheh force-pushed the aanoosheh/mamba-mixer-conv1d-compat branch from c0b2975 to f699894 Compare June 15, 2026 16:18

AAnoosheh self-assigned this Jun 15, 2026

coderabbitai Bot approved these changes Jun 15, 2026

View reviewed changes

AAnoosheh mentioned this pull request Jun 15, 2026

Support Mamba direct conv1d quant export #1716

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(megatron): handle MambaMixer conv1d refactor in importer/exporter#1730

fix(megatron): handle MambaMixer conv1d refactor in importer/exporter#1730
AAnoosheh wants to merge 1 commit into
mainfrom
aanoosheh/mamba-mixer-conv1d-compat

AAnoosheh commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 15, 2026

Built to branch `gh-pages` at 2026-06-15 16:24 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented Jun 15, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

AAnoosheh commented Jun 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 15, 2026

Built to branch gh-pages at 2026-06-15 16:24 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

codecov Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

AAnoosheh commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Built to branch `gh-pages` at 2026-06-15 16:24 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

codecov Bot commented Jun 15, 2026 •

edited

Loading