[Feat]: Domino support by h-guo18 · Pull Request #1710 · NVIDIA/Model-Optimizer

h-guo18 · 2026-06-13T22:04:33Z

What does this PR do?

Type of change: New feature

Adds Domino speculative decoding: the parallel DFlash draft backbone plus a lightweight GRU causal correction head. The backbone produces base logits for a full draft block in one forward pass. A GRU over the block's teacher-forced tokens produces a causal state that is fused with the backbone hidden state and projected to a vocab-sized logit correction on the block suffix, injecting the intra-block causal dependency that the parallel backbone lacks. Training uses a dual loss, (1-λ_base)*final + λ_base*base, where λ_base decays linearly 1→0 (curriculum: learn the parallel backbone first, then the correction).
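
The dual loss and λ curriculum can be sketched in plain Python. This is a minimal illustration, not the PR's implementation. The name `compute_lambda_base` appears in the PR, but its signature here is an assumption. So is the reading of `decay_ratio` as the fraction of total steps over which λ_base reaches 0. The helper `dual_loss` is hypothetical.

```python
def compute_lambda_base(step: int, total_steps: int,
                        start: float = 1.0, decay_ratio: float = 1.0) -> float:
    """Linearly decay lambda_base from `start` to 0.

    Assumption: the decay finishes after `decay_ratio * total_steps` steps
    and lambda_base stays at 0 for the rest of training.
    """
    decay_steps = max(1, int(total_steps * decay_ratio))
    progress = min(max(step, 0) / decay_steps, 1.0)
    return start * (1.0 - progress)


def dual_loss(final_loss: float, base_loss: float, lambda_base: float) -> float:
    """Blend the two losses: (1 - lambda_base) * final + lambda_base * base."""
    return (1.0 - lambda_base) * final_loss + lambda_base * base_loss


if __name__ == "__main__":
    # Print the schedule and the blended loss at a few training steps.
    for step in (0, 250, 500, 1000):
        lam = compute_lambda_base(step, total_steps=1000, decay_ratio=0.5)
        print(step, lam, dual_loss(final_loss=2.0, base_loss=3.0, lambda_base=lam))
```

With λ_base near 1 the base (backbone-only) loss dominates. As λ_base approaches 0 the loss on the corrected logits takes over, which matches the "backbone first, then the correction" curriculum.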

Reuses the DFlash mode/config/recipe; selected via dflash_architecture_config.projector_type=domino and routed to its own registry so HFDominoModel does not shadow HFDFlashModel. Exports in the z-lab/SpecForge drafter format (prefix_gru.* / embed_proj.*).

Note: the inference side (vLLM / AR evaluation) is intentionally not wired up yet, so the correction head is not applied in serving. It will be added once the inference path lands.

Usage

# Online training (recipe: projector_type=domino)
uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_online_domino.yaml --yes

Testing

CPU unit tests in tests/unit/torch/speculative/plugins/test_hf_domino.py cover conversion routing, the training forward (dual loss + grads), the λ schedule, and the export format. Online Qwen3-8B training validated end-to-end (loss curve below).

Before your PR is "Ready for review"

Is this change backward compatible?: ✅ (opt-in via projector_type=domino; DFlash path unchanged)
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A (no new dependency)
Did you write any new necessary tests?: ✅
Did you update Changelog?: ✅
Did you get Claude approval on this PR?: ❌

Additional Information

Reference: SpecForge PR #571 (z-lab); drafter format huggingface.co/Huang2020/Qwen3-8B-Domino-b16.

Summary by CodeRabbit

Release Notes

New Features
- Added Domino speculative-decoding training with a curriculum-based loss scaling schedule.
- Extended speculative model export to include Domino-specific draft head configuration.
Documentation & Configuration
- Added a Domino speculative-decoding training recipe.
- Added an HF Online Domino launcher configuration for Qwen3-8B.
Tests
- Added CPU unit tests for Domino conversion, training loss behavior, lambda scheduling, and Domino export outputs.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

copy-pr-bot · 2026-06-13T22:04:36Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-13T22:04:40Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0aecc1ee-ad66-4101-9890-443e63e322f7

📥 Commits

Reviewing files that changed from the base of the PR and between ba737bc and 35437c5.

📒 Files selected for processing (3)

modelopt/torch/export/plugins/hf_spec_export.py
modelopt/torch/speculative/plugins/hf_domino.py
modelopt_recipes/general/speculative_decoding/domino.yaml

🚧 Files skipped from review as they are similar to previous changes (3)

modelopt_recipes/general/speculative_decoding/domino.yaml
modelopt/torch/export/plugins/hf_spec_export.py
modelopt/torch/speculative/plugins/hf_domino.py

📝 Walkthrough

Adds a "Domino" speculative decoding training variant on top of DFlash. New components include a DominoModule with a GRU causal correction head, an HFDominoModel plugin with dual cross-entropy loss and lambda curriculum scheduling, a DominoDMRegistry for routing conversion, a DominoExporter for config export, and supporting recipe/launcher YAML files. Inference is not yet wired.

Changes

Domino Speculative Decoding Training

Layer / File(s)	Summary
Config fields and conversion registry routing `modelopt/torch/speculative/config.py`, `modelopt/torch/speculative/dflash/conversion.py`, `modelopt/torch/speculative/plugins/__init__.py`	Adds `dflash_lambda_base_start` and `dflash_lambda_base_decay_ratio` Pydantic fields to `DFlashConfig`. Introduces `DominoDMRegistry` and updates `convert_to_dflash_model` to select it when `projector_type == "domino"`. Re-exports `hf_domino` symbols from the plugins package.
HFDFlashModel extensibility hook `modelopt/torch/speculative/plugins/hf_dflash.py`	Refactors `HFDFlashModel.modify()` to use an overridable `_build_draft_module` factory hook instead of directly constructing `DFlashModule`, enabling subclasses to build augmented draft modules while reusing setup.
DominoModule definition `modelopt/torch/speculative/plugins/modeling_domino.py`	Introduces `DominoModule(DFlashModule)` with a bias-free `prefix_gru` GRU and an `embed_proj` MLP head for vocab-sized logit corrections, initialized via `_init_head_weights` by sampling from a normal distribution.
HFDominoModel training plugin and DominoLambdaCallback `modelopt/torch/speculative/plugins/hf_domino.py`	Adds `compute_lambda_base` for linear decay scheduling. Implements `HFDominoModel` with `_apply_domino_head` (per-block GRU + suffix logit correction), `_compute_domino_loss` (dual weighted cross-entropy with exponential decay and accuracy metrics), and `forward` (anchor sampling, backbone+head execution). Adds `DominoLambdaCallback` to update `_lambda_base` each trainer step.
DominoExporter and training entry-point callback `modelopt/torch/export/plugins/hf_spec_export.py`, `examples/speculative_decoding/main.py`	Adds `DominoExporter(DFlashExporter)` overriding `_export_config` to inject `emb_dim` and GRU/projector fields into `config.json`. Registers `DominoLambdaCallback` in the training script when `projector_type == "domino"`.
Unit tests for conversion, forward, schedule, and export `tests/unit/torch/speculative/plugins/test_hf_domino.py`	Covers conversion routing to `HFDominoModel`/`DominoModule`, GRU structure and head dimensions, forward-pass dual loss with gradient flow, `lambda_base = 0` behavior, `compute_lambda_base` linear decay, and safetensors/config.json export layout.
Domino recipe config, launcher pipeline, and changelog `modelopt_recipes/general/speculative_decoding/domino.yaml`, `tools/launcher/examples/Qwen/Qwen3-8B/hf_online_domino.yaml`, `CHANGELOG.rst`	Adds `domino.yaml` training recipe with `projector_type: domino`, GRU/draft dimensions, and lambda curriculum fields. Adds a two-task Qwen3-8B SLURM launcher pipeline. Adds a CHANGELOG entry noting training support and pending inference wiring.
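
The `_build_draft_module` hook described in the table is a factory-method pattern, which can be sketched as follows. The hook name comes from the PR. All classes below are hypothetical stand-ins with no real layers, and the real `modify()` does much more setup.

```python
class DraftModule:
    """Stand-in for the DFlash draft module (hypothetical, no real layers)."""

    def __init__(self, config: dict):
        self.config = config


class DominoDraftModule(DraftModule):
    """Stand-in for a draft module augmented with a correction head."""

    def __init__(self, config: dict):
        super().__init__(config)
        self.has_correction_head = True


class BaseModel:
    def modify(self, config: dict) -> None:
        # Shared setup lives here; only module construction is delegated.
        self.draft_config = dict(config)
        self.draft_module = self._build_draft_module(self.draft_config)

    def _build_draft_module(self, config: dict) -> DraftModule:
        # Overridable factory hook: subclasses swap in their own module.
        return DraftModule(config)


class DominoModel(BaseModel):
    def _build_draft_module(self, config: dict) -> DraftModule:
        return DominoDraftModule(config)


if __name__ == "__main__":
    model = DominoModel()
    model.modify({"projector_type": "domino"})
    print(type(model.draft_module).__name__)
```

The subclass overrides only the factory, so the setup code in `modify()` is reused unchanged.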

Sequence Diagram(s)

sequenceDiagram
  participant Trainer
  participant DominoLambdaCallback
  participant HFDominoModel
  participant DraftBackbone
  participant DominoModule

  Trainer->>DominoLambdaCallback: on_step_begin(global_step)
  DominoLambdaCallback->>HFDominoModel: _lambda_base = compute_lambda_base(global_step, total_steps, ...)
  Trainer->>HFDominoModel: forward(input_ids, labels, loss_mask)
  HFDominoModel->>HFDominoModel: anchor sampling + loss_mask build
  HFDominoModel->>DraftBackbone: run backbone (no-grad base hidden states)
  DraftBackbone-->>HFDominoModel: hidden_states, base_logits
  HFDominoModel->>DominoModule: _apply_domino_head(hidden_states, base_logits, anchors)
  DominoModule-->>HFDominoModel: corrected final_logits
  HFDominoModel->>HFDominoModel: _compute_domino_loss(final_logits, base_logits, _lambda_base)
  HFDominoModel-->>Trainer: ModelOutput(loss, base_loss, final_loss, base_accuracy, lambda_base)
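
The effect of the correction head on a single draft block can be illustrated with plain lists. This sketch omits the GRU entirely and takes its output as a given `correction` argument. The function name `apply_suffix_correction` and the exact slicing (the first `pure_draft_prefix_len` positions keep their base logits) are assumptions made for illustration.

```python
def apply_suffix_correction(base_logits, correction, pure_draft_prefix_len):
    """Return final logits for one draft block.

    base_logits: per-position logit vectors, length = block size.
    correction: per-position correction vectors for the suffix only,
        length = block size - pure_draft_prefix_len.
    """
    block_size = len(base_logits)
    if not 0 <= pure_draft_prefix_len < block_size:
        raise ValueError("pure_draft_prefix_len must be in [0, block_size)")
    if len(correction) != block_size - pure_draft_prefix_len:
        raise ValueError("correction must cover exactly the block suffix")

    # Prefix positions are copied through untouched.
    final = [list(row) for row in base_logits[:pure_draft_prefix_len]]
    # Suffix positions get base logits plus the correction.
    for base_row, corr_row in zip(base_logits[pure_draft_prefix_len:], correction):
        final.append([b + c for b, c in zip(base_row, corr_row)])
    return final


if __name__ == "__main__":
    base = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]   # block size 3, vocab size 2
    corr = [[1.0, -1.0], [0.5, 0.5]]              # corrections for positions 1 and 2
    print(apply_suffix_correction(base, corr, pure_draft_prefix_len=1))
```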

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

yeyu-nvidia
benchislett
ChenhanYu

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title '[Feat]: Domino support' is partially related to the changeset—it refers to a real part of the change (Domino feature addition) but is overly broad and vague, lacking specificity about what Domino is or what aspect is being added.	Consider a more specific title like '[Feat]: Add Domino speculative decoding with GRU causal correction head' or '[Feat]: Implement Domino variant with dual-loss training curriculum' to better convey the primary change.

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 82.14% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Security Anti-Patterns	✅ Passed	All Python files pass security audit against SECURITY.md guidelines. No unsafe torch.load, numpy.load, hardcoded trust_remote_code, eval/exec, nosec comments, or new dependencies detected.

github-actions · 2026-06-13T22:08:03Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1710/
Built to branch `gh-pages` at 2026-06-15 21:56 UTC. Preview will be ready when the GitHub Pages deployment is complete.

codecov · 2026-06-13T22:13:46Z

Codecov Report

❌ Patch coverage is 84.31373% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.00%. Comparing base (9f37fe1) to head (35437c5).
⚠️ Report is 12 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/torch/speculative/plugins/hf_domino.py	79.73%	31 Missing ⚠️
...elopt/torch/speculative/plugins/modeling_domino.py	95.65%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1710      +/-   ##
==========================================
- Coverage   77.12%   77.00%   -0.13%     
==========================================
  Files         511      513       +2     
  Lines       56236    56614     +378     
==========================================
+ Hits        43374    43596     +222     
- Misses      12862    13018     +156

Flag	Coverage Δ
unit	`54.66% <84.31%> (+0.27%)`	⬆️

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

copy-pr-bot · 2026-06-15T18:53:44Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

h-guo18 · 2026-06-15T19:02:03Z

/claude review

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/speculative_decoding/main.py`:
- Line 285: The import of DominoLambdaCallback from
modelopt.torch.speculative.plugins.hf_domino at line 285 is currently inside a
function, which violates the repository's import-placement guidelines. Move this
import to the module scope at the top of the file with other imports, unless
there is a specific reason for the in-function placement (such as circular
dependency, optional dependency, or performance concerns). If such a reason
exists, keep the in-function import but add a brief comment above it explaining
the concrete justification for the non-standard placement.

In `@modelopt/torch/speculative/config.py`:
- Around line 135-150: Add schema bounds validation to the
dflash_lambda_base_start and dflash_lambda_base_decay_ratio ModeloptField
definitions to enforce that these normalized weight/fraction fields accept only
values in the valid range (0 to 1). This will cause invalid configuration values
to be rejected at config load time rather than being silently masked downstream,
following the coding guideline to validate external input at the interface
boundary.

In `@modelopt/torch/speculative/dflash/conversion.py`:
- Around line 44-49: The registry selection logic currently silently defaults to
DFlashDMRegistry for any unknown projector_type value, which can hide typos and
route users incorrectly. Replace the conditional expression with explicit
validation that checks if the projector_type is one of the supported values
("domino" or the default). Raise an appropriate error (e.g., ValueError) if the
projector_type is unsupported, ensuring invalid input is rejected at the
interface boundary rather than silently falling back to a default registry.

In `@modelopt/torch/speculative/plugins/modeling_domino.py`:
- Around line 58-59: Add validation for the pure_draft_prefix_len attribute
immediately after it is read from config in the initialization block. The
validation should check that pure_draft_prefix_len is non-negative and strictly
less than the block_size to ensure suffix correction works properly. If the
validation fails, raise a clear ValueError with a descriptive message indicating
the valid range requirement. This validation should occur at the config
interface boundary during module initialization, right after line 58 where
pure_draft_prefix_len is assigned from the config getattr call.

In `@tests/unit/torch/speculative/plugins/test_hf_domino.py`:
- Line 99: Move the in-function import of DFlashModule from line 99 and the
corresponding import at line 184 to the top-level of the test module (at the
beginning of the file with other imports). These imports do not have explicit
circular dependency, optional dependency, or heavy-import justifications that
would warrant keeping them in-function, so they should follow test conventions
by being at module-level to catch import errors during test collection rather
than execution.

In `@tools/launcher/examples/Qwen/Qwen3-8B/hf_online_domino.yaml`:
- Around line 72-75: The environment section in the YAML configuration file for
this new Qwen3-8B model config is missing two required environment variables. In
the environment list at lines 72-75 (which currently contains only
MAX_FINAL_LOSS and MIN_FINAL_ACC), add the two required launcher environment
variables MLM_MODEL_CFG and QUANT_CFG as additional list items, following the
same format as the existing environment variables. These must be explicitly set
according to the launcher coding guidelines for new model configurations.
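
Three of the inline comments above ask for the same thing: reject bad values at the config boundary. They can be sketched together in one validation function. This is an illustration under stated assumptions, not the repository's code. The flat `dict` layout, the `block_size` key, the defaults, and treating an unset `projector_type` as the DFlash default are all hypothetical. The actual fields are Pydantic/`ModeloptField` definitions.

```python
def validate_domino_config(cfg: dict) -> None:
    """Raise ValueError on unsupported or out-of-range values."""
    # Reject unknown projector types instead of silently using the default.
    projector_type = cfg.get("projector_type")  # None stands for "default"
    if projector_type not in (None, "domino"):
        raise ValueError(f"Unsupported projector_type: {projector_type!r}")

    # Normalized weight/fraction fields must lie in [0, 1].
    for name in ("dflash_lambda_base_start", "dflash_lambda_base_decay_ratio"):
        value = cfg.get(name, 1.0)  # default of 1.0 is an assumption
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # The correction needs a non-empty suffix inside each block.
    block_size = cfg["block_size"]
    prefix_len = cfg.get("pure_draft_prefix_len", 0)  # default is an assumption
    if not 0 <= prefix_len < block_size:
        raise ValueError(
            f"pure_draft_prefix_len must be in [0, {block_size}), got {prefix_len}"
        )


if __name__ == "__main__":
    validate_domino_config({"projector_type": "domino", "block_size": 16})
    try:
        validate_domino_config({"projector_type": "domnio", "block_size": 16})
    except ValueError as err:
        print(err)
```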

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: afea7d4d-38ed-4205-b10b-8d8ee062ad26

📥 Commits

Reviewing files that changed from the base of the PR and between 9f37fe1 and 9d904c3.

📒 Files selected for processing (12)

CHANGELOG.rst
examples/speculative_decoding/main.py
modelopt/torch/export/plugins/hf_spec_export.py
modelopt/torch/speculative/config.py
modelopt/torch/speculative/dflash/conversion.py
modelopt/torch/speculative/plugins/__init__.py
modelopt/torch/speculative/plugins/hf_dflash.py
modelopt/torch/speculative/plugins/hf_domino.py
modelopt/torch/speculative/plugins/modeling_domino.py
modelopt_recipes/general/speculative_decoding/domino.yaml
tests/unit/torch/speculative/plugins/test_hf_domino.py
tools/launcher/examples/Qwen/Qwen3-8B/hf_online_domino.yaml

claude

Claude review

Reviewed the Domino training-only addition end-to-end. The algorithm is correct: GRU teacher-forcing on input_ids[anchor..anchor+bs-1] with shift_label=True predicts anchor+k+1 from anchor..anchor+k (no leakage), the suffix slice + base-logit add matches the SpecForge formulation, and the dual loss / λ-curriculum are wired correctly. The DominoDMRegistry split keeps HFDominoModel from shadowing HFDFlashModel, and config / state-dict round-tripping looks safe (new fields have defaults; old saved configs without projector_type keep routing to the DFlash registry).

Findings

CRITICAL: 0
IMPORTANT: 1
- Silent eval bypass: in non-training mode forward delegates to HFDFlashModel.forward, so pseudo_speculative_generate / AR validation never applies the trained Domino head. Acknowledged in the PR description, but an estimate_ar/ar_validate_steps user would silently get backbone-only acceptance numbers. Suggest a logger.warning_once here and a short note in domino.yaml.
SUGGESTION: 3
- DominoExporter._export_config uses getattr(draft_config, "emb_dim") (no default) — fails at export time after a long train if the user's dflash_architecture_config omits a head field. Prefer validating in HFDominoModel.modify.
- DominoLambdaCallback falls back to total_steps=1 when state.max_steps is unset, which silently flips λ_base to 0 from step 1. Add a one-shot warning when the fallback is taken.
- With lambda_base == 1.0 the head params drop out of the autograd graph; the recipe correctly sets ddp_find_unused_parameters: true but the dependency is invisible — worth a one-line note next to that flag in domino.yaml.
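
Two of the suggestions above are about failing loudly, and both can be sketched with the standard library. The helpers `resolve_total_steps` and `check_head_fields` are hypothetical names. Treating `None` or a non-positive `max_steps` as "unset" is an assumption. Only `emb_dim` is a field name taken from the review; any other required fields would have to come from the real config.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger("domino_sketch")
_warned_fallback = False


def resolve_total_steps(max_steps):
    """Return a usable step count, warning once if the fallback is taken."""
    global _warned_fallback
    if max_steps is not None and max_steps > 0:
        return max_steps
    if not _warned_fallback:
        logger.warning(
            "max_steps is unset; falling back to total_steps=1, "
            "so lambda_base drops to 0 after the first step."
        )
        _warned_fallback = True
    return 1


def check_head_fields(draft_config, required=("emb_dim",)):
    """Validate head fields before training rather than at export time."""
    missing = [name for name in required if not hasattr(draft_config, name)]
    if missing:
        raise ValueError(f"Draft config is missing required head fields: {missing}")


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    print(resolve_total_steps(None))   # logs the warning once
    print(resolve_total_steps(-1))     # silent on the second fallback
    check_head_fields(SimpleNamespace(emb_dim=256))
```

Running the field check at model-conversion time moves the failure from the end of a long training run to its start.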

Overall risk

Low. Training-only path, opt-in via projector_type=domino, no behavior change for existing DFlash users. The findings are about ergonomics / fail-loud-vs-fail-silent rather than algorithm correctness.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

ChenhanYu · 2026-06-15T23:39:16Z

The results look good. Is the final checkpoint the same as DFlash, or does it require additional vLLM support?

h-guo18 · 2026-06-15T23:41:58Z

The results look good. Is the final checkpoint the same as DFlash, or does it require additional vLLM support?

The final checkpoint contains new components beyond those of a typical DFlash checkpoint. vLLM/TRTLLM support is not yet available in their main branches. We can add serve/specdecbench support once it is.

add domino support

0bf466b

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

h-guo18 changed the title ~~add domino support~~ [Feat]: Domino support Jun 13, 2026

add domino support

f0a1a99

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

h-guo18 marked this pull request as ready for review June 15, 2026 18:52

h-guo18 requested review from a team as code owners June 15, 2026 18:52

h-guo18 requested review from kevalmorabia97 and meenchen June 15, 2026 18:52

Add Domino changelog entry under 0.46

9d904c3

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

h-guo18 requested a review from ChenhanYu June 15, 2026 18:53

coderabbitai Bot reviewed Jun 15, 2026

claude Bot reviewed Jun 15, 2026

Comment thread modelopt/torch/speculative/plugins/hf_domino.py

claude Bot reviewed Jun 15, 2026

Comment thread modelopt/torch/export/plugins/hf_spec_export.py Outdated

claude Bot reviewed Jun 15, 2026

Comment thread modelopt/torch/speculative/plugins/hf_domino.py Outdated

claude Bot reviewed Jun 15, 2026

Comment thread modelopt/torch/speculative/plugins/hf_domino.py

coderabbit comments

ba737bc

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>

h-guo18 requested a review from a team as a code owner June 15, 2026 21:40

coderabbitai Bot approved these changes Jun 15, 2026

h-guo18 self-assigned this Jun 15, 2026

address comments

35437c5

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
