Fix Step3.5 HF export RoPE metadata#1745
Conversation
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
📝 WalkthroughWalkthroughAdds LLaMA3 RoPE configuration sanitization for Hugging Face checkpoint export. New helpers in ChangesLLaMA3 RoPE Config Sanitization on HF Export
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/export/plugins/__init__.py`:
- Around line 27-34: The fallback definition of
sanitize_hf_config_for_deployment is a complete no-op that returns None without
performing any sanitization, which silently disables the RoPE compatibility fix
when the hf_checkpoint_utils plugin import fails. Replace the no-op fallback
function with an implementation that performs the essential sanitization logic,
specifically the rope_theta injection to the config_data dictionary. This
ensures the RoPE compatibility fix is always applied regardless of whether the
optional plugin successfully imports, preventing silent runtime failures on
environments where the plugin is unavailable.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5bc364af-5c34-441c-af04-b8e23600acbe
📒 Files selected for processing (4)
modelopt/torch/export/plugins/__init__.pymodelopt/torch/export/plugins/hf_checkpoint_utils.pymodelopt/torch/export/unified_export_hf.pytests/unit/torch/export/test_hf_checkpoint_utils.py
| with import_plugin("hf_checkpoint_utils"): | ||
| from .hf_checkpoint_utils import * | ||
|
|
||
| if "sanitize_hf_config_for_deployment" not in globals(): | ||
|
|
||
| def sanitize_hf_config_for_deployment(config_data: dict[str, Any], model: Any) -> None: | ||
| """No-op fallback when Hugging Face checkpoint utilities are unavailable.""" | ||
| return None |
There was a problem hiding this comment.
Silent no-op fallback can disable the RoPE compatibility fix at runtime.
When hf_checkpoint_utils fails to import, this fallback makes sanitize_hf_config_for_deployment a no-op, so export continues without rope_theta injection. That silently reintroduces the config incompatibility on environments where the optional plugin import fails.
Suggested fix (keep sanitization always available)
# modelopt/torch/export/plugins/__init__.py
with import_plugin("hf_checkpoint_utils"):
from .hf_checkpoint_utils import *
-if "sanitize_hf_config_for_deployment" not in globals():
-
- def sanitize_hf_config_for_deployment(config_data: dict[str, Any], model: Any) -> None:
- """No-op fallback when Hugging Face checkpoint utilities are unavailable."""
- return None
+if "sanitize_hf_config_for_deployment" not in globals():
+ raise ImportError(
+ "sanitize_hf_config_for_deployment is required for HF export compatibility but "
+ "could not be imported from hf_checkpoint_utils."
+ )# modelopt/torch/export/plugins/hf_checkpoint_utils.py
-from huggingface_hub import snapshot_download
-from huggingface_hub.errors import LocalEntryNotFoundError
...
def copy_hf_ckpt_remote_code(
pretrained_model_path: str | os.PathLike, save_directory: str | os.PathLike
):
+ # Local import keeps unrelated helpers (e.g., config sanitization) available
+ # even when huggingface_hub is not installed.
+ from huggingface_hub import snapshot_download
+ from huggingface_hub.errors import LocalEntryNotFoundError🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/torch/export/plugins/__init__.py` around lines 27 - 34, The fallback
definition of sanitize_hf_config_for_deployment is a complete no-op that returns
None without performing any sanitization, which silently disables the RoPE
compatibility fix when the hf_checkpoint_utils plugin import fails. Replace the
no-op fallback function with an implementation that performs the essential
sanitization logic, specifically the rope_theta injection to the config_data
dictionary. This ensures the RoPE compatibility fix is always applied regardless
of whether the optional plugin successfully imports, preventing silent runtime
failures on environments where the plugin is unavailable.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1745 +/- ##
===========================================
+ Coverage 58.45% 76.53% +18.07%
===========================================
Files 510 511 +1
Lines 56271 56366 +95
===========================================
+ Hits 32896 43138 +10242
+ Misses 23375 13228 -10147
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
What does this PR do?
Type of change: Bug fix
Fixes exported Step-3.5 Hugging Face checkpoints for Transformers 5.x/vLLM by adding
rope_thetatollama3RoPE metadata whensave_pretrained()emitsrope_parametersorrope_scalingwithout that required key.Transformers 5.10.2 validates
rope_parametersduring config construction and raises:The sanitizer uses
config.json["rope_theta"]when present, otherwise falls back tomodel.config.rope_theta.Usage
Export Step-3.5 HF checkpoints as usual with
hf_ptq.py, then serve the exported checkpoint with vLLM/Transformers 5.x.Testing
pre-commit run ruff-format --files modelopt/torch/export/plugins/hf_checkpoint_utils.py modelopt/torch/export/plugins/__init__.py modelopt/torch/export/unified_export_hf.py tests/unit/torch/export/test_hf_checkpoint_utils.pypre-commit run ruff-check --files modelopt/torch/export/plugins/hf_checkpoint_utils.py modelopt/torch/export/plugins/__init__.py modelopt/torch/export/unified_export_hf.py tests/unit/torch/export/test_hf_checkpoint_utils.pypre-commit run mypy --files modelopt/torch/export/plugins/hf_checkpoint_utils.py modelopt/torch/export/plugins/__init__.py modelopt/torch/export/unified_export_hf.py tests/unit/torch/export/test_hf_checkpoint_utils.pypre-commit run mypy --all-filesPYTHONPATH=/private/tmp/modelopt-step35-rope-fix python -m pytest tests/unit/torch/export/test_hf_checkpoint_utils.py -vvMissing required keys ... {'rope_theta'}errorrope_parameters["rope_theta"] == 500000Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: N/AAdditional Information
This is separate from PR #1693. PR #1693 handles
layer_typeslength sanitization; this PR handles the earlier Step-3.5 RoPE config validation failure.Summary by CodeRabbit
New Features
Tests