fix(pt/pt-expt): preserve shared tensors in change-bias#5534
Conversation
| def _find_shared_descriptor_pair( | ||
| self, state_dict: dict[str, torch.Tensor] | ||
| ) -> tuple[str, str]: |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughFixes a bug in Changeschange_bias shared tensor preservation fix
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
source/tests/pt/test_change_bias.py (1)
234-263: ⚡ Quick winCover the untouched branch and the post-save load regression explicitly.
This test only proves that
model_1got the requested bias and that one shared descriptor alias survived. It would still pass ifmodel_2’sout_biaschanged as collateral damage, or if the updated checkpoint still reproduced the “must choose a head explicitly” load failure from#4348. Please snapshotmodel_2’s bias beforerun_dp(...)and add a smoke test that loads the updated.ptfile through the normal inference path without forcing a head. Based on the PR objectives and review stack context, this regression needs to cover both branch isolation and the post-save load behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@source/tests/pt/test_change_bias.py` around lines 234 - 263, The test currently only verifies that model_1 received the requested bias and that shared storage survived, but it doesn't verify that the untouched model_2 branch wasn't inadvertently modified, nor does it test that the updated checkpoint can be loaded through the normal inference path. Before calling run_dp(...), capture the original bias from the peer_key (model_2's out_bias) from the original state_dict. After loading the updated checkpoint, add an assertion that model_2's bias matches the captured original value to ensure branch isolation wasn't broken. Additionally, add a separate smoke test that loads the updated checkpoint through the normal inference path without forcing a head selection to verify the post-save load regression from `#4348` is fixed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@source/tests/pt/test_change_bias.py`:
- Around line 234-263: The test currently only verifies that model_1 received
the requested bias and that shared storage survived, but it doesn't verify that
the untouched model_2 branch wasn't inadvertently modified, nor does it test
that the updated checkpoint can be loaded through the normal inference path.
Before calling run_dp(...), capture the original bias from the peer_key
(model_2's out_bias) from the original state_dict. After loading the updated
checkpoint, add an assertion that model_2's bias matches the captured original
value to ensure branch isolation wasn't broken. Additionally, add a separate
smoke test that loads the updated checkpoint through the normal inference path
without forcing a head selection to verify the post-save load regression from
`#4348` is fixed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 60d53346-6490-4fb8-81e4-a9b095cd3bc2
📒 Files selected for processing (2)
deepmd/pt/entrypoints/main.pysource/tests/pt/test_change_bias.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 307-309: The `torch.as_tensor(user_bias,
dtype=updated_bias.dtype)` call defaults to CPU device when `user_bias` is a
Python list, causing device mismatch with `updated_bias` in CUDA environments.
Add a `device` parameter to the `torch.as_tensor()` call, setting it to
`updated_bias.device`, to ensure the `expected_bias` tensor is created on the
same device as `updated_bias` before comparison with
`torch.testing.assert_close()`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: e7e80516-0128-4057-a113-0a1e914914e9
📒 Files selected for processing (2)
deepmd/pt_expt/entrypoints/main.pysource/tests/pt_expt/test_change_bias.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #5534 +/- ##
=======================================
Coverage 82.18% 82.18%
=======================================
Files 890 890
Lines 101357 101382 +25
Branches 4240 4240
=======================================
+ Hits 83301 83323 +22
- Misses 16754 16756 +2
- Partials 1302 1303 +1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Summary
dp --pt change-biasdp --pt-expt change-biasFixes #4348.
Tests
pytest source/tests/pt/test_change_bias.py -vpytest source/tests/pt_expt/test_change_bias.py -vruff check .ruff format --check .Summary by CodeRabbit
Release Notes
Improvements
.ptchange-bias checkpoint handling to apply in-place updates only to modified tensors, improving efficiency while preserving existing tensor aliases when compatible..pt-exptchange-bias behavior to better retain storage alias relationships in the produced checkpoint.Tests
.pt-exptchange-bias operations.