Skip to content

fix(pt/pt-expt): preserve shared tensors in change-bias#5534

Open
njzjz wants to merge 3 commits into
deepmodeling:masterfrom
njzjz:fix/pt-change-bias-shared-storage
Open

fix(pt/pt-expt): preserve shared tensors in change-bias#5534
njzjz wants to merge 3 commits into
deepmodeling:masterfrom
njzjz:fix/pt-change-bias-shared-storage

Conversation

@njzjz

@njzjz njzjz commented Jun 14, 2026

Copy link
Copy Markdown
Member

Summary

  • avoid deepcopying/replacing PyTorch checkpoint state in dp --pt change-bias
  • copy only changed tensors back into existing state dicts so shared storage aliases are preserved
  • apply the same alias-preserving checkpoint update to dp --pt-expt change-bias
  • add regression tests for multitask PT shared descriptor storage and pt_expt checkpoint aliases

Fixes #4348.

Tests

  • pytest source/tests/pt/test_change_bias.py -v
  • pytest source/tests/pt_expt/test_change_bias.py -v
  • ruff check .
  • ruff format --check .

Summary by CodeRabbit

Release Notes

  • Improvements

    • Updated PyTorch .pt change-bias checkpoint handling to apply in-place updates only to modified tensors, improving efficiency while preserving existing tensor aliases when compatible.
    • Enhanced .pt-expt change-bias behavior to better retain storage alias relationships in the produced checkpoint.
  • Tests

    • Added coverage for multitask change-bias on a specific model branch.
    • Added a test verifying checkpoint storage aliasing is preserved for .pt-expt change-bias operations.

@dosubot dosubot Bot added the bug label Jun 14, 2026
Comment on lines +215 to +217
def _find_shared_descriptor_pair(
self, state_dict: dict[str, torch.Tensor]
) -> tuple[str, str]:
@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 852737ef-ffd5-49a1-889d-23869a645ed7

📥 Commits

Reviewing files that changed from the base of the PR and between ef1eef7 and 3629adb.

📒 Files selected for processing (1)
  • source/tests/pt_expt/test_change_bias.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • source/tests/pt_expt/test_change_bias.py

📝 Walkthrough

Walkthrough

Fixes a bug in dp --pt change-bias for multitask .pt models where the old deep-copy path bloated output file size and broke single-head inference. Introduces _update_changed_model_tensors to in-place copy only changed tensors back into the original state dict (skipping _extra_state, scoped to the target branch). Parallel implementations apply the same fix to both .pt and .pt_expt checkpoint backends. Adds tests to verify shared descriptor tensor storage is preserved and only the targeted branch bias changes.

Changes

change_bias shared tensor preservation fix

Layer / File(s) Summary
PT path: _update_changed_model_tensors helper and save/load refactor
deepmd/pt/entrypoints/main.py
Removes copy import. Adds _update_changed_model_tensors to in-place copy only shape/dtype-matching tensors that differ between source and target state dicts, skipping _extra_state, with optional key_prefix to restrict updates to a multitask branch. Changes model_state_dict extraction from copy.deepcopy to .get("model", old_state_dict), updates wrapper loading accordingly, and replaces the old save reconstruction block with a single call to _update_changed_model_tensors.
PT path: TestChangeBiasMultitask multitask validation
source/tests/pt/test_change_bias.py
Adds multitask imports. Introduces TestChangeBiasMultitask with a setUp that loads water/multitask.json, preprocesses shared params, normalizes input, and saves an initial checkpoint. Adds _share_storage and _find_shared_descriptor_pair helpers. The core test runs dp --pt change-bias targeting model_1, then asserts shared descriptor tensors still alias the same storage and only the model_1 out_bias tensor is updated to the provided value. Teardown removes generated .pt files and the stat directory.
PT_EXPT path: _update_changed_model_tensors helper and save refactor
deepmd/pt_expt/entrypoints/main.py
Adds _update_changed_model_tensors helper to selectively copy tensors into an existing state dict while preserving aliases and skipping _extra_state, using detach().clone() or in-place copy_ based on shape/dtype/value equality. Updates the change_bias workflow for .pt inputs to use this helper when writing updated tensors back into model_state_dict, replacing the prior logic that overwrote the entire "model" field and restored _extra_state.
PT_EXPT path: storage alias preservation test
source/tests/pt_expt/test_change_bias.py
Adds a test that creates a checkpoint with intentionally aliased tensor storage between two descriptor tensors, runs dp --pt-expt change-bias using that checkpoint and user-specified bias, and asserts both that the output bias matches the provided values and that the aliased storage relationship between the two tensors is preserved via matching data_ptr() pointers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • deepmodeling/deepmd-kit#5334: Both PRs touch the dp --pt-expt change-bias workflow—main PR changes how updated model tensors (and aliases) are written back, while the retrieved PR adds/extends .pt2 change-bias tests for the resulting bias behavior.

Suggested reviewers

  • iProzd
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(pt/pt-expt): preserve shared tensors in change-bias' accurately describes the main objective of the pull request—preserving shared tensor storage in the change-bias operation for both PT and PT-expt implementations.
Linked Issues check ✅ Passed The PR successfully addresses issue #4348 by implementing selective tensor updates to preserve shared storage aliases in multi-head models, preventing unnecessary duplication and large file sizes.
Out of Scope Changes check ✅ Passed All changes directly support the stated objectives: the new _update_changed_model_tensors helper and its integration into change-bias workflows, plus regression tests validating the fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
source/tests/pt/test_change_bias.py (1)

234-263: ⚡ Quick win

Cover the untouched branch and the post-save load regression explicitly.

This test only proves that model_1 got the requested bias and that one shared descriptor alias survived. It would still pass if model_2’s out_bias changed as collateral damage, or if the updated checkpoint still reproduced the “must choose a head explicitly” load failure from #4348. Please snapshot model_2’s bias before run_dp(...) and add a smoke test that loads the updated .pt file through the normal inference path without forcing a head. Based on the PR objectives and review stack context, this regression needs to cover both branch isolation and the post-save load behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/pt/test_change_bias.py` around lines 234 - 263, The test
currently only verifies that model_1 received the requested bias and that shared
storage survived, but it doesn't verify that the untouched model_2 branch wasn't
inadvertently modified, nor does it test that the updated checkpoint can be
loaded through the normal inference path. Before calling run_dp(...), capture
the original bias from the peer_key (model_2's out_bias) from the original
state_dict. After loading the updated checkpoint, add an assertion that
model_2's bias matches the captured original value to ensure branch isolation
wasn't broken. Additionally, add a separate smoke test that loads the updated
checkpoint through the normal inference path without forcing a head selection to
verify the post-save load regression from `#4348` is fixed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@source/tests/pt/test_change_bias.py`:
- Around line 234-263: The test currently only verifies that model_1 received
the requested bias and that shared storage survived, but it doesn't verify that
the untouched model_2 branch wasn't inadvertently modified, nor does it test
that the updated checkpoint can be loaded through the normal inference path.
Before calling run_dp(...), capture the original bias from the peer_key
(model_2's out_bias) from the original state_dict. After loading the updated
checkpoint, add an assertion that model_2's bias matches the captured original
value to ensure branch isolation wasn't broken. Additionally, add a separate
smoke test that loads the updated checkpoint through the normal inference path
without forcing a head selection to verify the post-save load regression from
`#4348` is fixed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 60d53346-6490-4fb8-81e4-a9b095cd3bc2

📥 Commits

Reviewing files that changed from the base of the PR and between c0b0319 and 60ee486.

📒 Files selected for processing (2)
  • deepmd/pt/entrypoints/main.py
  • source/tests/pt/test_change_bias.py

@njzjz njzjz requested review from iProzd and wanghan-iapcm June 14, 2026 17:46
@njzjz njzjz changed the title fix(pt): preserve shared tensors in change-bias fix(pt/pt-expt): preserve shared tensors in change-bias Jun 14, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/tests/pt_expt/test_change_bias.py`:
- Around line 307-309: The `torch.as_tensor(user_bias,
dtype=updated_bias.dtype)` call defaults to CPU device when `user_bias` is a
Python list, causing device mismatch with `updated_bias` in CUDA environments.
Add a `device` parameter to the `torch.as_tensor()` call, setting it to
`updated_bias.device`, to ensure the `expected_bias` tensor is created on the
same device as `updated_bias` before comparison with
`torch.testing.assert_close()`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: e7e80516-0128-4057-a113-0a1e914914e9

📥 Commits

Reviewing files that changed from the base of the PR and between 60ee486 and ef1eef7.

📒 Files selected for processing (2)
  • deepmd/pt_expt/entrypoints/main.py
  • source/tests/pt_expt/test_change_bias.py

Comment thread source/tests/pt_expt/test_change_bias.py Outdated
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
@codecov

codecov Bot commented Jun 14, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 73.68421% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.18%. Comparing base (c0b0319) to head (3629adb).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
deepmd/pt/entrypoints/main.py 76.19% 5 Missing ⚠️
deepmd/pt_expt/entrypoints/main.py 70.58% 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5534   +/-   ##
=======================================
  Coverage   82.18%   82.18%           
=======================================
  Files         890      890           
  Lines      101357   101382   +25     
  Branches     4240     4240           
=======================================
+ Hits        83301    83323   +22     
- Misses      16754    16756    +2     
- Partials     1302     1303    +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] dp change-bias will give much large model

3 participants