fix(partitions): persist data on vsr backups, add data integrity test#3512

Open
hubcio wants to merge 1 commit into master from data-integrity-vsr

@hubcio hubcio commented Jun 19, 2026

Contributor

VSR's hash chain and checksum-keyed recovery require a committed op's
on-disk bytes to match on every replica. Two defects broke that: the
partition commit path drained the pipeline, which only the primary
fills, so backups journaled replicated prepares but never flushed them
(0-byte segments); and append re-stamped base_timestamp from a local
now(), diverging bytes even once persisted.
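
The second defect can be illustrated with a minimal sketch. The `Prepare` struct, the byte layout, and the function names below are hypothetical and are not the actual iggy types; the point is only that a per-replica clock reading makes the encoded bytes differ, while a timestamp carried in the replicated prepare does not.

```rust
// Hypothetical prepare; the field set is an assumption made for illustration.
#[derive(Clone, Debug)]
struct Prepare {
    op: u64,
    timestamp: u64, // assumed to be stamped once by the primary, then replicated
    payload: Vec<u8>,
}

/// Encode a batch as it might land in a segment:
/// base_timestamp (8 bytes LE), op (8 bytes LE), then the payload.
/// This layout is made up for the sketch.
fn encode_batch(base_timestamp: u64, prepare: &Prepare) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(16 + prepare.payload.len());
    bytes.extend_from_slice(&base_timestamp.to_le_bytes());
    bytes.extend_from_slice(&prepare.op.to_le_bytes());
    bytes.extend_from_slice(&prepare.payload);
    bytes
}

/// Defective variant: each replica stamps base_timestamp from its own clock.
/// `local_now` stands in for a local now() reading.
fn encode_with_local_clock(prepare: &Prepare, local_now: u64) -> Vec<u8> {
    encode_batch(local_now, prepare)
}

/// Deterministic variant: every replica reuses the replicated timestamp.
fn encode_with_prepare_timestamp(prepare: &Prepare) -> Vec<u8> {
    encode_batch(prepare.timestamp, prepare)
}
```

Two replicas calling `encode_with_local_clock` with clock readings that differ by even one tick produce different bytes for the same op, whereas `encode_with_prepare_timestamp` depends only on replicated state.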

commit_journal now falls back to the journal when the pipeline is
empty, so backups persist like the metadata plane. base_timestamp
reuses the prepare's monotonic timestamp, stamped once by the primary
and replicated. The 3-node data integrity test is un-ignored and gates
this.
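
A rough sketch of the fallback, under stated assumptions: `committable_ops`, its signature, and the `Prepare` type are hypothetical and do not mirror the real `commit_journal` code. It assumes the pipeline is a queue that only the primary fills, the journal is keyed by op, and a gap in the journal stops the scan.

```rust
use std::collections::{BTreeMap, VecDeque};

// Hypothetical prepare; not the actual iggy type.
#[derive(Clone, Debug, PartialEq)]
struct Prepare {
    op: u64,
    payload: Vec<u8>,
}

/// Pick the ops to commit in the range (commit_min, commit_max].
/// A non-empty pipeline (the primary) is drained from the front;
/// an empty pipeline (a backup) falls back to the journal.
fn committable_ops(
    pipeline: &mut VecDeque<Prepare>,
    journal: &BTreeMap<u64, Prepare>,
    commit_min: u64,
    commit_max: u64,
) -> Vec<Prepare> {
    let mut ops = Vec::new();
    if !pipeline.is_empty() {
        // Primary path: pop prepares while they are covered by commit_max.
        while pipeline.front().map_or(false, |p| p.op <= commit_max) {
            if let Some(p) = pipeline.pop_front() {
                ops.push(p);
            }
        }
        return ops;
    }
    // Backup path: read journaled prepares in op order, stopping at a gap.
    for op in (commit_min + 1)..=commit_max {
        match journal.get(&op) {
            Some(p) => ops.push(p.clone()),
            None => break,
        }
    }
    ops
}
```

With an empty pipeline and a journal holding ops 1 through 4, `committable_ops(&mut pipeline, &journal, 1, 3)` returns ops 2 and 3, which is the case the old path returned nothing for.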

@github-actions github-actions Bot added the S-waiting-on-review label (PR is waiting on a reviewer) Jun 19, 2026

codecov Bot commented Jun 19, 2026

Codecov Report

❌ Patch coverage is 62.09386% with 105 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.64%. Comparing base (4a48008) to head (733633a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| core/partitions/src/iggy_partition.rs | 18.29% | 66 Missing and 1 partial ⚠️ |
| core/partitions/src/journal.rs | 73.10% | 32 Missing ⚠️ |
| core/consensus/src/impls.rs | 72.72% | 3 Missing ⚠️ |
| core/partitions/src/messages_writer.rs | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3512       +/-   ##
=============================================
- Coverage     74.27%   50.64%   -23.63%     
  Complexity      937      937               
=============================================
  Files          1259     1256        -3     
  Lines        125969   112478    -13491     
  Branches     101644    88194    -13450     
=============================================
- Hits          93558    56967    -36591     
- Misses        29396    52654    +23258     
+ Partials       3015     2857      -158     
| Components | Coverage Δ |
|---|---|
| Rust Core | 44.86% <62.09%> (-30.30%) ⬇️ |
| Java SDK | 58.57% <ø> (ø) |
| C# SDK | 71.40% <ø> (-0.71%) ⬇️ |
| Python SDK | 88.88% <ø> (ø) |
| PHP SDK | 84.29% <ø> (ø) |
| Node SDK | 91.13% <ø> (-0.10%) ⬇️ |
| Go SDK | 40.36% <ø> (ø) |
| Files with missing lines | Coverage Δ |
|---|---|
| core/consensus/src/plane_helpers.rs | 74.09% <100.00%> (-17.37%) ⬇️ |
| core/journal/src/prepare_journal.rs | 61.68% <ø> (-24.97%) ⬇️ |
| core/metadata/src/impls/metadata.rs | 37.50% <ø> (-1.41%) ⬇️ |
| core/shard/src/lib.rs | 72.87% <100.00%> (-1.07%) ⬇️ |
| core/consensus/src/impls.rs | 74.92% <72.72%> (-4.32%) ⬇️ |
| core/partitions/src/messages_writer.rs | 26.50% <0.00%> (-1.00%) ⬇️ |
| core/partitions/src/journal.rs | 45.62% <73.10%> (+13.14%) ⬆️ |
| core/partitions/src/iggy_partition.rs | 36.22% <18.29%> (-5.96%) ⬇️ |

... and 431 files with indirect coverage changes

@hubcio hubcio force-pushed the data-integrity-vsr branch 6 times, most recently from aff604f to e87f76a Compare June 22, 2026 06:33
VSR's hash chain and checksum-keyed recovery require a committed
op's on-disk bytes to match on every replica. Backups persisted
0-byte segments: the partition commit path drained the in-memory
pipeline, which only the primary fills. And the append path
stamped base_timestamp from a local now(), diverging bytes per
node even once persisted.

Backups now source committable ops from the journal when the
pipeline is empty, and commit_messages flushes only the committed
prefix (op <= commit_max), keeping the uncommitted tail resident.
A backup thus never writes uncommitted bytes to its segment and
never drops the headers a later commit needs - which would
otherwise wedge commit_min below commit_max. base_timestamp
reuses the prepare's monotonic timestamp, stamped once by the
primary and replicated. A 3-node data-integrity test gates the
cross-replica byte-identity.
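
The committed-prefix rule can be sketched as follows. This is a simplified model, not the actual `commit_messages` implementation: the `Backup` struct, its fields, and the segment byte layout are hypothetical, and evicting flushed prepares from the journal is an assumption of the sketch.

```rust
use std::collections::BTreeMap;

// Hypothetical prepare; not the actual iggy type.
#[derive(Clone, Debug, PartialEq)]
struct Prepare {
    op: u64,
    timestamp: u64, // assumed to be stamped by the primary and replicated
    payload: Vec<u8>,
}

// Hypothetical backup state: resident prepares, an in-memory stand-in for
// the on-disk segment, and the highest op flushed so far.
#[derive(Default)]
struct Backup {
    journal: BTreeMap<u64, Prepare>,
    segment: Vec<u8>,
    commit_min: u64,
}

impl Backup {
    /// Flush only ops in (commit_min, commit_max], in order.
    /// Anything above commit_max stays resident in the journal.
    fn commit_messages(&mut self, commit_max: u64) {
        while self.commit_min < commit_max {
            let next = self.commit_min + 1;
            // Stop at a gap instead of skipping ahead.
            let Some(prepare) = self.journal.remove(&next) else {
                break;
            };
            // Bytes depend only on replicated fields, never on a local clock.
            self.segment.extend_from_slice(&prepare.timestamp.to_le_bytes());
            self.segment.extend_from_slice(&prepare.op.to_le_bytes());
            self.segment.extend_from_slice(&prepare.payload);
            self.commit_min = next;
        }
    }
}
```

With ops 1 through 3 journaled, `commit_messages(2)` flushes ops 1 and 2 and leaves op 3 resident, so a later `commit_messages(3)` still finds it and `commit_min` can catch up to `commit_max`.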
@hubcio hubcio force-pushed the data-integrity-vsr branch from e87f76a to 733633a Compare June 22, 2026 06:35