Skip to content

spock_conflict: fix spurious tiebreaker-equal WARNING in single-writer topology#504

Open
rasifr wants to merge 3 commits into
mainfrom
fix/spoc-589/tiebreaker-warning
Open

spock_conflict: fix spurious tiebreaker-equal WARNING in single-writer topology#504
rasifr wants to merge 3 commits into
mainfrom
fix/spoc-589/tiebreaker-warning

Conversation

@rasifr

@rasifr rasifr commented Jun 16, 2026

Copy link
Copy Markdown
Member

conflict_resolve_by_timestamp() used get_node(local_origin_id) for loc_node,
where local_origin_id is the replication origin stamped on the existing local
tuple. In a single-writer topology every row on the subscriber carries
origin = provider.id and incoming changes also have remote_origin = provider.id,
so both lookups resolved to the same SpockNode. The same ambiguity arises from
a node_id hash collision: the id is a 16-bit hash of the node name, so two
distinct nodes can share the same id and again resolve to the same SpockNode.
In either case the tiebreaker comparison is trivially equal, firing a spurious
WARNING on every timestamp-tied conflict.

Use get_local_node() for loc_node so the tiebreaker always compares the
applying node against the remote sender, not the tuple's origin. Add a
same-origin shortcut: when local_origin_id == remote_origin_id the change
shares the same source (e.g. the same row updated twice in one upstream
transaction), so there is no real conflict and remote is applied directly
without consulting the tiebreaker.

Detect node_id hash collisions at node creation time.

rasifr and others added 3 commits June 16, 2026 18:39
Reproduces the bug where conflict_resolve_by_timestamp() fires a
"tiebreaker values are equal" WARNING when the same row is updated
twice in a single upstream transaction.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…r topology

conflict_resolve_by_timestamp() used get_node(local_origin_id) for loc_node,
where local_origin_id is the replication origin stamped on the existing local
tuple.  In a single-writer topology every row on the subscriber carries
origin = provider.id and incoming changes also have remote_origin = provider.id,
so both lookups resolved to the same SpockNode.  The same ambiguity arises from
a node_id hash collision: the id is a 16-bit hash of the node name, so two
distinct nodes can share the same id and again resolve to the same SpockNode.
In either case the tiebreaker comparison is trivially equal, firing a spurious
WARNING on every timestamp-tied conflict.

Use get_local_node() for loc_node so the tiebreaker always compares the
applying node against the remote sender, not the tuple's origin.  Add a
same-origin shortcut: when local_origin_id == remote_origin_id the change
shares the same source (e.g. the same row updated twice in one upstream
transaction), so there is no real conflict and remote is applied directly
without consulting the tiebreaker.

Detect node_id hash collisions at node creation time.
The collision check in create_node() now emits "collides with existing node"
instead of a PK uniqueness error.  Update the like() patterns in 024 to
match the new message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codacy-production

codacy-production Bot commented Jun 16, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 duplication

Metric Results
Duplication 0

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

create_node gains a pre-insertion collision guard that raises an ERROR when a computed 16-bit hash node id is already held by a differently-named node. In conflict_resolve_by_timestamp, the equal-timestamp tiebreak path is rewritten to short-circuit same-origin updates, derive the local comparison node from the applying node, and emit a structured WARNING when tiebreaker values are equal. A new TAP test and a schedule entry cover the tiebreaker warning path; the existing collision test assertions are widened to match updated error wording.

Changes

Node ID collision detection and tiebreaker conflict resolution

Layer / File(s) Summary
Node ID collision detection in create_node
src/spock_node.c, tests/tap/t/024_node_id_collision.pl
Declares existing and inserts a get_node lookup before catalog tuple insertion, raising a descriptive ERROR on id collision. Test regex patterns are widened to also accept the new "collides with existing node" phrasing in both forward and reverse subscription failure output.
Equal-timestamp tiebreak rewrite and warning test
src/spock_conflict.c, tests/tap/t/025_tiebreaker_equal_warning.pl, tests/tap/schedule
Rewrites the equal-timestamp branch to bypass tiebreaking for same-origin updates, selects the local comparison node from the applying node, and emits a structured ereport(WARNING, ...) with node names/ids and fix guidance when tiebreaker values are equal. A new 75-line TAP test verifies that two same-row updates in one transaction replicate correctly without triggering that WARNING; the schedule entry registers the new test.

Poem

A bunny once hopped through the node-id maze,
Two names, one hash — what a puzzling craze!
Now collisions get caught before tables are set,
And tiebreakers warn when values have met.
No phantom conflicts, no silent despair —
🐇 Just clean replication, handled with care!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: fixing spurious tiebreaker-equal WARNING messages in single-writer topology by adjusting conflict resolution logic.
Description check ✅ Passed The description clearly relates to the changeset, explaining the root cause (single-writer topology and hash collisions causing identical tiebreaker comparisons) and the solutions implemented.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/spoc-589/tiebreaker-warning

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/tap/t/025_tiebreaker_equal_warning.pl (1)

45-46: ⚡ Quick win

Prefer deterministic wait over fixed sleep for replication checks.

Using fixed sleeps at Line 45 and Line 60 can make this TAP test flaky under slow CI or loaded hosts. Polling until the expected replicated value appears (with timeout) is more stable.

Also applies to: 60-63

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/025_tiebreaker_equal_warning.pl` around lines 45 - 46, Replace
the fixed sleep calls used in the tiebreaker equal warning test with
deterministic polling logic that repeatedly checks for the expected replicated
value until it appears or a timeout is reached. Instead of using system_or_bail
with 'sleep', implement a polling loop with a reasonable timeout that checks the
replication status at intervals, allowing the test to proceed as soon as the
expected condition is met rather than waiting a fixed duration. This approach
eliminates flakiness on slow or loaded CI environments.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/tap/t/025_tiebreaker_equal_warning.pl`:
- Around line 71-72: The unlike() assertion in the tiebreaker test is checking
for outdated warning text. The regex pattern is looking for the old phrase
"tiebreaker values are equal" but the current warning message emitted by
src/spock_conflict.c has been changed to "have equal tiebreaker value ...;
applying remote". Update the regex in the unlike() call at line 71 to match the
current warning text so the negative assertion properly prevents false positives
when the warning is actually emitted.

---

Nitpick comments:
In `@tests/tap/t/025_tiebreaker_equal_warning.pl`:
- Around line 45-46: Replace the fixed sleep calls used in the tiebreaker equal
warning test with deterministic polling logic that repeatedly checks for the
expected replicated value until it appears or a timeout is reached. Instead of
using system_or_bail with 'sleep', implement a polling loop with a reasonable
timeout that checks the replication status at intervals, allowing the test to
proceed as soon as the expected condition is met rather than waiting a fixed
duration. This approach eliminates flakiness on slow or loaded CI environments.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 61fb842d-1102-49ed-af0a-ffd52f3507e4

📥 Commits

Reviewing files that changed from the base of the PR and between 4cb409c and 3e40683.

📒 Files selected for processing (5)
  • src/spock_conflict.c
  • src/spock_node.c
  • tests/tap/schedule
  • tests/tap/t/024_node_id_collision.pl
  • tests/tap/t/025_tiebreaker_equal_warning.pl

Comment on lines +71 to +72
unlike($new_log, qr/tiebreaker values are equal/i,
'no spurious tiebreaker-equal WARNING');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Negative log assertion is checking stale warning text.

At Line 71, the regex only matches the old phrase (tiebreaker values are equal), but the current warning text in src/spock_conflict.c is have equal tiebreaker value ...; applying remote. This can let the test pass even when the warning is emitted.

Proposed fix
-unlike($new_log, qr/tiebreaker values are equal/i,
-       'no spurious tiebreaker-equal WARNING');
+unlike($new_log, qr/(tiebreaker values are equal|equal tiebreaker value)/i,
+       'no spurious tiebreaker-equal WARNING');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/tap/t/025_tiebreaker_equal_warning.pl` around lines 71 - 72, The
unlike() assertion in the tiebreaker test is checking for outdated warning text.
The regex pattern is looking for the old phrase "tiebreaker values are equal"
but the current warning message emitted by src/spock_conflict.c has been changed
to "have equal tiebreaker value ...; applying remote". Update the regex in the
unlike() call at line 71 to match the current warning text so the negative
assertion properly prevents false positives when the warning is actually
emitted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant