Skip to content

fix: skip storer-being-paid-now checks for replication receipts#130

Merged
jacderida merged 1 commit into
mainfrom
fix/replication-verification-context
Jun 5, 2026
Merged

fix: skip storer-being-paid-now checks for replication receipts#130
jacderida merged 1 commit into
mainfrom
fix/replication-verification-context

Conversation

@jacderida
Copy link
Copy Markdown
Collaborator

@jacderida jacderida commented Jun 5, 2026

Summary

A proof-of-payment presented during replication is a receipt for a sale that already closed, but the verifier ran the full client-PUT check set against it. Two of those checks interrogate the present and therefore guarantee false rejections for replicated records:

  • Own-quote price freshness — record counts only grow, so every receipt's quoted price eventually drops below the verifier's live floor (fix(payment): gate quote freshness on own quote only, not neighbours' #127 fixed the heterogeneous-neighbour case for client PUTs; the receipt-ageing case remains for replication).
  • Local recipient — close groups churn, so a post-churn member receiving a record via replication was never a payee on the original receipt.
  • Merkle candidate closeness (same shape) — validates the winner pool against the live DHT, but the pool was sampled from the DHT of the original sale.

On DEV-01 (2026-06-05) this rejected nearly 100% of replication PoP transfers within an hour of launch: 4M+ PoP verification error ... stale rejections at ~300k/hour, records pinned below target redundancy, close-group record counts diverging 150× (75..=11,231 per service), and a permanent ~500 MB/s fleet-wide re-offer storm (~25 TB egress in 16h). The divergence the rejections caused is also what kept the client-PUT freshness gate firing — the two failure modes fed each other.

Changes

  • New VerificationContext { ClientPut, Replication } threaded through verify_paymentverify_evm_payment / verify_merkle_payment.
  • Under Replication, skip only the storer-being-paid-now checks: own-quote freshness, local recipient, merkle candidate closeness.
  • Every receipt-authenticity check still runs in both contexts: quote structure, content binding to the exact address, peer-ID/pub-key bindings, ML-DSA signatures, on-chain settlement lookup. A record cannot be admitted via replication without an authentic, settled payment for that record.
  • The verified-XorName cache is context-aware to match (raised by Copilot review): each entry records whether its verification ran the full client-PUT check set. A Replication-verified entry satisfies later replication lookups (re-offers of the same key are routine) but never a later ClientPut fast-path; a full ClientPut verification upgrades the entry and is never downgraded.
  • Call sites: chunk PUT handler → ClientPut (behaviour unchanged); fresh-offer and paid-notify replication handlers → Replication.
  • Deliberate trade-off documented on the enum: skipping recipient/closeness for replication admits receipts from self-dealing payers who settle the median payment to their own wallet on-chain. The client-PUT path still rejects such pools, replication admission still requires responsibility for the key, and the abuse costs a settled on-chain payment per chunk; closing it properly belongs in quote issuance / payment policy, not the replication hot path, where the equivalent defence provably destroys the network's ability to heal.

Test plan

  • cargo test --lib payment::verifier — 66/66, including 5 new context tests:
    • stale own quote: rejected under ClientPut, passes the gate under Replication (fails at the later peer-binding stage, proving the skip);
    • non-recipient receipt: rejected under ClientPut, passes the gate under Replication (fails at the later signature stage);
    • content mismatch: rejected under both contexts (receipt authenticity not relaxed);
    • duplicate-candidate merkle pool: rejected by the closeness pre-check under ClientPut, proceeds past it under Replication;
    • Replication-verified cache entry: satisfies later replication lookups but never a ClientPut fast-path; upgrades on full verification, never downgrades.
  • cargo test --lib replication — 230/230; cargo test --lib storage — 29/29.
  • cargo clippy --all-targets clean; cargo fmt applied.
  • Note: config::tests::test_bootstrap_peers_discover_env_var fails on machines with a real ~/.config/ant/bootstrap_peers.toml — pre-existing on main, unrelated to this change.
  • Real-world validation: deploy a testnet from this branch and confirm replication PoP rejections stay at ~zero as record counts grow (DEV-01's collapsed at ~1 hour in), and that fleet egress settles instead of plateauing at hundreds of MB/s.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 5, 2026 15:47
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an explicit verification context to differentiate live client PUT payment verification from replication/repair receipt verification, so replication receipts aren’t rejected by “storer-being-paid-now” checks that become invalid as the network evolves (record counts grow, close-group churn, DHT changes).

Changes:

  • Add VerificationContext { ClientPut, Replication } and thread it through PaymentVerifier::verify_payment into EVM and merkle verification paths.
  • Under Replication, skip the present-tense checks (own-quote price freshness, local recipient, merkle candidate closeness) while keeping receipt authenticity + on-chain settlement checks.
  • Update PUT and replication handlers to pass the appropriate context; add unit tests validating the context-gated behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
src/storage/handler.rs Passes VerificationContext::ClientPut for client PUT payment verification.
src/replication/mod.rs Passes VerificationContext::Replication for replication PoP verification.
src/payment/verifier.rs Adds VerificationContext, gates specific checks by context, and adds context-specific tests.
src/payment/mod.rs Re-exports VerificationContext from the payment module.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/payment/verifier.rs
Comment thread src/replication/mod.rs Outdated
A proof-of-payment presented during replication is a receipt for a sale that
already closed, but the verifier ran the full client-PUT check set against it.
Two of those checks interrogate the present and therefore guarantee false
rejections for replicated records: the own-quote price-freshness gate (record
counts only grow, so every receipt's quoted price eventually drops below the
verifier's live floor) and the local-recipient check (close groups churn, so a
post-churn member receiving a record via replication was never a payee on the
original receipt). The merkle candidate-closeness check has the same shape: it
validates the winner pool against the live DHT, but the pool was sampled from
the DHT of the original sale.

On DEV-01 (2026-06-05) this rejected nearly 100% of replication
proof-of-payment transfers within an hour of launch: 4M+ "PoP verification
error ... stale" rejections at ~300k/hour, records pinned below target
redundancy, close-group record counts diverging 150x (75..=11,231 per
service), and a permanent ~500 MB/s fleet-wide re-offer storm (~25 TB egress
in 16h). The divergence the rejections caused is also what made the client-PUT
freshness gate (fixed for the heterogeneous-neighbour case in #127) keep
firing: the two failure modes fed each other.

Introduce VerificationContext { ClientPut, Replication } and thread it through
verify_payment. Under Replication the verifier skips only the
storer-being-paid-now checks (own-quote freshness, local recipient, merkle
candidate closeness). Every receipt-authenticity check still runs in both
contexts: quote structure, content binding to the exact address, peer-ID/
pub-key bindings, ML-DSA signatures, and the on-chain settlement lookup — a
record cannot be admitted via replication without an authentic, settled
payment for that record.

The verified-XorName cache is context-aware to match: each entry records
whether its verification ran the full client-PUT check set, a
Replication-verified entry satisfies later replication lookups (re-offers of
the same key are routine) but never a later ClientPut fast-path, and a full
ClientPut verification upgrades the entry without ever being downgraded back.
Without this, a replication receipt would let a later proof-less client PUT
bypass the context-gated checks via the cache.

Deliberate trade-off (documented on the enum): skipping the recipient and
closeness checks for replication admits receipts from self-dealing payers who
settle the median payment to their own wallet on-chain. The client-PUT path
still rejects such pools, replication admission still requires responsibility
for the key, and the abuse costs a settled on-chain payment per chunk; closing
it properly belongs in quote issuance / payment policy rather than in the
replication hot path, where the equivalent defence provably destroys the
network's ability to heal.

Call sites: the chunk PUT handler passes ClientPut (behaviour unchanged); the
fresh-offer and paid-notify replication handlers pass Replication.

Test results: payment::verifier 66/66 (5 new context tests: stale own quote
and non-recipient receipts pass the gated checks under Replication, failing at
the later binding/signature stage; content mismatch rejected under both
contexts; duplicate-candidate merkle pool rejected under ClientPut but past
the closeness check under Replication; Replication-verified cache entry does
not satisfy a ClientPut fast-path, upgrades on full verification, never
downgrades), replication 230/230, storage 29/29. cargo clippy --all-targets
clean. Note: config::tests::test_bootstrap_peers_discover_env_var fails on
machines with a real ~/.config/ant/bootstrap_peers.toml — pre-existing on
main, unrelated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jacderida jacderida force-pushed the fix/replication-verification-context branch from ecee0a6 to 5f8e81d Compare June 5, 2026 16:36
@dirvine
Copy link
Copy Markdown
Collaborator

dirvine commented Jun 5, 2026

Code review — mostly good, one payment-cache concern + minor doc nit

Thanks for the testnet forensics and the clear context write-up — this is a real
bug-fix and the trade-off discussion is honest. CI is green on all visible
checks, and the focused test suites pass locally:

  • cargo test --lib payment::verifier --no-fail-fast65/65 (incl. the 4 new context tests)
  • cargo test --lib replication --no-fail-fast229/229
  • cargo test --lib storage --no-fail-fast29/29

⚠️ Main concern — payment cache is context-blind

verify_payment in src/payment/verifier.rs ends every successful
verification with:

// Cache the verified xorname
self.cache.insert(*xorname);

That now runs for both VerificationContext::ClientPut and
VerificationContext::Replication. The Replication path can succeed while
having skipped the freshness / local-recipient / merkle-closeness checks, so a
later ClientPut for the same xorname will hit CachedAsVerified in
check_payment_required and bypass the ClientPut-only checks entirely.

Worth thinking about:

  • paid_notify calls verify_payment purely to update PaidForList; it does
    not actually store the chunk. Caching on that path populates the cache
    without the corresponding write, which compounds the above.
  • The PR description says the client-PUT behaviour is unchanged, but the
    post-replication cache hit makes "unchanged" only true for xornames the
    node has never replicated. For anything that has, the freshness /
    recipient gates stop firing for subsequent client PUTs.

One of these (or an explicit ADR note justifying the alternative):

  1. Make the cache context-aware (cache key carries the strongest context it
    has been verified under, and a ClientPut lookup only honours entries
    cached under ClientPut).
  2. Only populate the fast-path cache for ClientPut; treat Replication
    verification as admission-for-this-receipt only.
  3. Document + test the intended invariant ("a settled authentic receipt for
    this content address is sufficient forever, regardless of context") so the
    next reviewer doesn't have to re-derive it from the diff.

💡 Minor doc nit

src/replication/mod.rs:1154-1158 — the comment on the fresh-offer call site
lists the two EVM-path checks that Replication skips (own-quote freshness,
local recipient) but omits merkle candidate closeness, which is also
skipped in verify_merkle_payment for Replication. Mirroring the full set
of skipped checks here would keep the rationale accurate for both proof types.

✅ Looks good

  • Context split is clean, narrowly threaded, and the match detect_proof_type
    branch shape is preserved.
  • Receipt-authenticity checks (quote structure, content binding, peer/pubkey
    binding, ML-DSA signatures, on-chain settlement) still run in both
    contexts — that is the most important property and it's clearly preserved.
  • New tests are well-targeted: the gating is demonstrated by the error
    moving past the skipped check to a later stage, which is the right way
    to assert the skip without requiring on-chain access.
  • Documented trade-off (self-dealing payer still costs a settled on-chain
    payment per chunk, and the client-PUT path still rejects their pools) is
    the right place to leave this; agree the replication hot path is the wrong
    place to close it.

Posted by Hermes Agent.

@jacderida
Copy link
Copy Markdown
Collaborator Author

Thanks — these were already addressed in 5f8e81d (the review ran against the pre-amend revision, ecee0a6).

  • Cache concern: fixed with your option 1 — the cache is now context-aware. Entries record whether the full client-PUT check set verified them; ClientPut lookups only honour client-PUT-verified entries, Replication lookups accept either, and a full client-PUT verification upgrades an entry without ever being downgraded. The paid_notify case is covered by the same mechanism: it now inserts only a replication-flag entry, which can never satisfy a client-PUT fast-path (a proof-less PUT still demands payment) — it only short-circuits repeated replication re-verification of the same receipt, which is the desired fast-path. Covered by test_replication_verified_cache_entry_does_not_satisfy_client_put.
  • Doc nit: the fresh-offer comment now lists all three gated checks, including merkle candidate closeness.

Current branch state: payment::verifier 66/66 (5 context tests), replication 230/230, storage 29/29, clippy clean.

@jacderida jacderida merged commit 88f14ea into main Jun 5, 2026
15 checks passed
@jacderida jacderida deleted the fix/replication-verification-context branch June 5, 2026 17:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants