feat(retrieval): confidence-gate + dedup page-based citations by hallelx2 · Pull Request #32 · hallelx2/vectorless-engine

hallelx2 · 2026-05-29T16:58:48Z

The signal (FinanceBench, n=8)

The page-based strategy (/v1/answer/pageindex) shows a clean, exploitable split:

When it commits to a single citation -> f1=1.0, hit=1.0 (id_00807, id_00941).
When unsure, it sprays ~5 page ranges -> all miss (f1=0); on one miss it returned the same section id five times (id_00499: sec_363... x5), deflating precision.

The retrieval mechanism works; the loss is confidence calibration + dedup.

What changed

1. Dedup (correctness). Every terminal cited-range set now flows through a single selectCitedRanges chokepoint that clamps + dedups by range. A range cited 5x collapses to one; a section id never repeats in SelectedIDs. The sec_363... x5 case becomes one citation.

2. Confidence gating. The done action takes an optional confidence (0-1), parsed from a top-level field or a rich cited_pages object form ({"pages":[5,7],"confidence":0.9}). It is surfaced on Result.Confidence, Result.Confidences, the done event, and the API response. A low confidence does not suppress the answer: the agent still returns its single best pick.

3. Cap. When done cites more than retrieval.pageindex.max_citations (default 3) distinct ranges, keep the top-N by confidence (ties -> emission order). This is a mechanical backstop even if the prompt doesn't fully tame the spray. max_hops is unchanged: the cap bounds the FINAL citation set, not navigation.

4. Tight citations[]. The response's citations[] is now built from the final cited ranges (deduped + capped) instead of every page read, so a confident single pick surfaces ONE citation even after skimming several pages. Falls back to the PagesRead footprint on a refusal / hop-capped run.

5. Prompt. A new CITATION DISCIPLINE block tells the agent to cite the fewest ranges (ideally one), frames a spray of uncertain ranges as worse than committing, and keeps the single best pick even at low confidence (annotate, don't suppress). Prompt wording is high-leverage, so please review it.
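To make items 1 and 3 concrete, here is a minimal Go sketch of a clamp + dedup + confidence-aware cap. It is not the PR's selectCitedRanges: the PageRange and Cite types, the selectCited name, keeping the highest confidence when a range repeats, and treating maxCitations <= 0 as "no cap" are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// PageRange is a hypothetical inclusive page span, not the engine's real type.
type PageRange struct {
	Start, End int
}

// Cite pairs a cited range with the agent's self-reported confidence.
type Cite struct {
	Range      PageRange
	Confidence float64
}

// selectCited clamps every range to [1, totalPages], collapses repeated
// ranges into one entry (keeping the highest confidence seen), and, when
// more than maxCitations distinct ranges remain, keeps the top ones by
// confidence. Ties keep emission order because the sort is stable.
// Assumption: maxCitations <= 0 means "no cap".
func selectCited(cites []Cite, totalPages, maxCitations int) []Cite {
	index := make(map[PageRange]int) // clamped range -> position in out
	var out []Cite
	for _, c := range cites {
		r := c.Range
		if r.Start < 1 {
			r.Start = 1
		}
		if r.End > totalPages {
			r.End = totalPages
		}
		if r.Start > r.End {
			continue // nothing left after clamping
		}
		if i, ok := index[r]; ok {
			if c.Confidence > out[i].Confidence {
				out[i].Confidence = c.Confidence
			}
			continue
		}
		index[r] = len(out)
		out = append(out, Cite{Range: r, Confidence: c.Confidence})
	}
	if maxCitations > 0 && len(out) > maxCitations {
		sort.SliceStable(out, func(i, j int) bool {
			return out[i].Confidence > out[j].Confidence
		})
		out = out[:maxCitations]
	}
	return out
}

func main() {
	// The same range emitted five times collapses to a single citation.
	spray := make([]Cite, 5)
	for i := range spray {
		spray[i] = Cite{Range: PageRange{12, 14}, Confidence: 0.3}
	}
	fmt.Println(selectCited(spray, 100, 3))

	// Four distinct ranges with a cap of 3: the lowest confidence is dropped.
	mixed := []Cite{
		{PageRange{1, 2}, 0.4},
		{PageRange{5, 7}, 0.9},
		{PageRange{9, 9}, 0.4},
		{PageRange{20, 22}, 0.1},
	}
	fmt.Println(selectCited(mixed, 100, 3))
}
```

Routing every terminal path through one function like this is what lets the dedup and cap guarantees hold regardless of how the agent reached done.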

Config: RetrievalConfig.PageIndex.MaxCitations (default 3). Env VLE_/VLS_RETRIEVAL_PAGEINDEX_MAX_CITATIONS, forwarded in internal/config.
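As a rough illustration of the config behaviour (default 3, env override, negative rejected), the Go sketch below resolves the cap from the two env names given above. The resolveMaxCitations helper, the injected lookup function, and checking VLE_ before VLS_ are assumptions; the actual precedence and forwarding live in internal/config and may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultMaxCitations matches the default stated in the PR description.
const defaultMaxCitations = 3

// Lookup order here is an assumption, not the engine's documented precedence.
var maxCitationsEnvKeys = []string{
	"VLE_RETRIEVAL_PAGEINDEX_MAX_CITATIONS",
	"VLS_RETRIEVAL_PAGEINDEX_MAX_CITATIONS",
}

// resolveMaxCitations returns the first set env value, or the default.
// lookup has the same shape as os.LookupEnv so it can be faked in tests.
func resolveMaxCitations(lookup func(string) (string, bool)) (int, error) {
	for _, key := range maxCitationsEnvKeys {
		raw, ok := lookup(key)
		if !ok || raw == "" {
			continue
		}
		n, err := strconv.Atoi(raw)
		if err != nil {
			return 0, fmt.Errorf("%s: not an integer: %w", key, err)
		}
		if n < 0 {
			return 0, fmt.Errorf("%s: must not be negative, got %d", key, n)
		}
		return n, nil
	}
	return defaultMaxCitations, nil
}

func main() {
	n, err := resolveMaxCitations(os.LookupEnv)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("max citations:", n)
}
```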

Test plan

Dedup: 5 cites incl. duplicates -> <=3 distinct, no repeated section id (strategy + API layer).
Confident single citation preserved; confidence surfaces on Result + response.
Confidence-aware cap keeps the highest-confidence ranges.
Low-confidence answer still commits to its single best pick (over-suppression guard).
Out-of-range confidence clamps to [0,1]; refusal carries no confidence map.
Configurable cap (MaxCitations=1 -> exactly one).
Parser: top-level confidence (number/string) + rich cited_pages object form.
Config: default 3, env override, VLS_/VLE_ forwarding, negative rejected by Validate.
go build ./..., go vet ./..., full go test ./... green.
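The parser and clamp cases in the test plan can be sketched as follows. This is an illustrative Go sketch, not the PR's parser: flexFloat, richCite, clamp01, and parseRichCite are hypothetical names, only the {"pages":[5,7],"confidence":0.9} object shape comes from the description, and whether pages denotes a range or a list of individual pages is left uninterpreted here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// flexFloat accepts a JSON number (0.9) or a numeric string ("0.9").
type flexFloat float64

func (f *flexFloat) UnmarshalJSON(b []byte) error {
	var n float64
	if err := json.Unmarshal(b, &n); err == nil {
		*f = flexFloat(n)
		return nil
	}
	var s string
	if err := json.Unmarshal(b, &s); err != nil {
		return fmt.Errorf("confidence must be a number or string: %w", err)
	}
	n, err := strconv.ParseFloat(strings.TrimSpace(s), 64)
	if err != nil {
		return fmt.Errorf("confidence %q is not numeric: %w", s, err)
	}
	*f = flexFloat(n)
	return nil
}

// richCite mirrors the object form quoted in the description.
// A nil Confidence means the field was absent.
type richCite struct {
	Pages      []int      `json:"pages"`
	Confidence *flexFloat `json:"confidence"`
}

// clamp01 forces a confidence into [0,1]; NaN is treated as 0.
func clamp01(v float64) float64 {
	if math.IsNaN(v) || v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// parseRichCite decodes one rich cited_pages object and reports whether
// a confidence was supplied at all.
func parseRichCite(raw string) (pages []int, conf float64, hasConf bool, err error) {
	var rc richCite
	if err = json.Unmarshal([]byte(raw), &rc); err != nil {
		return nil, 0, false, err
	}
	if rc.Confidence == nil {
		return rc.Pages, 0, false, nil
	}
	return rc.Pages, clamp01(float64(*rc.Confidence)), true, nil
}

func main() {
	for _, raw := range []string{
		`{"pages":[5,7],"confidence":0.9}`,
		`{"pages":[3],"confidence":"1.7"}`,
		`{"pages":[3]}`,
	} {
		pages, conf, ok, err := parseRichCite(raw)
		fmt.Println(pages, conf, ok, err)
	}
}
```

Keeping "absent" distinct from "zero" matters for the low-confidence guard: a missing confidence should not be read as the agent reporting 0.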

Do not merge yet.

FinanceBench (n=8) showed the page-based strategy's loss is commitment, not retrieval: a confident single citation scored f1=1.0/hit=1.0 (id_00807, id_00941), but when unsure the agent sprayed ~5 page ranges and missed all of them, once returning the same section id five times (id_00499: sec_363... x5), deflating precision.

This makes the final citation set tight:

- Dedup: every terminal cited-range set now flows through one selectCitedRanges chokepoint that clamps + dedups by range, so a range cited five times collapses to one and a section id never repeats in SelectedIDs.
- Confidence gate: the done action takes an optional confidence (0-1), parsed from a top-level field or a rich cited_pages object form ({"pages":[5,7],"confidence":0.9}). Surfaced on Result.Confidence, Result.Confidences, the done event, and the API response.
- Cap: when done cites more than retrieval.pageindex.max_citations (default 3) distinct ranges, keep the top-N by confidence (ties -> emission order). Mechanical backstop even when the prompt doesn't fully tame the spray. max_hops is unchanged: this bounds the FINAL citation set, not navigation.
- Citations are now built from the final cited ranges (deduped+capped) rather than every page read, so a confident single pick surfaces ONE citation even when several pages were skimmed; falls back to the PagesRead footprint on a refusal / hop-capped run.
- System prompt gains a CITATION DISCIPLINE block: cite the fewest ranges (ideally one), framing a spray of uncertain ranges as worse than committing, and keeping the single best pick even at low confidence (annotate, don't suppress).

Config: RetrievalConfig.PageIndex.MaxCitations (default 3); env VLE_/VLS_RETRIEVAL_PAGEINDEX_MAX_CITATIONS, forwarded in internal/config.
Tests: dedup (5 cites incl dupes -> <=3 distinct, no repeated id), confident-single preserved, confidence-aware cap, low-confidence still commits one, clamp, configurable cap, refusal carries no confidence, parser confidence + rich cited_pages, plus API-layer dedup/cap and single-citation proofs. Existing tests stay green.

sourcery-ai

Sorry @hallelx2, you have reached your weekly rate limit of 500000 diff characters.

coderabbitai · 2026-05-29T16:58:57Z

Warning

Review limit reached

@hallelx2, we couldn't start this review because you've reached your PR review rate limit.

📥 Commits

Reviewing files that changed from the base of the PR and between cbd46f5 and c76a386.

📒 Files selected for processing (14)

cmd/engine/main.go
cmd/server/main.go
config.example.yaml
internal/api/pageindex.go
internal/api/pageindex_test.go
internal/config/config.go
internal/config/config_pageindex_test.go
internal/handler/answer_pageindex.go
openapi.yaml
pkg/config/config.go
pkg/config/config_test.go
pkg/retrieval/pageindex_strategy.go
pkg/retrieval/pageindex_strategy_test.go
pkg/retrieval/strategy.go

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings May 29, 2026 16:58

sourcery-ai Bot reviewed May 29, 2026

Copilot started reviewing on behalf of hallelx2 May 29, 2026 16:58

Copilot AI reviewed May 29, 2026

Merge branch 'main' into feat/pageindex-confidence-gate

c76a386

hallelx2 merged commit 1cbc94d into main May 29, 2026
3 of 8 checks passed

hallelx2 deleted the feat/pageindex-confidence-gate branch May 29, 2026 17:12

hallelx2 merged 2 commits into main from feat/pageindex-confidence-gate
