fix(query+ingest): value-anchored NL→SQL grounding, hierarchy resolution & pipeline robustness by poche123 · Pull Request #2 · firefly-operationOS/flyquery

poche123 · 2026-06-02T08:09:32Z

Why

End-to-end testing against two real, messy Excel workbooks (a 25k-row financial MTP table and an 8-sheet sales-structure workbook) exposed that the NL→SQL pipeline guessed SQL literals (e.g. Year IN (2023) when values are FY23, Brand='DAPA' when DAPA lives in Family/TA, summing a scaled FY instead of FY (Real)) and that the control flow reported wrong-but-empty results as confident answers. This PR makes the agents anchor on real data and hardens the query + ingestion pipelines.

Result: a question set that previously answered ~0/9 now answers Section 1 (financial) 4/4 exact and Section 2 (org/sales) 4/5, with the remaining miss flagged rather than confidently wrong.

What changed

Grounding / value anchoring

profile: store the full distinct value set for low-cardinality columns (was top-5) so filter/CASE literals are copied verbatim.
retrieval: value_fingerprint() on hits; render a "Column value catalogue" into the grounding and generation prompts; instructions for tall/EAV layouts and scaled-duplicate measures.
Structurally detect self-referencing hierarchy columns (value containment, with a row-wise discriminator so a duplicate name column isn't mistaken for a manager column) → "a person's team/reports/structure" filters the hierarchy column.

Control flow / governance

0-row / degenerate aggregate → clarification + confidence cap (no more confident empties).
Auto-learn gated on row_count > 0.
Prefer a candidate that passes the firewall and returns non-empty rows over candidates[0].
Exclude CTE names from AST table_refs (the firewall was rejecting valid CTE SQL).
Snapshot-scope retrieval; honor + persist conversation snapshot pins.
Unify /stream, :explain, :validate on the rendered prompts + firewall/scope guards (they previously ran blind).
Thread a re-firewalled extra_filter through the semantic-metric bind().

Ingestion robustness / determinism

Header-quality scoring (skip numeric spacer rows) + stable leading-blank handling.
temperature=0 for naming/describe agents → re-ingest schema churn 302 → 17 (~94%).
Rewrite Parquet column names on the no-key fallback (fixes profile/sample Binder errors).
Index column values into embeddings + content_tsv; warn when the reranker degrades to a no-op.

Validation (vs ground truth computed directly from the Parquet)

	Result
DAPA Brazil revenue FY23/24/25	✅ 175.6 / 127.5 / 119.9M
Operating profit CAMCAR FY24	✅ $252,519,261
Field:HQ Mexico FY23/FY26	✅ 3.49 / 3.40
CAGR Imfinzi/Lynparza/Tagrisso	✅ 96.1 / 22.1 / 19.0%
días-en-campo, consecución, team dimensioning, structure movements	✅ 4/5 (e.g. ANA consecución 0.911 exact)

Known limitations / follow-ups

Two-hop questions ("the BUM of Calquence's direct reports") aren't resolved — needs nested-entity grounding (the generator stops at the first hop). Tracked as a follow-up.
Subtotal detection is implemented conservatively but not surfaced as an exclusion directive (name-based exclusion was net-harmful on additive Total_* buckets); a structural value-aggregate == sum-of-others confirmation would re-enable it.
New logic is not yet unit-tested; task lint not run in this PR.

Note

pyproject.toml is intentionally excluded (it carried a machine-local pyfly path tweak).

…ion, pipeline robustness Surface column values into the grounding/generation prompts and harden the query + ingestion pipelines so the NL→SQL agents stop guessing literals and the control flow stops reporting wrong-but-empty results as confident answers. Grounding / value anchoring: - profile: store the full distinct value set for low-cardinality columns (was top-5) - retrieval: value_fingerprint() on hits; render a "Column value catalogue" into the grounding and generation prompts; instructions to copy WHERE/CASE literals verbatim, handle tall/EAV layouts and scaled-duplicate measures - structurally detect self-referencing hierarchy columns (value containment, with a row-wise discriminator) and teach "a person's team/reports/structure" -> filter the hierarchy column Control flow / governance: - treat 0-row / degenerate aggregate results as suspicious -> clarification + confidence cap - gate auto-learning on row_count > 0 (stop poisoning the example store) - prefer a candidate that passes the firewall and returns non-empty rows over candidates[0] - exclude CTE names from AST table_refs (firewall no longer rejects valid CTE SQL) - snapshot-scope retrieval; honor and persist conversation snapshot pins - unify /stream, :explain, :validate on the rendered prompts + firewall/scope guards - thread a re-firewalled extra_filter through the semantic-metric bind() Ingestion robustness / determinism: - header-quality scoring (skip numeric spacer rows) + stable leading-blank handling - temperature=0 for the naming/describe agents (cuts re-ingest schema churn ~94%) - rewrite Parquet column names on the no-key fallback (fixes profile/sample Binder errors) - index column values into embeddings + content_tsv - warn when the cross-encoder reranker degrades to a no-op

- builder.py is a lock-step (canon-pinned) module; revert it and pass the determinism setting via the existing extra_settings={"temperature": 0.0} instead of a new parameter, so the lockstep gate stays green - move the table-kinds lookup out of the query controller into TableResolver (service layer) so the "no raw SQL in controllers" gate passes - ruff: inline a negated return (SIM103), drop a forward-ref quote (UP037), and apply ruff format - tests: extend the fake TableResolvers with pins / current_snapshots / table_kinds_by_name; seed current_snapshot_id in the retrieval integration fixture (retrieval is now scoped to current_snapshot_id, as publish sets it)

Financial exports (Orbis/BvD) place one date-header row above a stack of sub-sections. Section-splitting orphaned that header above each sub-section's title, dropping the year columns and making the detailed financial tables (Ratios de Rentabilidad, Memo lineas, etc.) unqueryable by period. A data-first section now inherits the nearest recent period header as its column row, encoded as a backward-compatible 3-index section path ([header:data_start:end]); contiguous sections keep the legacy 2-index form. Guards against misfires: - period values are full-match only (INV-2024-0007 / Form 2020 are not years) - only a single leftmost label column may precede the inherited years - a label-headed table closes the period band (no stale-header bleed) - phantom null columns are compacted; header-row index is bounds-checked Adds unit coverage for inheritance, band-closing/no-false-inherit, path round-trip and legacy-path parsing.

Jose Agustin Puente added 3 commits June 2, 2026 10:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(query+ingest): value-anchored NL→SQL grounding, hierarchy resolution & pipeline robustness#2

fix(query+ingest): value-anchored NL→SQL grounding, hierarchy resolution & pipeline robustness#2
poche123 wants to merge 3 commits into
mainfrom
fix/nl2sql-grounding-and-robustness

poche123 commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

poche123 commented Jun 2, 2026

Why

What changed

Validation (vs ground truth computed directly from the Parquet)

Known limitations / follow-ups

Note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant