fix(context): correct 3 false-negative bugs in context-quality analyzer (10→8) #38
…y analyzer

The analyzer flagged repos as boilerplate even when their docs were rich, because section detection had three defects:

1. Exact-key alias lookup: '## Commands By Mode' never matched the 'commands' alias; only headings exactly equal to an alias counted.
2. Empty-parent headings: '## Quick Start' followed by '### Installation' had an empty direct body, so its run instructions (one level down) were invisible.
3. Over-strict nontrivial threshold (>=4 words / >=24 chars): valid terse sections were rejected, including the managed-context block's own generated '- Primary stack: Python' (3 words / 23 chars).

Fix: parse ordered heading blocks with levels, roll descendant subsections up to their parent, prefix-anchor alias matching, strip badges/links before the content check, and relax the threshold to >=2 words / >=12 chars.

Prefix-anchored matching (not substring) and skipping the H1 title avoid the inverse hazard surfaced in review: '## Memory Usage Statistics' must not match 'usage', and an alias word in the document title must not roll the whole file up as that section. Both are pinned with regression tests.

Live impact: 2 managed-block repos (cost-tracker, portfolio-health) now clear, and run-detection false-negatives on 3 more are corrected. context-flag 10 -> 8.
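A minimal sketch of the block parsing, roll-up, and relaxed content check described in this commit message, assuming a plain line-based markdown scan (fenced code blocks are not handled). All names here (`Block`, `parse_blocks`, `rolled_up_body`, `is_nontrivial`) are hypothetical, not the analyzer's code. The badge/link regexes, counting words as `\w+` tokens, and requiring *both* minimums are assumptions, chosen so that `- Primary stack: Python` measures 3 words / 23 chars as stated above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical sketch; names and regexes are illustrative assumptions.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
BADGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")  # image/badge: ![alt](src)
LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # link: [text](url) -> text


@dataclass
class Block:
    level: int  # number of leading '#'
    title: str
    body: list[str] = field(default_factory=list)  # direct body lines only


def parse_blocks(text: str) -> list[Block]:
    """Split markdown into ordered heading blocks, each holding its direct body."""
    blocks: list[Block] = []
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            blocks.append(Block(level=len(m.group(1)), title=m.group(2)))
        elif blocks:
            blocks[-1].body.append(line)
    return blocks


def rolled_up_body(blocks: list[Block], i: int) -> str:
    """Direct body of blocks[i] plus the bodies of all its descendant subsections."""
    parts = list(blocks[i].body)
    for child in blocks[i + 1:]:
        if child.level <= blocks[i].level:  # a sibling or ancestor ends the section
            break
        parts.extend(child.body)  # child headings themselves are not counted as content
    return "\n".join(parts)


def is_nontrivial(body: str, min_words: int = 2, min_chars: int = 12) -> bool:
    """Content check run after dropping badge images and reducing links to their text."""
    text = BADGE_RE.sub("", body)  # badges first, so [![..](..)](..) collapses cleanly
    text = LINK_RE.sub(r"\1", text).strip()
    return len(re.findall(r"\w+", text)) >= min_words and len(text) >= min_chars


if __name__ == "__main__":
    doc = "\n".join(["# Project", "## Quick Start", "### Installation",
                     "pip install -e .", "## License", "MIT"])
    blocks = parse_blocks(doc)
    print(is_nontrivial("\n".join(blocks[1].body)))  # Quick Start, direct body only: False
    print(is_nontrivial(rolled_up_body(blocks, 1)))  # Quick Start, rolled up: True
    print(is_nontrivial("- Primary stack: Python"))  # relaxed threshold: True
    print(is_nontrivial("- Primary stack: Python", min_words=4, min_chars=24))  # old: False
```

Under these assumptions the `Quick Start` parent only passes once its `Installation` child is rolled up, and the managed block's own stack line passes the relaxed threshold but fails the old one.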
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4035908de9
```python
if level < 2:
    continue
```
When a README or context file uses H1-style section headings after the document title, such as # Project followed by # Usage or # Stack, this guard now ignores those sections entirely. The previous parser matched aliases at any heading depth, so those files lose run/stack/etc. detection and can be downgraded to boilerplate; only the title H1 needs to be excluded, not all level-1 blocks.
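The reviewer's suggestion (exclude only the title H1, not every level-1 block) can be sketched as below. `section_headings` is a hypothetical helper, not the analyzer's code, and it assumes the title is the first heading in the file and only when that heading is an H1.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def section_headings(text: str) -> list[tuple[int, str]]:
    """Return (level, title) for every section heading except the document title.

    Hypothetical helper: only a leading H1 is treated as the title, so later
    H1s such as '# Usage' stay eligible for alias matching, unlike a blanket
    `if level < 2: continue` guard that drops all of them.
    """
    headings: list[tuple[int, str]] = []
    for line in text.splitlines():
        m = HEADING_RE.match(line)
        if m:
            headings.append((len(m.group(1)), m.group(2)))
    if headings and headings[0][0] == 1:
        headings = headings[1:]  # drop the title H1 only
    return headings
```

With this shape, a file titled `# Usage Guide` still has its title excluded, so the title cannot roll the whole file up as the `usage` section, while `# Project` followed by `# Usage` keeps `Usage` as a detectable section.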
Phase 1 of Pile 2 (Arc A context-quality). The 2a investigation found that the analyzer flags repos as `boilerplate` even when their docs are rich, due to three detection bugs, including the analyzer rejecting its own managed-block output.

### Bugs fixed

1. Exact-key alias lookup: `## Commands By Mode` never matched the `commands` alias; only headings exactly equal to an alias counted.
2. Empty-parent headings: `## Quick Start` → `### Installation` left the parent with an empty body, so run instructions one level down were invisible.
3. Over-strict nontrivial threshold (≥4 words / ≥24 chars): valid terse sections were rejected, including the managed block's own generated `- Primary stack: Python` (3 words / 23 chars).

### Fix
Parse ordered heading blocks with levels, roll descendant subsections up to their parent, prefix-anchor alias matching, strip badges/links before the content check, relax the threshold to ≥2 words / ≥12 chars.
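The difference between exact-key, substring, and prefix-anchored alias matching can be seen in a small sketch. It assumes headings and aliases are compared as lowercase word tokens, so the prefix is anchored at a word boundary rather than a character offset; `normalize`, `exact_match`, `substring_match`, and `prefix_match` are hypothetical names, not the analyzer's API.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase alphanumeric word tokens; '#', punctuation, and emphasis are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(heading: str, alias: str) -> bool:
    """Old behavior (bug 1): the whole heading must equal the alias."""
    return normalize(heading) == normalize(alias)


def substring_match(heading: str, alias: str) -> bool:
    """Too loose: the alias may appear anywhere in the heading."""
    return alias.lower() in heading.lower()


def prefix_match(heading: str, alias: str) -> bool:
    """Prefix-anchored: the heading's leading words must equal the alias's words."""
    h, a = normalize(heading), normalize(alias)
    return bool(a) and h[: len(a)] == a
```

Anchoring on whole leading words keeps `## Commands By Mode` matching `commands` while rejecting `## Memory Usage Statistics` for `usage`, which a plain substring test would accept.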
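Prefix-anchoring (not substring) and skipping the H1 title close the inverse hazard that a 3-angle `/code-review` surfaced: `## Memory Usage Statistics` must not match `usage`, and an alias in the doc title must not roll the whole file up as that section. Both are pinned with regression tests.

### Result

context-flag 10 → 8: 2 managed-block repos (cost-tracker, portfolio-health) now clear, and run-detection false-negatives on 3 more are corrected.

### Gate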
- `test_portfolio_context_contract.py`
- `ruff check .` clean; full suite: 2207 passed, 2 skipped