fix(context): correct 3 false-negative bugs in context-quality analyzer (10→8)#38

Merged
saagpatel merged 1 commit into main from fix/context-analyzer-detection
Jun 3, 2026

@saagpatel
Owner

Phase 1 of Pile 2 (Arc A context-quality). 2a investigation found the analyzer flags repos as boilerplate even when docs are rich, due to three detection bugs — including the analyzer rejecting its own managed-block output.

Bugs fixed

  1. Exact-key alias lookup: `## Commands By Mode` never matched the `commands` alias; only headings exactly equal to an alias counted.
  2. Empty-parent headings: `## Quick Start` followed by `### Installation` left the parent with an empty body, so run instructions one level down were invisible.
  3. Over-strict threshold (≥4 words / ≥24 chars): rejected valid terse sections, including the managed-context block's own `- Primary stack: Python` (3 words / 23 chars).
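
To make the threshold arithmetic in item 3 concrete, here is a minimal sketch. The helper name `is_nontrivial`, the word-counting rule (runs of word characters), and the requirement that both limits hold at once are assumptions for illustration, not the analyzer's actual code.

```python
import re


def is_nontrivial(text: str, min_words: int, min_chars: int) -> bool:
    """Hypothetical content check: enough words AND enough characters."""
    stripped = text.strip()
    # Assumption: a "word" is a run of word characters, so the leading
    # bullet "-" and the colon do not count as words.
    words = re.findall(r"\w+", stripped)
    return len(words) >= min_words and len(stripped) >= min_chars


line = "- Primary stack: Python"  # 3 words, 23 characters

rejected_by_old = not is_nontrivial(line, min_words=4, min_chars=24)
accepted_by_new = is_nontrivial(line, min_words=2, min_chars=12)
```

Under these assumptions the managed-block line falls one word and one character short of the old limits and passes the relaxed ones.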

Fix

Parse ordered heading blocks with levels, roll descendant subsections up to their parent, prefix-anchor alias matching, strip badges/links before the content check, and relax the threshold to ≥2 words / ≥12 chars.
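
The parse-and-roll-up step might look roughly like the sketch below. `Block`, `parse_blocks`, and `rolled_up_body` are hypothetical names, and the sketch deliberately ignores fenced code blocks (a `# comment` inside a fence would be misread as a heading), so treat it as an illustration of the idea rather than the merged implementation.

```python
import re
from dataclasses import dataclass, field

# ATX-style headings only: one to six '#' characters, then the title.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Block:
    level: int
    title: str
    lines: list[str] = field(default_factory=list)


def parse_blocks(markdown: str) -> list[Block]:
    """Split a document into ordered heading blocks, keeping each level."""
    blocks: list[Block] = []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            blocks.append(Block(len(match.group(1)), match.group(2)))
        elif blocks:
            # Text before the first heading is dropped in this sketch.
            blocks[-1].lines.append(line)
    return blocks


def rolled_up_body(blocks: list[Block], index: int) -> str:
    """Return a block's own body plus the bodies of all deeper descendants."""
    parent = blocks[index]
    body = list(parent.lines)
    for child in blocks[index + 1:]:
        if child.level <= parent.level:
            break  # reached a sibling or an ancestor: stop rolling up
        body.extend(child.lines)
    return "\n".join(body).strip()


doc = "## Quick Start\n### Installation\npip install example\n## Other\ntext"
blocks = parse_blocks(doc)
quick_start_body = rolled_up_body(blocks, 0)
```

Here `## Quick Start` has no direct body, but its rolled-up body contains the install line from `### Installation`, which is the case bug 2 describes.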

Prefix-anchoring (not substring matching) plus skipping the H1 title closes the inverse hazard that a 3-angle /code-review surfaced: `## Memory Usage Statistics` must not match `usage`, and an alias in the doc title must not roll the whole file up as that section. Both are pinned with regression tests.
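
The difference between prefix-anchored and substring matching can be sketched as follows. The function names and the normalization rule (lowercase, punctuation collapsed to spaces) are assumptions made for this example.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def matches_alias_prefix(heading: str, alias: str) -> bool:
    """Hypothetical prefix-anchored match: the heading must START with the alias."""
    norm_heading = _normalize(heading)
    norm_alias = _normalize(alias)
    # Require a word boundary after the alias so "usages" does not match "usage".
    return norm_heading == norm_alias or norm_heading.startswith(norm_alias + " ")


def matches_alias_substring(heading: str, alias: str) -> bool:
    """The looser alternative that the fix avoids."""
    return _normalize(alias) in _normalize(heading)


prefix_commands = matches_alias_prefix("Commands By Mode", "commands")
prefix_memory = matches_alias_prefix("Memory Usage Statistics", "usage")
substring_memory = matches_alias_substring("Memory Usage Statistics", "usage")
```

With this sketch `Commands By Mode` matches `commands`, while `Memory Usage Statistics` matches `usage` only under the substring variant.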

Result

  • context-flag 10 → 8. Cleared: cost-tracker, portfolio-health (managed blocks whose terse stack was wrongly rejected). Run-detection false-negatives corrected on GithubRepoAuditor, Grotto, rag-knowledge-base.
  • The remaining 8 genuinely lack handoff sections (risks/next) → Phase 2 (recovery-apply).

Gate

  • 7 new TDD/regression tests in test_portfolio_context_contract.py
  • ruff check . clean; full suite 2207 passed, 2 skipped
  • 3-angle code review (correctness / regression / edge-case) run; both confirmed findings fixed + live-probed

…y analyzer

The analyzer flagged repos as boilerplate even when their docs were rich,
because section detection had three defects:

1. exact-key alias lookup — '## Commands By Mode' never matched the 'commands'
   alias; only headings exactly equal to an alias counted.
2. empty-parent headings — '## Quick Start' followed by '### Installation' had
   an empty direct body, so its run instructions (one level down) were invisible.
3. over-strict nontrivial threshold (>=4 words / >=24 chars) rejected valid
   terse sections — including the managed-context block's own generated
   '- Primary stack: Python' (3 words / 23 chars).

Fix: parse ordered heading blocks with levels, roll descendant subsections up to
their parent, prefix-anchor alias matching, strip badges/links before the
content check, and relax the threshold to >=2 words / >=12 chars.

Prefix-anchored matching (not substring) and skipping the H1 title avoid the
inverse hazard surfaced in review: '## Memory Usage Statistics' must not match
'usage', and an alias word in the document title must not roll the whole file up
as that section. Both pinned with regression tests.

Live impact: 2 managed-block repos (cost-tracker, portfolio-health) clear, and
run-detection false-negatives on 3 more are corrected. context-flag 10 -> 8.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4035908de9

Comment on lines +328 to +329
if level < 2:
    continue

P2: Do not skip every H1 section

When a README or context file uses H1-style section headings after the document title, such as # Project followed by # Usage or # Stack, this guard now ignores those sections entirely. The previous parser matched aliases at any heading depth, so those files lose run/stack/etc. detection and can be downgraded to boilerplate; only the title H1 needs to be excluded, not all level-1 blocks.
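
One way to read this suggestion is sketched below. It assumes the title is the first heading in the file and only when that heading is level 1; `candidate_sections` and the `(level, title)` tuple shape are hypothetical and not taken from the PR.

```python
def candidate_sections(headings: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Drop only the document title, keeping later H1 headings as sections."""
    sections = []
    for index, (level, title) in enumerate(headings):
        # Assumption: the title is the very first heading, and only if it is an H1.
        if index == 0 and level == 1:
            continue
        sections.append((level, title))
    return sections


h1_style = [(1, "Project"), (1, "Usage"), (1, "Stack")]
kept = candidate_sections(h1_style)
```

Compared with a blanket `level < 2` guard, this keeps `# Usage` and `# Stack` eligible for alias matching while still excluding the `# Project` title.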

@saagpatel saagpatel merged commit bba2e08 into main Jun 3, 2026
3 checks passed