fix(context): correct 3 false-negative bugs in context-quality analyzer (10→8) by saagpatel · Pull Request #38 · saagpatel/GithubRepoAuditor

saagpatel · 2026-06-03T08:27:51Z

Phase 1 of Pile 2 (Arc A context-quality). 2a investigation found the analyzer flags repos as boilerplate even when docs are rich, due to three detection bugs — including the analyzer rejecting its own managed-block output.

Bugs fixed

Exact-key alias lookup — ## Commands By Mode never matched the commands alias; only headings exactly equal to an alias counted.
Empty-parent headings — ## Quick Start → ### Installation left the parent with an empty body, so run instructions one level down were invisible.
Over-strict threshold (≥4 words / ≥24 chars) rejected valid terse sections — including the managed-context block's own - Primary stack: Python (3 words / 23 chars).
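The threshold arithmetic in the third bug can be checked with a minimal sketch. The function name `is_nontrivial`, the way words are counted (alphanumeric runs), the way characters are counted (length of the stripped line, including the `- ` marker), and the requirement that both minimums hold are all assumptions made for illustration; the analyzer's actual check may differ.

```python
import re


def is_nontrivial(text: str, min_words: int, min_chars: int) -> bool:
    """Hypothetical content check: enough words AND enough characters."""
    stripped = text.strip()
    # Assumption: a "word" is a run of letters or digits, so "-" and ":" do not count.
    words = re.findall(r"[A-Za-z0-9]+", stripped)
    return len(words) >= min_words and len(stripped) >= min_chars


line = "- Primary stack: Python"  # 3 words, 23 characters under the assumptions above

rejected_by_old = not is_nontrivial(line, min_words=4, min_chars=24)
accepted_by_new = is_nontrivial(line, min_words=2, min_chars=12)
```

Under these assumptions the line falls one word and one character short of the old limits, which is why the managed block's own output was rejected.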

Fix

Parse ordered heading blocks with levels, roll descendant subsections up to their parent, prefix-anchor alias matching, strip badges/links before the content check, relax the threshold to ≥2 words / ≥12 chars.
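The parsing and roll-up steps might look roughly like the sketch below. `Block`, `parse_blocks`, `rolled_up_body`, and `strip_markup` are hypothetical names, not the identifiers used in this PR, and the sketch ignores fenced code blocks for brevity. Keeping link text while dropping the URL is also an assumption.

```python
import re
from dataclasses import dataclass, field

HEADING_RE = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
BADGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")  # images, including badges
LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")  # inline links; group 1 is the text


@dataclass
class Block:
    level: int
    title: str
    body: list[str] = field(default_factory=list)


def parse_blocks(markdown: str) -> list[Block]:
    """Split a document into ordered heading blocks, remembering each level."""
    blocks: list[Block] = []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            blocks.append(Block(len(match.group(1)), match.group(2)))
        elif blocks:
            blocks[-1].body.append(line)
    return blocks


def rolled_up_body(blocks: list[Block], index: int) -> str:
    """Return a block's own body plus the bodies of all deeper blocks under it."""
    parent = blocks[index]
    lines = list(parent.body)
    for child in blocks[index + 1:]:
        if child.level <= parent.level:
            break  # reached a sibling or an ancestor: the subtree has ended
        lines.extend(child.body)
    return "\n".join(lines).strip()


def strip_markup(text: str) -> str:
    """Drop badges and reduce links to their text before the content check."""
    return LINK_RE.sub(r"\1", BADGE_RE.sub("", text)).strip()


doc = "\n".join([
    "# Tool",
    "## Quick Start",
    "### Installation",
    "pip install tool",
    "## License",
    "MIT",
])
blocks = parse_blocks(doc)
quick_start = rolled_up_body(blocks, 1)
```

Here `## Quick Start` has no direct body, but the roll-up picks up `pip install tool` from `### Installation` and stops at `## License`.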

Prefix-anchoring (not substring) + skipping the H1 title close the inverse hazard a 3-angle /code-review surfaced: ## Memory Usage Statistics must not match usage, and an alias in the doc title must not roll the whole file up as that section. Both pinned with regression tests.
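The difference between prefix-anchored and substring matching can be illustrated with a small sketch. `matches_alias` is a hypothetical helper, and comparing whole words at the start of the heading is an assumption about what "prefix-anchor" means here.

```python
import re


def matches_alias(heading: str, alias: str) -> bool:
    """True when the heading's leading words equal the alias's words."""
    heading_words = re.findall(r"[a-z0-9]+", heading.lower())
    alias_words = alias.lower().split()
    return bool(alias_words) and heading_words[: len(alias_words)] == alias_words


def matches_substring(heading: str, alias: str) -> bool:
    """The hazardous alternative: any occurrence of the alias counts."""
    return alias.lower() in heading.lower()


prefix_hits = {
    "Commands By Mode": matches_alias("Commands By Mode", "commands"),
    "Memory Usage Statistics": matches_alias("Memory Usage Statistics", "usage"),
}
substring_false_positive = matches_substring("Memory Usage Statistics", "usage")
```

With this approach `Commands By Mode` matches `commands`, while `Memory Usage Statistics` does not match `usage`; the substring variant would accept both.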

Result

context-flag 10 → 8. Cleared: cost-tracker, portfolio-health (managed blocks whose terse stack was wrongly rejected). Run-detection false-negatives corrected on GithubRepoAuditor, Grotto, rag-knowledge-base.
The remaining 8 genuinely lack handoff sections (risks/next) → Phase 2 (recovery-apply).

Gate

7 new TDD/regression tests in test_portfolio_context_contract.py
ruff check . clean; full suite 2207 passed, 2 skipped
3-angle code review (correctness / regression / edge-case) run; both confirmed findings fixed + live-probed

…y analyzer

The analyzer flagged repos as boilerplate even when their docs were rich, because section detection had three defects:

1. exact-key alias lookup: '## Commands By Mode' never matched the 'commands' alias; only headings exactly equal to an alias counted.
2. empty-parent headings: '## Quick Start' followed by '### Installation' had an empty direct body, so its run instructions (one level down) were invisible.
3. over-strict nontrivial threshold (>=4 words / >=24 chars) rejected valid terse sections, including the managed-context block's own generated '- Primary stack: Python' (3 words / 23 chars).

Fix: parse ordered heading blocks with levels, roll descendant subsections up to their parent, prefix-anchor alias matching, strip badges/links before the content check, and relax the threshold to >=2 words / >=12 chars.

Prefix-anchored matching (not substring) and skipping the H1 title avoid the inverse hazard surfaced in review: '## Memory Usage Statistics' must not match 'usage', and an alias word in the document title must not roll the whole file up as that section. Both pinned with regression tests.

Live impact: 2 managed-block repos (cost-tracker, portfolio-health) clear, and run-detection false-negatives on 3 more are corrected. context-flag 10 -> 8.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4035908de9


chatgpt-codex-connector · 2026-06-03T08:29:38Z

+        if level < 2:
+            continue


Do not skip every H1 section

When a README or context file uses H1-style section headings after the document title, such as # Project followed by # Usage or # Stack, this guard now ignores those sections entirely. The previous parser matched aliases at any heading depth, so those files lose run/stack/etc. detection and can be downgraded to boilerplate; only the title H1 needs to be excluded, not all level-1 blocks.
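One way to act on this suggestion is sketched below: drop only a leading H1 (the document title) and keep later H1 headings as sections. `section_headings` and the `(level, title)` tuple shape are hypothetical; this is not the code that was merged.

```python
def section_headings(headings: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Exclude only the title: the first heading, and only if it is an H1."""
    if headings and headings[0][0] == 1:
        return headings[1:]
    return list(headings)


def skip_all_h1(headings: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Behavior of the reviewed guard: every level-1 heading is skipped."""
    return [(level, title) for level, title in headings if level >= 2]


h1_style_doc = [(1, "Project"), (1, "Usage"), (1, "Stack")]
kept = section_headings(h1_style_doc)
kept_by_guard = skip_all_h1(h1_style_doc)
```

For a file that uses H1 for every section, the guard leaves nothing to match, while the title-only variant still exposes `Usage` and `Stack` to alias matching.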


saagpatel merged commit bba2e08 into main Jun 3, 2026
3 checks passed

saagpatel merged 1 commit into main from fix/context-analyzer-detection
