Spec audit: turn specification.website findings into suggested tasks #767

Draft
ilicfilip wants to merge 12 commits into develop from filip/spec-audit

ilicfilip (Collaborator) commented May 30, 2026

Summary

Adds a website-spec audit that runs against the site's public URL and turns each failing rule into a Progress Planner suggested task, throttled to one new task per day by default. Two engines share all plugin-side task-mapping code:

  • Deterministic PHP checks (always on): 5 starter rules — doctype, html-lang, meta-charset, meta-description, xml-sitemaps. Operates on a single shared homepage fetch.
  • WP 7.0 AI client (optional, requires a configured connector): asks Claude/GPT/Gemini to evaluate the homepage against https://specification.website/llms.txt. PHP wins on overlapping rule_ids.
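
The "PHP wins on overlapping rule_ids" merge can be sketched roughly as below. This is an illustrative Python sketch, not the plugin's PHP: the function name merge_findings and the status field are assumptions, while rule_id and the php-check / mcp-llm source labels come from this PR.

```python
from typing import Dict, List

Finding = Dict[str, str]


def merge_findings(php_findings: List[Finding], llm_findings: List[Finding]) -> List[Finding]:
    """Merge two finding lists keyed by rule_id; PHP results win on overlap."""
    merged: Dict[str, Finding] = {}
    # Add the probabilistic LLM findings first...
    for finding in llm_findings:
        merged[finding["rule_id"]] = finding
    # ...then let the deterministic PHP findings overwrite any shared rule_id.
    for finding in php_findings:
        merged[finding["rule_id"]] = finding
    return list(merged.values())
```

The dedupe only works when both engines emit the same rule_id for the same rule, which is why the canonical-slug change matters.
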

Designed so the audit engine can later move from the plugin to the progressplanner.com SaaS without touching task-creation code (phase B: Remote_Audit_Source is a stub today, and a regression test guards the contract).

Key design points

  • Zero outbound HTTP from admin_init. The audit runs only from CLI, AJAX-shutdown (with fastcgi_finish_request()), or a dedicated cron hook. The data collector's update_cache() is a no-op unless an explicit caller has opted in. Outbound HTTP on admin_init caused PHP-FPM pool starvation during development; it is now structurally prevented.
  • Self-healing for retired rules. Tasks store prpl_source. If a PHP-check task's rule_id is no longer in the live registry, it auto-completes — so a future rule rename/retire doesn't strand orphan tasks. Legacy tasks without source meta backfill to php-check.
  • Throttle is deferred + survival-counted. The per-window counter only increments at shutdown for tasks that actually survived the request (so a same-request auto-completion doesn't burn the slot).
  • Canonical slugs. PHP and LLM both use spec.website's own URL slugs (doctype, html-lang, meta-charset, meta-description, xml-sitemaps). Each finding's doc_url points at the real spec page.
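
The deferred, survival-counted throttle can be sketched as follows. This is a hedged Python illustration, not the plugin's PHP: the class and method names are hypothetical, and it assumes that a task injected earlier in the same request holds the slot until shutdown.

```python
from typing import Iterable, Set


class DeferredThrottle:
    """Per-window throttle whose counter is only committed at shutdown."""

    def __init__(self, max_per_window: int = 1) -> None:
        self.max_per_window = max_per_window
        self.used = 0                   # slots already committed in this window
        self.pending: Set[str] = set()  # injected this request, not yet committed

    def can_inject(self) -> bool:
        # Pending tasks hold a slot for the remainder of the request.
        return self.used + len(self.pending) < self.max_per_window

    def record_injection(self, task_id: str) -> None:
        self.pending.add(task_id)

    def on_shutdown(self, surviving_ids: Iterable[str]) -> None:
        # Commit a slot only for tasks that still exist at shutdown.
        self.used += len(self.pending & set(surviving_ids))
        self.pending.clear()
```

If a task is injected and auto-completed within the same request, on_shutdown() sees no survivor and the daily slot stays free.
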

Full architecture, decisions, and open questions are in HANDOFF-spec-audit.md on this branch.

Verified live

On a local WP 7.0 + Yoast + Woo + Anthropic Connector site (planner.test):

  • wp prpl audit run produces ~5 PHP findings + ~10 LLM findings in ~18s.
  • ✅ Severity-prioritized throttle picks the most important rule for the daily slot.
  • ✅ Fix → re-audit → auto-complete loop works end-to-end.
  • ✅ Zero admin-pageload HTTP after the cache exists.

Not yet verified

  • Production WP 7.0 stable (testing was on nightly).
  • Phase-B SaaS endpoint (the stub returns [] until progressplanner.com/wp-json/progress-planner-saas/v1/audit exists).
  • Multisite.

Test plan

  • Pull the branch on a WP 7.0 install.
  • Read HANDOFF-spec-audit.md for the why-behind-each-decision context.
  • Run composer test (425/425 should pass — includes 27 spec-audit tests).
  • Run composer phpstan (clean).
  • wp prpl audit run on a fresh site — expect 1 task injected (the highest-severity failing rule).
  • Open WP admin → Progress Planner — task should appear with the spec.website "Why is this important?" link.
  • Fix the flagged issue, run again — rule flips to pass, next admin pageload auto-completes the task.
  • If you have an AI Connector configured: confirm that findings with source mcp-llm appear in a wp eval dump of spec_audit_findings.

Open follow-ups (handoff doc lists them in priority order)

  1. (medium) Local-dev false positive on https-tls for .test/.local URLs — small filter.
  2. (medium) Rename Spec_Mcp_Client → Spec_Ai_Client (the "MCP" in the name was aspirational; core's AI client can't act as an MCP client).
  3. (low) --dry-run flag on the CLI command.
  4. (low) UI button for "Run audit now" (AJAX endpoint already exists).
  5. (low) Deactivation cleanup should unschedule the cron hook.

🤖 Generated with Claude Code

ilicfilip and others added 10 commits May 29, 2026 17:47
Introduce the audit layer that checks a site against specification.website.
Defines the swappable Audit_Source contract + normalized finding schema
(Audit_Runner), five deterministic PHP checks (doctype, lang, charset,
robots.txt, sitemap) behind a filterable registry, a Local source that merges
PHP checks with an optional AI pass (PHP wins on overlap), a Remote SaaS source
stub for the future server-side engine, and Spec_Mcp_Client which drives WP 7.0's
core AI client.

Note: WP 7.0's AI client cannot act as an MCP client, so Spec_Mcp_Client feeds
the spec checklist + HTML to wp_ai_client_prompt() instead. The whole AI path is
guarded by is_available() and degrades to PHP-only checks; WP7-specific calls are
marked TODO(wp7-verify) as they can't be exercised without a live WP 7.0 connector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add the Spec_Audit data collector (caches the audit; runs the expensive
checks/LLM only on cache refresh, never on admin_init) and the Spec_Audit task
provider that turns failing findings into recommendations. The provider releases
at most one task per window (default daily), overridable via
progress_planner_spec_audit_max_tasks_per_window and _window filters; each
failing rule maps to one durable task and is auto-completed when a re-audit shows
it passing. Register both in their managers, and add a `wp prpl audit run` CLI
command plus a progress_planner_run_spec_audit AJAX trigger for on-demand runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
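
A rough sketch of how the one-task-per-window release could combine with the severity prioritisation mentioned in the PR summary. It is written in Python for illustration only; the severity labels, the SEVERITY_RANK table, and the function name are assumptions, and max_tasks_per_window stands in for the value of the progress_planner_spec_audit_max_tasks_per_window filter.

```python
from typing import Dict, List

# Hypothetical severity scale; lower rank means more important.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def pick_tasks_to_release(
    findings: List[Dict[str, str]],
    already_released: int,
    max_tasks_per_window: int = 1,
) -> List[Dict[str, str]]:
    """Return the failing findings that may become tasks in this window."""
    slots = max(0, max_tasks_per_window - already_released)
    failing = [f for f in findings if f["status"] == "fail"]
    # Stable sort: equally severe findings keep their audit order.
    failing.sort(key=lambda f: SEVERITY_RANK.get(f.get("severity", "low"), len(SEVERITY_RANK)))
    return failing[:slots]
```

With the default of one slot, only the most severe failing rule is released; the rest wait for later windows.
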
Cover the deterministic PHP checks, Audit_Runner schema normalization/dedup,
graceful degradation to PHP-only findings when the AI layer is unavailable, the
per-window injection throttle and per-rule completion, and a shape-equality test
asserting the local and remote sources produce identical finding shapes (the
guard that keeps the phase-C and phase-B engines interchangeable).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Verified on a live WP 7.0 install: the AI builder (WP_AI_Client_Prompt_Builder)
delegates SDK methods through __call, so the earlier method_exists() guards on
using_max_tokens/as_json_response/is_supported_for_text_generation silently
skipped those calls. Call them directly and gate availability on wp_supports_ai()
plus is_supported_for_text_generation(). Confirmed end-to-end via `wp prpl audit
run`: detects failing rules, injects one task, and the throttle blocks the second
run; the AI path correctly reports unavailable when no provider is configured and
degrades to PHP-only checks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previously the inject-time recheck I added in ba74aa5 called wp_remote_get
from get_tasks_to_inject(), which runs on every admin_init. Each admin
pageview then triggered 1+ loopback HTTP requests back into the same
PHP-FPM pool serving the user — pinning workers and starving the whole
pool until nginx returned 502s for unrelated Valet sites.

Three structural changes prevent that class of bug entirely:

1. The Spec_Audit data collector's update_cache() is a no-op unless an
   explicit caller has opted in via Spec_Audit_Data_Collector::with_explicit_refresh().
   The Data_Collector_Manager's admin_init sweep therefore cannot trigger
   the audit. Sanctioned callers: CLI command, cron hook, AJAX shutdown.
2. collect() never falls back to calculate_data() on cache miss — a missing
   cache returns []. is_specific_task_completed() now distinguishes "no
   cache" from "rule passed", so an object-cache flush can't mass-complete
   every audit task.
3. The AJAX "run now" handler defers the audit to shutdown and calls
   fastcgi_finish_request() so the user's worker is released to the pool
   before the outbound HTTP starts. A daily wp-cron hook also drives
   refreshes from a non-web context.

Reverted is_still_failing()'s live recheck — it's the wrong place for HTTP.
Kept the deferred throttle counter (pure-PHP, no HTTP). Updated tests:
dropped the live-recheck test; added tests for the cache-empty completion
guard and the no-explicit-refresh no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
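
The opt-in refresh guard from points 1 and 2 can be sketched like this. It is a Python illustration under assumptions, not the plugin's PHP: the PR names with_explicit_refresh() and update_cache() but does not show their signatures, so the context-manager shape, the has_cache() helper, and the injected run_audit callable are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List, Optional

Findings = List[Dict[str, str]]


class SpecAuditCollector:
    def __init__(self, run_audit: Callable[[], Findings]) -> None:
        self._run_audit = run_audit            # the expensive HTTP/LLM audit
        self._cache: Optional[Findings] = None
        self._explicit = False

    @contextmanager
    def with_explicit_refresh(self) -> Iterator["SpecAuditCollector"]:
        # Only sanctioned callers (CLI, cron, AJAX shutdown) enter this block.
        self._explicit = True
        try:
            yield self
        finally:
            self._explicit = False

    def update_cache(self) -> None:
        if not self._explicit:
            return  # generic sweeps land here: no outbound HTTP
        self._cache = self._run_audit()

    def has_cache(self) -> bool:
        return self._cache is not None

    def collect(self) -> Findings:
        # Never fall back to running the audit on a cache miss.
        return self._cache if self._cache is not None else []
```

A caller that wants a refresh writes `with collector.with_explicit_refresh(): collector.update_cache()`; any other update_cache() call returns without touching the network.
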
The robots.txt check turned out to be the wrong rule for a suggested task:
WordPress generates a virtual robots.txt automatically, and when it fails
(e.g. status 404 with a real body, as seen on a Yoast + Woo install), the
fix is at the nginx/Valet routing level — not something a WordPress user
can address from wp-admin. Suggested tasks should be actionable inside
WordPress.

Replace with a meta-description check that operates on the homepage HTML
we already fetch (zero extra outbound HTTP). The user-facing fix is real
and in-scope: install/configure an SEO plugin or set a description in the
theme. Yoast/RankMath both supply it out of the box, so the recommendation
naturally guides users to a known good path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Tasks are keyed by rule_id. When we rename, retire, or filter out a check
(as just happened with robots-txt), the old task lives on in users' DBs
forever — its completion logic only knew how to react to the rule still
being audited.

Two changes:

1. On injection, persist the finding's source as prpl_source meta so the
   provider can later tell deterministic (php-check) tasks from
   probabilistic (mcp-llm / saas) ones. Legacy tasks without the meta are
   backfilled to 'php-check' (the original starter set was all PHP checks),
   so existing installs self-heal on upgrade too.

2. In is_specific_task_completed(), short-circuit to "complete" for any
   php-check task whose rule_id is no longer present in the live
   Checks_Registry. Does NOT apply to LLM/SaaS tasks — their rule space
   is open-ended and a rule missing from one audit just means the model
   didn't mention it this run, not that it was retired.

Pure in-memory work; no outbound HTTP. The existing cache-empty guard in
is_specific_task_completed() still prevents mass-completion from an
object-cache flush.

Tests use synthetic rule IDs (rule-a etc.) that the registry never knew
about, so the test setUp now registers no-op stub checks for them; the
new retired-rule tests remove all audit_checks filters to simulate
removal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
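
The two changes above reduce to a small decision, sketched here in Python for illustration. The function names and the dict-shaped task meta are assumptions; prpl_source, php-check, mcp-llm, and robots-txt are taken from this commit message.

```python
from typing import Dict, Set

DEFAULT_SOURCE = "php-check"  # legacy tasks predate the prpl_source meta


def task_source(task_meta: Dict[str, str]) -> str:
    """Backfill the source for tasks stored before prpl_source existed."""
    return task_meta.get("prpl_source") or DEFAULT_SOURCE


def is_retired(rule_id: str, task_meta: Dict[str, str], live_rule_ids: Set[str]) -> bool:
    """True when a deterministic task's rule has left the live registry."""
    if task_source(task_meta) != "php-check":
        # LLM/SaaS rule space is open-ended; absence is not retirement.
        return False
    return rule_id not in live_rule_ids
```

A legacy robots-txt task with no meta is treated as php-check, is not in the live registry, and therefore auto-completes.
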
Three bugs blocked the AI path on a live WP 7.0 + Anthropic connector,
found via foreground reflection probing rather than running the audit live:

1. JSON schema rejected. Anthropic's structured-output API requires
   `additionalProperties:false` on every `type:object`. Without it the
   call 400s and the WP_Error is swallowed, leaving no diagnostic.

2. Checklist URL was wrong. /mcp/ returns the HTML page describing the
   MCP server, not spec content — the model was fed 22KB of irrelevant
   HTML. Switched to /llms.txt, the canonical LLM-oriented Markdown
   index (~37KB) the spec publishes for exactly this purpose.

3. Errors were silent. run_prompt() returned null on any failure, and
   audit_url() caught Throwables and json_decode failures the same way.
   Added log_error() that writes to error_log under WP_DEBUG so future
   failures show up in debug.log without changing the public contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
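
For bug 1, one way to guarantee the constraint is to post-process the schema instead of hand-editing each object. The Python sketch below is an illustration only; the helper name and the example schema shape are assumptions, not the plugin's code.

```python
from typing import Any


def forbid_additional_properties(schema: Any) -> Any:
    """Return a copy of a JSON schema with additionalProperties:false on every object."""
    if isinstance(schema, dict):
        out = {key: forbid_additional_properties(value) for key, value in schema.items()}
        if out.get("type") == "object":
            out["additionalProperties"] = False
        return out
    if isinstance(schema, list):
        return [forbid_additional_properties(item) for item in schema]
    return schema  # strings, numbers, booleans pass through unchanged
```

Because the walk is recursive, nested item objects (for example each finding inside a findings array) are closed as well, and the input schema is left unmodified.
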
Captures the architecture, decisions, what's verified vs not, open questions,
and how to test. The top priority follow-up is using spec.website's canonical
slugs as rule_ids so PHP and LLM findings dedupe naturally and doc_urls point
at real spec pages — written up in detail at the bottom.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PHP checks were inventing slugs (html-doctype, html-lang-attribute,
charset-meta, xml-sitemap) and the LLM was inventing its own (doctype,
html-lang, meta-charset, etc.). They covered the same rules with
different identifiers, so the "PHP wins on overlap" dedupe never fired
and doc_urls were generic /specification.website/ pointers.

Adopt the spec's own URL slugs as canonical rule_ids:
  html-doctype          → doctype          (/spec/foundations/doctype/)
  html-lang-attribute   → html-lang        (/spec/foundations/html-lang/)
  charset-meta          → meta-charset     (/spec/foundations/meta-charset/)
  meta-description      → meta-description (/spec/foundations/meta-description/)
  xml-sitemap           → xml-sitemaps     (/spec/seo/xml-sitemaps/)

Also align categories to the spec ('foundations' for the HTML-baseline
checks, 'seo' for sitemaps). Each finding's doc_url now points at the
actual spec page so the "Why is this important?" link is genuinely
useful.

Update the AI prompt to instruct the model to use canonical slugs derived
from the spec URL pattern, with concrete examples. PHP and LLM findings
will now dedupe naturally where they cover the same rule.

Existing tasks with the old rule_ids will be auto-completed by the
self-heal logic (commit bbfaf44) on the next admin pageload — no manual
migration needed. Drop the canonical-slug TODO from the handoff doc and
renumber remaining open questions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
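
With canonical slugs, a finding's doc_url can be derived from the rule_id and its spec category. The Python sketch below illustrates the idea; the https://specification.website host is inferred from the llms.txt URL in the PR summary, and the helper name and the fallback to the site root for unknown rules are assumptions.

```python
SPEC_BASE = "https://specification.website"

# rule_id -> spec category, as listed in the commit message above.
CANONICAL_RULES = {
    "doctype": "foundations",
    "html-lang": "foundations",
    "meta-charset": "foundations",
    "meta-description": "foundations",
    "xml-sitemaps": "seo",
}


def doc_url(rule_id: str) -> str:
    """Build the spec page URL for a canonical rule_id."""
    category = CANONICAL_RULES.get(rule_id)
    if category is None:
        return SPEC_BASE + "/"  # assumed fallback for rules without a known page
    return f"{SPEC_BASE}/spec/{category}/{rule_id}/"
```

For example, doc_url("xml-sitemaps") yields https://specification.website/spec/seo/xml-sitemaps/.
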

github-actions Bot (Contributor) commented May 30, 2026

Test on Playground
Test this pull request on the Playground or download the zip.

github-actions Bot (Contributor) commented May 30, 2026

✅ Code Coverage Report

| Metric | Value |
|---|---|
| Total Coverage | 32.47% 📉 |
| Base Coverage | 31.60% |
| Difference | 📈 0.87% |

⚠️ Coverage below recommended 40% threshold

🎉 Great job maintaining/improving code coverage!

📊 File-level Coverage Changes (18 files)

🆕 New Files

| Class | Coverage | Lines |
|---|---|---|
| 🟡 Progress_Planner\Suggested_Tasks\Audit\Audit_Runner | 78.57% | 33/42 |
| 🟢 Progress_Planner\Suggested_Tasks\Audit\Checks\Charset_Check | 93.75% | 15/16 |
| 🔴 Progress_Planner\Suggested_Tasks\Audit\Checks\Checks_Registry | 26.83% | 11/41 |
| 🟢 Progress_Planner\Suggested_Tasks\Audit\Checks\Doctype_Check | 100.00% | 14/14 |
| 🟢 Progress_Planner\Suggested_Tasks\Audit\Checks\Lang_Attribute_Check | 92.86% | 13/14 |
| 🟢 Progress_Planner\Suggested_Tasks\Audit\Checks\Meta_Description_Check | 100.00% | 19/19 |
| 🔴 Progress_Planner\Suggested_Tasks\Audit\Checks\Sitemap_Check | 3.70% | 1/27 |
| 🟢 Progress_Planner\Suggested_Tasks\Audit\Local_Audit_Source | 87.50% | 14/16 |
| 🔴 Progress_Planner\Suggested_Tasks\Audit\Remote_Audit_Source | 37.21% | 16/43 |
| 🔴 Progress_Planner\Suggested_Tasks\Audit\Spec_Mcp_Client | 0.00% | 0/123 |
| 🔴 Progress_Planner\Suggested_Tasks\Data_Collector\Spec_Audit | 55.56% | 10/18 |
| 🟡 Progress_Planner\Suggested_Tasks\Providers\Spec_Audit | 68.84% | 95/138 |
| 🔴 Progress_Planner\WP_CLI\Audit_Command | 0.00% | 0/28 |

📈 Coverage Improved

| Class | Before | After | Change |
|---|---|---|---|
| Progress_Planner\Suggested_Tasks\Providers\Tasks | 36.59% | 38.41% | +1.82% |
| Progress_Planner\Suggested_Tasks\Data_Collector\Data_Collector_Manager | 64.29% | 65.52% | +1.23% |
| Progress_Planner\Suggested_Tasks_DB | 90.11% | 90.66% | +0.55% |
| Progress_Planner\Suggested_Tasks\Tasks_Manager | 62.83% | 63.16% | +0.33% |

📉 Coverage Decreased

| Class | Before | After | Change |
|---|---|---|---|
| Progress_Planner\Base | 45.40% | 45.12% | -0.28% |

ℹ️ About this report
  • All tests run in a single job with Xdebug coverage
  • Security tests excluded from coverage to prevent output issues
  • Coverage calculated from line coverage percentages

ilicfilip and others added 2 commits June 1, 2026 14:40
is_specific_task_completed() treated "rule no longer reported in a
populated audit" as completion for any source. Because the mcp-llm
engine is non-deterministic, a rule simply being absent from a later
audit run made its task self-complete even though the user fixed
nothing — contradicting the documented design (only php-check tasks
should complete on rule-absence; LLM/SaaS tasks complete only on an
explicit pass).

Guard the rule-absence branch on the php-check source. Add a
regression test asserting an mcp-llm task is not completed on omission
but is completed on an explicit pass, and clarify the existing
php-check test. Update the handoff doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
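
The guarded completion rule can be sketched as below. This is a Python illustration, not the body of is_specific_task_completed(): the function name, the status field, and treating an empty or missing findings list as "no cache" are assumptions.

```python
from typing import Dict, List, Optional


def is_task_completed(
    rule_id: str,
    source: str,
    findings: Optional[List[Dict[str, str]]],
) -> bool:
    """Decide whether an audit task should auto-complete."""
    if not findings:
        return False  # no cached audit: never mass-complete
    by_rule = {f["rule_id"]: f for f in findings}
    finding = by_rule.get(rule_id)
    if finding is None:
        # Rule absent from a populated audit: only deterministic php-check
        # tasks may treat omission as completion.
        return source == "php-check"
    # Rule present: any source completes only on an explicit pass.
    return finding["status"] == "pass"
```

An mcp-llm task therefore survives a run in which the model simply did not mention its rule, and completes only when a later run reports that rule as passing.
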
run_audit_now() returned only the tasks its own get_tasks_to_inject()
call created. When the bootstrap inject_tasks() sweep already consumed
the daily throttle slot earlier in the same request, the CLI command
reported "0 task(s) injected" even though a task had been injected.
Return pending_release_ids (all tasks injected this request) so the
count reflects reality.

Also fix the handoff's clean-state snippet: it iterated the cached
get_tasks_by() result while delete_recommendation() flushed that cache
group mid-loop, skipping tasks and leaving survivors. Snapshot the IDs
with a raw get_posts() + provider tax_query instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
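
The clean-state bug is an instance of mutating a collection while iterating it. The Python sketch below is only an analogy: a shrinking list stands in for the cache group that delete_recommendation() flushes mid-loop, and both function names are hypothetical.

```python
from typing import List


def delete_all_buggy(tasks: List[str]) -> None:
    for task in tasks:      # iterating the live list...
        tasks.remove(task)  # ...while shrinking it skips the next item


def delete_all(tasks: List[str]) -> None:
    for task in list(tasks):  # snapshot the IDs first, then delete
        tasks.remove(task)
```

Starting from four tasks, the buggy loop leaves two survivors, while the snapshot version removes all of them.
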