Spec audit: turn specification.website findings into suggested tasks #767
Draft
ilicfilip wants to merge 12 commits into
Introduce the audit layer that checks a site against specification.website. Defines the swappable Audit_Source contract + normalized finding schema (Audit_Runner), five deterministic PHP checks (doctype, lang, charset, robots.txt, sitemap) behind a filterable registry, a Local source that merges PHP checks with an optional AI pass (PHP wins on overlap), a Remote SaaS source stub for the future server-side engine, and Spec_Mcp_Client which drives WP 7.0's core AI client. Note: WP 7.0's AI client cannot act as an MCP client, so Spec_Mcp_Client feeds the spec checklist + HTML to wp_ai_client_prompt() instead. The whole AI path is guarded by is_available() and degrades to PHP-only checks; WP7-specific calls are marked TODO(wp7-verify) as they can't be exercised without a live WP 7.0 connector. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
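The "PHP wins on overlap" merge described above can be pictured with a small language-neutral sketch. This is only an illustration in Python, not the plugin's PHP: the finding fields (`rule_id`, `status`, `source`), the function name `merge_findings`, and the `open-graph` rule are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical normalized finding: a dict with rule_id, status and source.
Finding = Dict[str, str]


def merge_findings(php: List[Finding], ai: List[Finding]) -> List[Finding]:
    """Merge two finding lists keyed by rule_id; PHP findings win on overlap."""
    merged: Dict[str, Finding] = {}
    for finding in ai:
        merged[finding["rule_id"]] = finding
    for finding in php:
        # Applied last, so a PHP finding overwrites an AI finding for the same rule.
        merged[finding["rule_id"]] = finding
    return list(merged.values())


php = [{"rule_id": "doctype", "status": "fail", "source": "php-check"}]
ai = [
    {"rule_id": "doctype", "status": "pass", "source": "mcp-llm"},
    {"rule_id": "open-graph", "status": "fail", "source": "mcp-llm"},  # made-up rule
]
result = merge_findings(php, ai)
```

The merge only deduplicates when both engines use the same `rule_id` for the same rule, which is why identical identifiers matter for this design.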
Add the Spec_Audit data collector (caches the audit; runs the expensive checks/LLM only on cache refresh, never on admin_init) and the Spec_Audit task provider that turns failing findings into recommendations. The provider releases at most one task per window (default daily), overridable via progress_planner_spec_audit_max_tasks_per_window and _window filters; each failing rule maps to one durable task and is auto-completed when a re-audit shows it passing. Register both in their managers, and add a `wp prpl audit run` CLI command plus a progress_planner_run_spec_audit AJAX trigger for on-demand runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
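As a rough model of the per-window throttle, the sketch below releases at most `max_per_window` tasks and nothing once the slot is used. It is a Python illustration under assumptions: the function name, its parameters, and the way the "already released" count is tracked are hypothetical, and the real provider is PHP.

```python
def tasks_to_release(failing_rule_ids, released_in_window, max_per_window=1):
    """Return the failing rule IDs that may become tasks in the current window."""
    remaining = max(0, max_per_window - released_in_window)
    return list(failing_rule_ids)[:remaining]


# First run of the window: one slot is free, so only the first rule is released.
first_run = tasks_to_release(["rule-a", "rule-b"], released_in_window=0)

# Second run in the same window: the slot is used up, so nothing is released.
second_run = tasks_to_release(["rule-b"], released_in_window=1)
```

Raising `max_per_window` in this model plays the role that the `progress_planner_spec_audit_max_tasks_per_window` filter plays in the plugin.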
Cover the deterministic PHP checks, Audit_Runner schema normalization/dedup, graceful degradation to PHP-only findings when the AI layer is unavailable, the per-window injection throttle and per-rule completion, and a shape-equality test asserting the local and remote sources produce identical finding shapes (the guard that keeps the phase-C and phase-B engines interchangeable). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Verified on a live WP 7.0 install: the AI builder (WP_AI_Client_Prompt_Builder) delegates SDK methods through __call, so the earlier method_exists() guards on using_max_tokens/as_json_response/is_supported_for_text_generation silently skipped those calls. Call them directly and gate availability on wp_supports_ai() plus is_supported_for_text_generation(). Confirmed end-to-end via `wp prpl audit run`: detects failing rules, injects one task, and the throttle blocks the second run; the AI path correctly reports unavailable when no provider is configured and degrades to PHP-only checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously the inject-time recheck I added in ba74aa5 called wp_remote_get from get_tasks_to_inject(), which runs on every admin_init. Each admin pageview then triggered 1+ loopback HTTP requests back into the same PHP-FPM pool serving the user, pinning workers and starving the whole pool until nginx returned 502s for unrelated Valet sites.
Three structural changes prevent that class of bug entirely:
1. The Spec_Audit data collector's update_cache() is a no-op unless an explicit caller has opted in via Spec_Audit_Data_Collector::with_explicit_refresh(). The Data_Collector_Manager's admin_init sweep therefore cannot trigger the audit. Sanctioned callers: CLI command, cron hook, AJAX shutdown.
2. collect() never falls back to calculate_data() on cache miss: a missing cache returns []. is_specific_task_completed() now distinguishes "no cache" from "rule passed", so an object-cache flush can't mass-complete every audit task.
3. The AJAX "run now" handler defers the audit to shutdown and calls fastcgi_finish_request() so the user's worker is released to the pool before the outbound HTTP starts.
A daily wp-cron hook also drives refreshes from a non-web context.
Reverted is_still_failing()'s live recheck, since it's the wrong place for HTTP. Kept the deferred throttle counter (pure-PHP, no HTTP).
Updated tests: dropped the live-recheck test; added tests for the cache-empty completion guard and the no-explicit-refresh no-op.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
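The first two changes can be modelled as an opt-in guard around the expensive refresh. The Python sketch below is a toy, not the plugin's code: the class `AuditCollector`, the context-manager form of `with_explicit_refresh()`, and `fake_audit` are assumptions chosen for illustration, and the real method is a PHP static whose exact shape is not shown in this PR text.

```python
from contextlib import contextmanager


class AuditCollector:
    """Toy collector whose cache refresh only runs for callers that opt in."""

    def __init__(self, run_audit):
        self._run_audit = run_audit      # the expensive callable (HTTP, LLM, ...)
        self._cache = None               # None means "nothing cached yet"
        self._explicit_refresh = False

    @contextmanager
    def with_explicit_refresh(self):
        self._explicit_refresh = True
        try:
            yield self
        finally:
            self._explicit_refresh = False

    def update_cache(self):
        if not self._explicit_refresh:
            return                       # generic sweeps end here: no-op
        self._cache = self._run_audit()

    def collect(self):
        # Never compute on a miss; an empty list means "nothing cached".
        return [] if self._cache is None else self._cache


calls = []


def fake_audit():
    calls.append("run")
    return [{"rule_id": "rule-a", "status": "fail"}]


collector = AuditCollector(fake_audit)
collector.update_cache()                 # stands in for the admin_init sweep
swept = collector.collect()

with collector.with_explicit_refresh():  # stands in for CLI / cron / AJAX shutdown
    collector.update_cache()
refreshed = collector.collect()
```

In this model the sweep cannot reach the expensive call by construction, rather than by relying on every caller remembering not to trigger it.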
The robots.txt check turned out to be the wrong rule for a suggested task: WordPress generates a virtual robots.txt automatically, and when it fails (e.g. status 404 with a real body, as seen on a Yoast + Woo install), the fix is at the nginx/Valet routing level — not something a WordPress user can address from wp-admin. Suggested tasks should be actionable inside WordPress. Replace with a meta-description check that operates on the homepage HTML we already fetch (zero extra outbound HTTP). The user-facing fix is real and in-scope: install/configure an SEO plugin or set a description in the theme. Yoast/RankMath both supply it out of the box, so the recommendation naturally guides users to a known good path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tasks are keyed by rule_id. When we rename, retire, or filter out a check (as just happened with robots-txt), the old task lives on in users' DBs forever, because its completion logic only knew how to react to the rule still being audited.
Two changes:
1. On injection, persist the finding's source as prpl_source meta so the provider can later tell deterministic (php-check) tasks from probabilistic (mcp-llm / saas) ones. Legacy tasks without the meta are backfilled to 'php-check' (the original starter set was all PHP checks), so existing installs self-heal on upgrade too.
2. In is_specific_task_completed(), short-circuit to "complete" for any php-check task whose rule_id is no longer present in the live Checks_Registry. Does NOT apply to LLM/SaaS tasks: their rule space is open-ended, and a rule missing from one audit just means the model didn't mention it this run, not that it was retired.
Pure in-memory work; no outbound HTTP. The existing cache-empty guard in is_specific_task_completed() still prevents mass-completion from an object-cache flush.
Tests use synthetic rule IDs (rule-a etc.) that the registry never knew about, so the test setUp now registers no-op stub checks for them; the new retired-rule tests remove all audit_checks filters to simulate removal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bugs blocked the AI path on a live WP 7.0 + Anthropic connector, found via foreground reflection probing rather than running the audit live:
1. JSON schema rejected. Anthropic's structured-output API requires `additionalProperties:false` on every `type:object`. Without it the call 400s and the WP_Error is swallowed, leaving no diagnostic.
2. Checklist URL was wrong. /mcp/ returns the HTML page describing the MCP server, not spec content, so the model was fed 22KB of irrelevant HTML. Switched to /llms.txt, the canonical LLM-oriented Markdown index (~37KB) the spec publishes for exactly this purpose.
3. Errors were silent. run_prompt() returned null on any failure, and audit_url() caught Throwables and json_decode failures the same way. Added log_error() that writes to error_log under WP_DEBUG so future failures show up in debug.log without changing the public contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
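The first fix amounts to closing every object in the response schema. A minimal Python sketch of that idea follows; the helper name `close_objects` and the example schema are assumptions for illustration, and the plugin's actual schema is not shown in this PR text.

```python
import copy


def close_objects(schema):
    """Return a copy of a JSON schema with additionalProperties: false on every object."""
    schema = copy.deepcopy(schema)

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "object":
                # Keep an explicit value if one is already present.
                node.setdefault("additionalProperties", False)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return schema


# Hypothetical findings schema, used only to exercise the helper.
schema = {
    "type": "object",
    "properties": {
        "findings": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "rule_id": {"type": "string"},
                    "status": {"type": "string"},
                },
            },
        }
    },
}
closed = close_objects(schema)
```

Walking the whole structure, rather than patching only the top level, covers nested objects such as array items, which is where an omission is easiest to miss.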
Captures the architecture, decisions, what's verified vs not, open questions, and how to test. The top priority follow-up is using spec.website's canonical slugs as rule_ids so PHP and LLM findings dedupe naturally and doc_urls point at real spec pages — written up in detail at the bottom. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PHP checks were inventing slugs (html-doctype, html-lang-attribute,
charset-meta, xml-sitemap) and the LLM was inventing its own (doctype,
html-lang, meta-charset, etc.). They covered the same rules with
different identifiers, so the "PHP wins on overlap" dedupe never fired
and doc_urls were generic /specification.website/ pointers.
Adopt the spec's own URL slugs as canonical rule_ids:
html-doctype → doctype (/spec/foundations/doctype/)
html-lang-attribute → html-lang (/spec/foundations/html-lang/)
charset-meta → meta-charset (/spec/foundations/meta-charset/)
meta-description → meta-description (/spec/foundations/meta-description/)
xml-sitemap → xml-sitemaps (/spec/seo/xml-sitemaps/)
Also align categories to the spec ('foundations' for the HTML-baseline
checks, 'seo' for sitemaps). Each finding's doc_url now points at the
actual spec page so the "Why is this important?" link is genuinely
useful.
Update the AI prompt to instruct the model to use canonical slugs derived
from the spec URL pattern, with concrete examples. PHP and LLM findings
will now dedupe naturally where they cover the same rule.
Existing tasks with the old rule_ids will be auto-completed by the
self-heal logic (commit bbfaf44) on the next admin pageload — no manual
migration needed. Drop the canonical-slug TODO from the handoff doc and
renumber remaining open questions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
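The rename table in this commit can be written down as data, with the category and doc_url derived from the spec path. This is a Python sketch under assumptions: the constant and function names are made up, and building the full URL by prefixing `https://specification.website` to the path is an assumption rather than something the commit states.

```python
# Old PHP-check slug -> canonical spec slug, as listed in the commit message.
RENAMED_RULE_IDS = {
    "html-doctype": "doctype",
    "html-lang-attribute": "html-lang",
    "charset-meta": "meta-charset",
    "meta-description": "meta-description",
    "xml-sitemap": "xml-sitemaps",
}

# Canonical slug -> spec path, as listed in the commit message.
SPEC_PATHS = {
    "doctype": "/spec/foundations/doctype/",
    "html-lang": "/spec/foundations/html-lang/",
    "meta-charset": "/spec/foundations/meta-charset/",
    "meta-description": "/spec/foundations/meta-description/",
    "xml-sitemaps": "/spec/seo/xml-sitemaps/",
}

SPEC_BASE = "https://specification.website"  # assumed base for doc_url


def doc_url(rule_id):
    """Build the documentation URL for a canonical rule_id."""
    return SPEC_BASE + SPEC_PATHS[rule_id]


def category(rule_id):
    """Read the category out of the spec path: /spec/<category>/<slug>/."""
    return SPEC_PATHS[rule_id].split("/")[2]


sitemap_url = doc_url(RENAMED_RULE_IDS["xml-sitemap"])
```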
✅ Code Coverage Report
🎉 Great job maintaining/improving code coverage! 📊 File-level Coverage Changes (18 files)
is_specific_task_completed() treated "rule no longer reported in a populated audit" as completion for any source. Because the mcp-llm engine is non-deterministic, a rule simply being absent from a later audit run made its task self-complete even though the user fixed nothing — contradicting the documented design (only php-check tasks should complete on rule-absence; LLM/SaaS tasks complete only on an explicit pass). Guard the rule-absence branch on the php-check source. Add a regression test asserting an mcp-llm task is not completed on omission but is completed on an explicit pass, and clarify the existing php-check test. Update the handoff doc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
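Taken together with the earlier cache-empty and retired-rule guards, the completion rules can be summarised as one decision function. The Python sketch below is a model only: the function signature, the `status` field, and the rule IDs are assumptions, and the real logic lives in the PHP is_specific_task_completed().

```python
def is_task_completed(rule_id, source, cached_findings, registry_rule_ids):
    """Decide whether a task is complete, following the rules described in the commits."""
    source = source or "php-check"       # legacy tasks without source meta
    if not cached_findings:
        return False                     # no cache: unknown, never "passed"
    if source == "php-check" and rule_id not in registry_rule_ids:
        return True                      # deterministic rule was retired
    for finding in cached_findings:
        if finding["rule_id"] == rule_id:
            return finding["status"] == "pass"   # explicit result decides
    # Rule absent from a populated audit: only deterministic tasks complete.
    return source == "php-check"


audit = [
    {"rule_id": "rule-a", "status": "fail"},
    {"rule_id": "rule-b", "status": "pass"},
]
registry = {"rule-a", "rule-c"}

llm_omitted = is_task_completed("rule-x", "mcp-llm", audit, registry)
llm_passed = is_task_completed("rule-b", "mcp-llm", audit, registry)
php_retired = is_task_completed("rule-old", "php-check", audit, registry)
no_cache = is_task_completed("rule-a", "php-check", [], registry)
```

The asymmetry is deliberate in this model: an omission from a non-deterministic engine carries no information, while an omission from a deterministic one does.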
run_audit_now() returned only the tasks its own get_tasks_to_inject() call created. When the bootstrap inject_tasks() sweep already consumed the daily throttle slot earlier in the same request, the CLI command reported "0 task(s) injected" even though a task had been injected. Return pending_release_ids (all tasks injected this request) so the count reflects reality. Also fix the handoff's clean-state snippet: it iterated the cached get_tasks_by() result while delete_recommendation() flushed that cache group mid-loop, skipping tasks and leaving survivors. Snapshot the IDs with a raw get_posts() + provider tax_query instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
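The clean-state bug has a close analogue in any language: deleting from a collection while iterating a live view of it skips items. The Python sketch below shows that analogous failure and the snapshot fix. It is not the WordPress mechanism itself (there the cause was a cache group flushed mid-loop), and `TaskStore` and its methods are hypothetical.

```python
class TaskStore:
    """Toy store whose live_ids() returns the underlying list, not a copy."""

    def __init__(self, ids):
        self._ids = list(ids)

    def live_ids(self):
        return self._ids

    def delete(self, task_id):
        self._ids.remove(task_id)


def delete_all_buggy(store):
    for task_id in store.live_ids():        # the list shrinks under the iterator
        store.delete(task_id)


def delete_all_snapshot(store):
    for task_id in list(store.live_ids()):  # snapshot the IDs first
        store.delete(task_id)


buggy = TaskStore([1, 2, 3, 4])
delete_all_buggy(buggy)
survivors = list(buggy.live_ids())

fixed = TaskStore([1, 2, 3, 4])
delete_all_snapshot(fixed)
remaining = list(fixed.live_ids())
```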
Summary
Adds a website-spec audit that runs against the site's public URL and turns each failing rule into a Progress Planner suggested task, throttled to 1/day. Two engines share all plugin-side task-mapping code:
- […] https://specification.website/llms.txt. PHP wins on overlapping rule_ids.
Designed so the audit engine can later move from the plugin to the progressplanner.com SaaS without touching task-creation code (phase B; `Remote_Audit_Source` is a stub today, and a regression test guards the contract).
Key design points
- […] `fastcgi_finish_request()`), or a dedicated cron hook. The data collector's `update_cache()` is a no-op unless an explicit caller has opted in. This caused FPM pool starvation during development and is now structurally prevented.
- […] `prpl_source`. If a PHP-check task's `rule_id` is no longer in the live registry, it auto-completes, so a future rule rename/retire doesn't strand orphan tasks. Legacy tasks without source meta backfill to `php-check`.
- […] `shutdown` for tasks that actually survived the request (so a same-request auto-completion doesn't burn the slot).
- […] `doctype`, `html-lang`, `meta-charset`, `meta-description`, `xml-sitemaps`). Each finding's `doc_url` points at the real spec page.
Full architecture, decisions, and open questions are in `HANDOFF-spec-audit.md` on this branch.
Verified live
On a local WP 7.0 + Yoast + Woo + Anthropic Connector site (`planner.test`):
- `wp prpl audit run` produces ~5 PHP findings + ~10 LLM findings in ~18s.
Not yet verified
- […] `[]` until `progressplanner.com/wp-json/progress-planner-saas/v1/audit` exists).
Test plan
- […] `HANDOFF-spec-audit.md` for the why-behind-each-decision context.
- `composer test` (425/425 should pass; includes 27 spec-audit tests).
- `composer phpstan` (clean).
- `wp prpl audit run` on a fresh site: expect 1 task injected (the highest-severity failing rule).
- […] `pass`, next admin pageload auto-completes the task.
- […] `mcp-llm` source findings appear in `wp eval` dump of `spec_audit_findings`.
Open follow-ups (handoff doc lists them in priority order)
- […] `https-tls` for `.test`/`.local` URLs; small filter.
- `Spec_Mcp_Client` → `Spec_Ai_Client` (the "MCP" in the name was aspirational; core's AI client can't act as an MCP client).
- `--dry-run` flag on the CLI command.
🤖 Generated with Claude Code