Spec audit: turn specification.website findings into suggested tasks by ilicfilip · Pull Request #767 · ProgressPlanner/progress-planner

ilicfilip · 2026-05-30T08:43:49Z

Summary

Adds a website-spec audit that runs against the site's public URL and turns each failing rule into a Progress Planner suggested task, throttled to 1/day. Two engines share all plugin-side task-mapping code:

Deterministic PHP checks (always on): 5 starter rules — doctype, html-lang, meta-charset, meta-description, xml-sitemaps. Operates on a single shared homepage fetch.
WP 7.0 AI client (optional, requires a configured connector): asks Claude/GPT/Gemini to evaluate the homepage against https://specification.website/llms.txt. PHP wins on overlapping rule_ids.
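
The "PHP wins on overlapping rule_ids" rule can be sketched in plain PHP as below. This is a minimal illustration, not the plugin's Audit_Runner code: the function name `merge_findings()` and the `status` field are assumptions made for the example, while `rule_id` and the source labels `php-check` and `mcp-llm` come from this PR.

```php
<?php
/**
 * Merge two finding lists keyed by rule_id (hypothetical helper).
 * A PHP-check finding overwrites an LLM finding with the same rule_id.
 */
function merge_findings( array $php_findings, array $llm_findings ): array {
    $merged = array();

    // LLM findings go in first so that PHP findings can overwrite them.
    foreach ( $llm_findings as $finding ) {
        $merged[ $finding['rule_id'] ] = $finding;
    }
    foreach ( $php_findings as $finding ) {
        $merged[ $finding['rule_id'] ] = $finding;
    }

    // Drop the rule_id keys and return a plain list.
    return array_values( $merged );
}

$php = array(
    array( 'rule_id' => 'doctype', 'status' => 'pass', 'source' => 'php-check' ),
);
$llm = array(
    array( 'rule_id' => 'doctype', 'status' => 'fail', 'source' => 'mcp-llm' ),
    array( 'rule_id' => 'https-tls', 'status' => 'fail', 'source' => 'mcp-llm' ),
);

$merged = merge_findings( $php, $llm );
```

Here the deterministic `doctype` result replaces the model's opinion, and `https-tls`, which only the model reported, is kept.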

Designed so the audit engine can later move from the plugin to the progressplanner.com SaaS without touching task-creation code (phase B — Remote_Audit_Source is a stub today, a regression test guards the contract).

Key design points

Zero outbound HTTP from admin_init. The audit runs only from CLI, AJAX-shutdown (with fastcgi_finish_request()), or a dedicated cron hook. The data collector's update_cache() is a no-op unless an explicit caller has opted in. Outbound HTTP on admin_init caused PHP-FPM pool starvation during development; it is now structurally prevented.
Self-healing for retired rules. Tasks store prpl_source. If a PHP-check task's rule_id is no longer in the live registry, it auto-completes — so a future rule rename/retire doesn't strand orphan tasks. Legacy tasks without source meta backfill to php-check.
Throttle is deferred + survival-counted. The per-window counter only increments at shutdown for tasks that actually survived the request (so a same-request auto-completion doesn't burn the slot).
Canonical slugs. PHP and LLM both use spec.website's own URL slugs (doctype, html-lang, meta-charset, meta-description, xml-sitemaps). Each finding's doc_url points at the real spec page.
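
The deferred, survival-counted throttle described above might look roughly like this. It is a sketch under assumptions: the class `Deferred_Throttle` and its method names are hypothetical, and counting pending injections against the limit within the same request is a guess at the intended behaviour.

```php
<?php
/**
 * Hypothetical sketch: the per-window counter is only committed at shutdown,
 * and only for tasks that were not auto-completed during the same request.
 */
final class Deferred_Throttle {
    private int $max_per_window;
    private int $used;
    /** @var array<string, bool> Task IDs injected during this request. */
    private array $pending = array();

    public function __construct( int $max_per_window, int $already_used ) {
        $this->max_per_window = $max_per_window;
        $this->used           = $already_used;
    }

    public function can_inject(): bool {
        return ( $this->used + count( $this->pending ) ) < $this->max_per_window;
    }

    public function record_injection( string $task_id ): void {
        $this->pending[ $task_id ] = true;
    }

    /** A same-request completion frees the slot again. */
    public function record_completion( string $task_id ): void {
        unset( $this->pending[ $task_id ] );
    }

    /** Call once at shutdown; returns the new persisted counter value. */
    public function commit(): int {
        $this->used   += count( $this->pending );
        $this->pending = array();
        return $this->used;
    }
}
```

A task that is injected and then auto-completed before shutdown leaves the counter untouched, so the daily slot stays available for the next failing rule.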

Full architecture, decisions, and open questions are in HANDOFF-spec-audit.md on this branch.

Verified live

On a local WP 7.0 + Yoast + Woo + Anthropic Connector site (planner.test):

✅ wp prpl audit run produces ~5 PHP findings + ~10 LLM findings in ~18s.
✅ Severity-prioritized throttle picks the most important rule for the daily slot.
✅ Fix → re-audit → auto-complete loop works end-to-end.
✅ Zero admin-pageload HTTP after the cache exists.
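
As an illustration of the severity-prioritized pick, the selection step could be sketched as follows. The severity labels (`high`, `medium`, `low`) and the function name `pick_tasks_for_window()` are assumptions for this example; the PR does not list the actual severity values.

```php
<?php
/**
 * Hypothetical sketch: choose which failing findings get the window's slots.
 * Unknown severities sort last.
 */
function pick_tasks_for_window( array $failing, int $slots ): array {
    $rank = array( 'high' => 0, 'medium' => 1, 'low' => 2 ); // assumed labels

    usort(
        $failing,
        static function ( array $a, array $b ) use ( $rank ): int {
            $rank_a = $rank[ $a['severity'] ] ?? PHP_INT_MAX;
            $rank_b = $rank[ $b['severity'] ] ?? PHP_INT_MAX;
            return $rank_a <=> $rank_b;
        }
    );

    return array_slice( $failing, 0, max( 0, $slots ) );
}

$picked = pick_tasks_for_window(
    array(
        array( 'rule_id' => 'meta-description', 'severity' => 'medium' ),
        array( 'rule_id' => 'doctype', 'severity' => 'high' ),
        array( 'rule_id' => 'xml-sitemaps', 'severity' => 'low' ),
    ),
    1
);
```

With one slot per day, only the highest-ranked failing rule is released; the rest wait for later windows.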

Not yet verified

Production WP 7.0 stable (testing was on nightly).
Phase-B SaaS endpoint (the stub returns [] until progressplanner.com/wp-json/progress-planner-saas/v1/audit exists).
Multisite.

Test plan

Pull the branch on a WP 7.0 install.
Read HANDOFF-spec-audit.md for the why-behind-each-decision context.
Run composer test (425/425 should pass — includes 27 spec-audit tests).
Run composer phpstan (clean).
Run wp prpl audit run on a fresh site; expect 1 task injected (the highest-severity failing rule).
Open WP admin → Progress Planner — task should appear with the spec.website "Why is this important?" link.
Fix the flagged issue, run again — rule flips to pass, next admin pageload auto-completes the task.
If you have an AI Connector configured: confirm that findings with the mcp-llm source appear in a wp eval dump of spec_audit_findings.

Open follow-ups (handoff doc lists them in priority order)

(medium) Local-dev false positive on https-tls for .test/.local URLs — small filter.
(medium) Rename Spec_Mcp_Client → Spec_Ai_Client (the "MCP" in the name was aspirational; core's AI client can't act as an MCP client).
(low) --dry-run flag on the CLI command.
(low) UI button for "Run audit now" (AJAX endpoint already exists).
(low) Deactivation cleanup should unschedule the cron hook.

🤖 Generated with Claude Code

Introduce the audit layer that checks a site against specification.website. Defines the swappable Audit_Source contract + normalized finding schema (Audit_Runner), five deterministic PHP checks (doctype, lang, charset, robots.txt, sitemap) behind a filterable registry, a Local source that merges PHP checks with an optional AI pass (PHP wins on overlap), a Remote SaaS source stub for the future server-side engine, and Spec_Mcp_Client which drives WP 7.0's core AI client. Note: WP 7.0's AI client cannot act as an MCP client, so Spec_Mcp_Client feeds the spec checklist + HTML to wp_ai_client_prompt() instead. The whole AI path is guarded by is_available() and degrades to PHP-only checks; WP7-specific calls are marked TODO(wp7-verify) as they can't be exercised without a live WP 7.0 connector. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add the Spec_Audit data collector (caches the audit; runs the expensive checks/LLM only on cache refresh, never on admin_init) and the Spec_Audit task provider that turns failing findings into recommendations. The provider releases at most one task per window (default daily), overridable via progress_planner_spec_audit_max_tasks_per_window and _window filters; each failing rule maps to one durable task and is auto-completed when a re-audit shows it passing. Register both in their managers, and add a `wp prpl audit run` CLI command plus a progress_planner_run_spec_audit AJAX trigger for on-demand runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cover the deterministic PHP checks, Audit_Runner schema normalization/dedup, graceful degradation to PHP-only findings when the AI layer is unavailable, the per-window injection throttle and per-rule completion, and a shape-equality test asserting the local and remote sources produce identical finding shapes (the guard that keeps the phase-C and phase-B engines interchangeable). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Verified on a live WP 7.0 install: the AI builder (WP_AI_Client_Prompt_Builder) delegates SDK methods through __call, so the earlier method_exists() guards on using_max_tokens/as_json_response/is_supported_for_text_generation silently skipped those calls. Call them directly and gate availability on wp_supports_ai() plus is_supported_for_text_generation(). Confirmed end-to-end via `wp prpl audit run`: detects failing rules, injects one task, and the throttle blocks the second run; the AI path correctly reports unavailable when no provider is configured and degrades to PHP-only checks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
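
The pitfall behind this fix is general PHP behaviour: method_exists() returns false for methods that are only reachable through __call. The sketch below reproduces it with two stand-in classes (`Sdk_Builder` and `Delegating_Builder` are hypothetical; they are not the real WP_AI_Client_Prompt_Builder).

```php
<?php
/** Stand-in for the underlying SDK object that really defines the method. */
final class Sdk_Builder {
    public ?int $max_tokens = null;

    public function using_max_tokens( int $n ): self {
        $this->max_tokens = $n;
        return $this;
    }
}

/** Stand-in for a builder that forwards unknown methods via __call. */
final class Delegating_Builder {
    private Sdk_Builder $inner;

    public function __construct( Sdk_Builder $inner ) {
        $this->inner = $inner;
    }

    public function __call( string $name, array $args ) {
        $result = $this->inner->$name( ...$args );
        // Keep the fluent chain on the wrapper.
        return $result === $this->inner ? $this : $result;
    }
}

$inner   = new Sdk_Builder();
$builder = new Delegating_Builder( $inner );

// The guard sees no declared method, so the guarded call is silently skipped.
$guard_sees_method = method_exists( $builder, 'using_max_tokens' );
if ( $guard_sees_method ) {
    $builder->using_max_tokens( 512 );
}
$after_guarded = $inner->max_tokens;

// Calling directly works, because __call forwards it.
$builder->using_max_tokens( 1024 );
$after_direct = $inner->max_tokens;
```

After the guarded block `$after_guarded` is still null, while the direct call sets the value, which matches the "silently skipped" symptom described in the commit.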

Previously, the inject-time recheck I added in ba74aa5 called wp_remote_get from get_tasks_to_inject(), which runs on every admin_init. Each admin pageview then triggered 1+ loopback HTTP requests back into the same PHP-FPM pool serving the user, pinning workers and starving the whole pool until nginx returned 502s for unrelated Valet sites.
Three structural changes prevent that class of bug entirely:
1. The Spec_Audit data collector's update_cache() is a no-op unless an explicit caller has opted in via Spec_Audit_Data_Collector::with_explicit_refresh(). The Data_Collector_Manager's admin_init sweep therefore cannot trigger the audit. Sanctioned callers: CLI command, cron hook, AJAX shutdown.
2. collect() never falls back to calculate_data() on cache miss; a missing cache returns []. is_specific_task_completed() now distinguishes "no cache" from "rule passed", so an object-cache flush can't mass-complete every audit task.
3. The AJAX "run now" handler defers the audit to shutdown and calls fastcgi_finish_request() so the user's worker is released to the pool before the outbound HTTP starts.
A daily wp-cron hook also drives refreshes from a non-web context.
Reverted is_still_failing()'s live recheck: it's the wrong place for HTTP. Kept the deferred throttle counter (pure-PHP, no HTTP).
Updated tests: dropped the live-recheck test; added tests for the cache-empty completion guard and the no-explicit-refresh no-op.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
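
The opt-in guard from points 1 and 2 can be sketched without WordPress as below. This is an assumption-laden illustration: the class `Audit_Cache_Sketch`, the callable-based shape of `with_explicit_refresh()`, and the hard-coded finding are all hypothetical; only the behaviour (no-op without opt-in, empty result on cache miss) is taken from the commit message.

```php
<?php
/** Hypothetical sketch of an audit cache that refuses implicit refreshes. */
final class Audit_Cache_Sketch {
    private bool $explicit_refresh = false;
    private ?array $cache          = null;
    public int $audit_runs         = 0;

    /** Sanctioned callers (CLI, cron, AJAX shutdown) wrap their work in this. */
    public function with_explicit_refresh( callable $callback ) {
        $this->explicit_refresh = true;
        try {
            return $callback( $this );
        } finally {
            $this->explicit_refresh = false;
        }
    }

    /** No-op unless a caller has opted in, so a generic sweep cannot run the audit. */
    public function update_cache(): void {
        if ( ! $this->explicit_refresh ) {
            return;
        }
        ++$this->audit_runs;
        // Placeholder for the expensive audit; a real run would do HTTP here.
        $this->cache = array( array( 'rule_id' => 'doctype', 'status' => 'fail' ) );
    }

    /** Never recomputes on a miss; a missing cache yields an empty list. */
    public function collect(): array {
        return $this->cache ?? array();
    }

    /** Lets callers tell "no cache" apart from "audit found nothing". */
    public function has_cache(): bool {
        return null !== $this->cache;
    }
}
```

A sweep that calls `update_cache()` directly does nothing, whereas `$cache->with_explicit_refresh( fn ( $c ) => $c->update_cache() )` runs the audit once and resets the flag afterwards, even if the callback throws.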

The robots.txt check turned out to be the wrong rule for a suggested task: WordPress generates a virtual robots.txt automatically, and when it fails (e.g. status 404 with a real body, as seen on a Yoast + Woo install), the fix is at the nginx/Valet routing level — not something a WordPress user can address from wp-admin. Suggested tasks should be actionable inside WordPress. Replace with a meta-description check that operates on the homepage HTML we already fetch (zero extra outbound HTTP). The user-facing fix is real and in-scope: install/configure an SEO plugin or set a description in the theme. Yoast/RankMath both supply it out of the box, so the recommendation naturally guides users to a known good path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
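
A meta-description check that works purely on already-fetched HTML could look like the sketch below. It is not the plugin's Meta_Description_Check: the function name is hypothetical, and the regex-based parsing is a simplification chosen to keep the example dependency-free (a production check would more likely use a DOM parser).

```php
<?php
/**
 * Hypothetical sketch: return the homepage's meta description, or null when
 * the tag is missing or its content is empty. Operates on a string only,
 * so it needs no extra HTTP request.
 */
function find_meta_description( string $html ): ?string {
    // Collect every <meta ...> tag, then inspect each one individually so
    // that attribute order (name before or after content) does not matter.
    if ( ! preg_match_all( '/<meta\b[^>]*>/i', $html, $tags ) ) {
        return null;
    }

    foreach ( $tags[0] as $tag ) {
        if ( ! preg_match( '/(?<![\w-])name\s*=\s*["\']description["\']/i', $tag ) ) {
            continue;
        }
        if ( preg_match( '/(?<![\w-])content\s*=\s*(["\'])(.*?)\1/is', $tag, $m ) ) {
            $content = trim( html_entity_decode( $m[2], ENT_QUOTES | ENT_HTML5, 'UTF-8' ) );
            return '' === $content ? null : $content;
        }
    }

    return null;
}
```

A null result would map to a failing `meta-description` finding; any non-empty string counts as a pass in this simplified version.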

Tasks are keyed by rule_id. When we rename, retire, or filter out a check (as just happened with robots-txt), the old task lives on in users' DBs forever, because its completion logic only knew how to react to the rule still being audited.
Two changes:
1. On injection, persist the finding's source as prpl_source meta so the provider can later tell deterministic (php-check) tasks from probabilistic (mcp-llm / saas) ones. Legacy tasks without the meta are backfilled to 'php-check' (the original starter set was all PHP checks), so existing installs self-heal on upgrade too.
2. In is_specific_task_completed(), short-circuit to "complete" for any php-check task whose rule_id is no longer present in the live Checks_Registry. Does NOT apply to LLM/SaaS tasks: their rule space is open-ended, and a rule missing from one audit just means the model didn't mention it this run, not that it was retired.
Pure in-memory work; no outbound HTTP. The existing cache-empty guard in is_specific_task_completed() still prevents mass-completion from an object-cache flush.
Tests use synthetic rule IDs (rule-a etc.) that the registry never knew about, so the test setUp now registers no-op stub checks for them; the new retired-rule tests remove all audit_checks filters to simulate removal.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three bugs blocked the AI path on a live WP 7.0 + Anthropic connector, found via foreground reflection probing rather than running the audit live:
1. JSON schema rejected. Anthropic's structured-output API requires `additionalProperties:false` on every `type:object`. Without it the call 400s and the WP_Error is swallowed, leaving no diagnostic.
2. Checklist URL was wrong. /mcp/ returns the HTML page describing the MCP server, not spec content, so the model was fed 22KB of irrelevant HTML. Switched to /llms.txt, the canonical LLM-oriented Markdown index (~37KB) the spec publishes for exactly this purpose.
3. Errors were silent. run_prompt() returned null on any failure, and audit_url() caught Throwables and json_decode failures the same way. Added log_error() that writes to error_log under WP_DEBUG so future failures show up in debug.log without changing the public contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
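
For the first bug, one way to guarantee `additionalProperties:false` on every `type:object` is to walk the schema array recursively before sending it. The helper below is a hypothetical sketch (the name `close_object_schemas()` and the sample schema are not from the plugin); it naively recurses into every nested array, which is fine for a simple response schema but would also touch object-shaped values inside keywords such as `default` or `examples`.

```php
<?php
/**
 * Hypothetical sketch: force additionalProperties=false on every node
 * whose "type" is "object", at any depth of a JSON Schema array.
 */
function close_object_schemas( array $schema ): array {
    // Recurse first so nested properties/items are handled.
    foreach ( $schema as $key => $value ) {
        if ( is_array( $value ) ) {
            $schema[ $key ] = close_object_schemas( $value );
        }
    }

    if ( ( $schema['type'] ?? null ) === 'object' ) {
        $schema['additionalProperties'] = false;
    }

    return $schema;
}

// Illustrative response schema; the field names are assumptions.
$schema = array(
    'type'       => 'object',
    'properties' => array(
        'findings' => array(
            'type'  => 'array',
            'items' => array(
                'type'       => 'object',
                'properties' => array(
                    'rule_id' => array( 'type' => 'string' ),
                ),
            ),
        ),
    ),
);

$closed = close_object_schemas( $schema );
```

Both the root object and the nested `items` object end up closed, while the array node in between is left unchanged.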

Captures the architecture, decisions, what's verified vs not, open questions, and how to test. The top priority follow-up is using spec.website's canonical slugs as rule_ids so PHP and LLM findings dedupe naturally and doc_urls point at real spec pages — written up in detail at the bottom. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PHP checks were inventing slugs (html-doctype, html-lang-attribute, charset-meta, xml-sitemap) and the LLM was inventing its own (doctype, html-lang, meta-charset, etc.). They covered the same rules with different identifiers, so the "PHP wins on overlap" dedupe never fired and doc_urls were generic /specification.website/ pointers.
Adopt the spec's own URL slugs as canonical rule_ids:
html-doctype → doctype (/spec/foundations/doctype/)
html-lang-attribute → html-lang (/spec/foundations/html-lang/)
charset-meta → meta-charset (/spec/foundations/meta-charset/)
meta-description → meta-description (/spec/foundations/meta-description/)
xml-sitemap → xml-sitemaps (/spec/seo/xml-sitemaps/)
Also align categories to the spec ('foundations' for the HTML-baseline checks, 'seo' for sitemaps). Each finding's doc_url now points at the actual spec page so the "Why is this important?" link is genuinely useful.
Update the AI prompt to instruct the model to use canonical slugs derived from the spec URL pattern, with concrete examples. PHP and LLM findings will now dedupe naturally where they cover the same rule.
Existing tasks with the old rule_ids will be auto-completed by the self-heal logic (commit bbfaf44) on the next admin pageload; no manual migration is needed.
Drop the canonical-slug TODO from the handoff doc and renumber remaining open questions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
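
With canonical slugs, a finding's doc_url can be derived from its category and rule_id. The sketch below assumes the URL pattern `https://specification.website/spec/<category>/<slug>/`, inferred from the paths listed in this commit; the function name `doc_url_for()` is hypothetical.

```php
<?php
/**
 * Hypothetical sketch: build the spec page URL for a canonical rule_id.
 * Returns null for rule_ids outside the five starter checks.
 */
function doc_url_for( string $rule_id ): ?string {
    // Category per canonical slug, as listed in the commit message.
    $categories = array(
        'doctype'          => 'foundations',
        'html-lang'        => 'foundations',
        'meta-charset'     => 'foundations',
        'meta-description' => 'foundations',
        'xml-sitemaps'     => 'seo',
    );

    if ( ! isset( $categories[ $rule_id ] ) ) {
        return null;
    }

    // Assumed host and path pattern.
    return sprintf(
        'https://specification.website/spec/%s/%s/',
        $categories[ $rule_id ],
        $rule_id
    );
}
```

Legacy IDs such as `html-doctype` deliberately resolve to null here, mirroring the commit's choice to let old tasks self-heal rather than migrate them.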

github-actions · 2026-05-30T08:44:03Z

Test on Playground
Test this pull request on the Playground
or download the zip

github-actions · 2026-05-30T08:45:55Z

✅ Code Coverage Report

| Metric | Value |
| --- | --- |
| Total Coverage | 32.47% 📉 |
| Base Coverage | 31.60% |
| Difference | 📈 0.87% |

⚠️ Coverage below recommended 40% threshold

🎉 Great job maintaining/improving code coverage!

📊 File-level Coverage Changes (18 files)

🆕 New Files

| Class | Coverage | Lines |
| --- | --- | --- |
| 🟡 `Progress_Planner\Suggested_Tasks\Audit\Audit_Runner` | 78.57% | 33/42 |
| 🟢 `Progress_Planner\Suggested_Tasks\Audit\Checks\Charset_Check` | 93.75% | 15/16 |
| 🔴 `Progress_Planner\Suggested_Tasks\Audit\Checks\Checks_Registry` | 26.83% | 11/41 |
| 🟢 `Progress_Planner\Suggested_Tasks\Audit\Checks\Doctype_Check` | 100.00% | 14/14 |
| 🟢 `Progress_Planner\Suggested_Tasks\Audit\Checks\Lang_Attribute_Check` | 92.86% | 13/14 |
| 🟢 `Progress_Planner\Suggested_Tasks\Audit\Checks\Meta_Description_Check` | 100.00% | 19/19 |
| 🔴 `Progress_Planner\Suggested_Tasks\Audit\Checks\Sitemap_Check` | 3.70% | 1/27 |
| 🟢 `Progress_Planner\Suggested_Tasks\Audit\Local_Audit_Source` | 87.50% | 14/16 |
| 🔴 `Progress_Planner\Suggested_Tasks\Audit\Remote_Audit_Source` | 37.21% | 16/43 |
| 🔴 `Progress_Planner\Suggested_Tasks\Audit\Spec_Mcp_Client` | 0.00% | 0/123 |
| 🔴 `Progress_Planner\Suggested_Tasks\Data_Collector\Spec_Audit` | 55.56% | 10/18 |
| 🟡 `Progress_Planner\Suggested_Tasks\Providers\Spec_Audit` | 68.84% | 95/138 |
| 🔴 `Progress_Planner\WP_CLI\Audit_Command` | 0.00% | 0/28 |

📈 Coverage Improved

| Class | Before | After | Change |
| --- | --- | --- | --- |
| `Progress_Planner\Suggested_Tasks\Providers\Tasks` | 36.59% | 38.41% | +1.82% |
| `Progress_Planner\Suggested_Tasks\Data_Collector\Data_Collector_Manager` | 64.29% | 65.52% | +1.23% |
| `Progress_Planner\Suggested_Tasks_DB` | 90.11% | 90.66% | +0.55% |
| `Progress_Planner\Suggested_Tasks\Tasks_Manager` | 62.83% | 63.16% | +0.33% |

📉 Coverage Decreased

| Class | Before | After | Change |
| --- | --- | --- | --- |
| `Progress_Planner\Base` | 45.40% | 45.12% | -0.28% |

ℹ️ About this report

All tests run in a single job with Xdebug coverage
Security tests excluded from coverage to prevent output issues
Coverage calculated from line coverage percentages

is_specific_task_completed() treated "rule no longer reported in a populated audit" as completion for any source. Because the mcp-llm engine is non-deterministic, a rule simply being absent from a later audit run made its task self-complete even though the user fixed nothing — contradicting the documented design (only php-check tasks should complete on rule-absence; LLM/SaaS tasks complete only on an explicit pass). Guard the rule-absence branch on the php-check source. Add a regression test asserting an mcp-llm task is not completed on omission but is completed on an explicit pass, and clarify the existing php-check test. Update the handoff doc. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
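
Putting the completion rules from this PR together, the decision could be sketched as a pure function like the one below. It is an illustration only: the function name, its parameters, the `status` values, and the order of the guards are assumptions, while the source labels and the rules themselves (retired php-check rules complete, an empty cache completes nothing, LLM/SaaS tasks need an explicit pass) come from the commit messages.

```php
<?php
/**
 * Hypothetical sketch of the completion decision for one audit task.
 *
 * @param string      $rule_id            Task's rule_id.
 * @param string|null $source             prpl_source meta; null for legacy tasks.
 * @param array|null  $cached_findings    Cached audit, or null when no cache exists.
 * @param string[]    $live_php_rule_ids  rule_ids currently in the checks registry.
 */
function is_task_completed( string $rule_id, ?string $source, ?array $cached_findings, array $live_php_rule_ids ): bool {
    // Legacy tasks without source meta are treated as php-check.
    $source = $source ?? 'php-check';

    // Retired or renamed deterministic rule: self-heal by completing the task.
    if ( 'php-check' === $source && ! in_array( $rule_id, $live_php_rule_ids, true ) ) {
        return true;
    }

    // No cache (or an empty one) proves nothing, so never complete on it.
    if ( null === $cached_findings || array() === $cached_findings ) {
        return false;
    }

    foreach ( $cached_findings as $finding ) {
        if ( $finding['rule_id'] === $rule_id ) {
            return 'pass' === $finding['status'];
        }
    }

    // Rule absent from a populated audit: only deterministic checks may
    // treat omission as completion; a model may simply not have mentioned it.
    return 'php-check' === $source;
}
```

For example, an `mcp-llm` task for `https-tls` stays open when a later audit omits that rule, and completes only once a finding for it reports a pass.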

run_audit_now() returned only the tasks its own get_tasks_to_inject() call created. When the bootstrap inject_tasks() sweep already consumed the daily throttle slot earlier in the same request, the CLI command reported "0 task(s) injected" even though a task had been injected. Return pending_release_ids (all tasks injected this request) so the count reflects reality. Also fix the handoff's clean-state snippet: it iterated the cached get_tasks_by() result while delete_recommendation() flushed that cache group mid-loop, skipping tasks and leaving survivors. Snapshot the IDs with a raw get_posts() + provider tax_query instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ilicfilip and others added 10 commits May 29, 2026 17:47

ilicfilip and others added 2 commits June 1, 2026 14:40

ilicfilip wants to merge 12 commits into develop from filip/spec-audit


1 participant
