fix: surface CI ingest failures in the PR comment#156
Draft
veksen wants to merge 2 commits into
Draft
Conversation
When POST /ci/runs failed (e.g. the Site API rejected a payload), the analyzer logged a warning, returned null, and carried on — producing a degraded PR comment with no dashboard link that looked like a normal empty result, plus a green check. The failure was effectively invisible. postToSiteApi now returns a discriminated outcome carrying the HTTP status and a capped response body on failure. On a failed ingest, main.ts records it on the report context and emits a GitHub Actions warning annotation, and the comment renders a prominent CAUTION banner (with a collapsible details block) so the run no longer silently vanishes. Not setFailed — a Site API hiccup shouldn't block merging; this is a "you should know" signal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Query Doctor Analysis
Caution
Query Doctor couldn't record this run. The API rejected the submission (HTTP 400), so it wasn't saved to the dashboard and the results below may be incomplete. This usually means the analyzer and Query Doctor are out of sync — re-running won't help; if it persists, it's a bug on our side.
Details
{"name":"ZodError","message":"[\n {\n \"code\": \"invalid_value\",\n \"values\": [\n \"check\",\n \"foreign_key\",\n \"primary_key\",\n \"unique\",\n \"exclusion\"\n ],\n \"path\": [\n \"schema\",\n \"constraints\",\n 0,\n \"constraintType\"\n ],\n \"message\": \"Invalid option: expected one of \\\"check\\\"|\\\"foreign_key\\\"|\\\"primary_key\\\"|\\\"unique\\\"|\\\"exclusion\\\"\"\n },\n {\n \"code\": \"invalid_value\",\n \"values\": [\n \"check\",\n \"foreign_key\",\n \"primary_key\",\n \"unique\",\n …
3 queries analyzed
2 pre-existing issues
SELECT "guests"."id", "guests"."session_id", "guests"."username", "guests"."avatar_path", "guests"."color", "guests"."side", "guests"."audio_recording_path", "guests"."audio_recording_public", "gue...
indexassets(event_id, inserted_at desc)
cost 31,003,449 → 1,498 (100% reduction)SELECT * FROM guest_ip_addresses WHERE ip_address = '127.0.0.1';
indexguest_ip_addresses(ip_address)
cost 154,402 → 8 (100% reduction)
Using assumed statistics (10000000 rows/table). For better results, sync production stats.
A rejected ingest isn't one thing — it's a computed run that couldn't be saved, and what to do about it depends on why. Replace the single core.warning with a classification: - transient (network/timeout/5xx): recoverable → core.warning, "re-run". - auth (401/403): the user's to fix and means CI isn't running → core.setFailed. - rejected (other 4xx, e.g. analyzer/API contract skew): our bug, not the user's → core.error (red, loud) but non-blocking by default, since failing their PR over our fault is wrong. Opt in to block via FAIL_ON_INGEST_ERROR. The PR-comment banner now tailors its copy per kind (retry / fix token / out-of-sync). It's an error worth telling the user about — but named, scoped, and routed to who can actually act. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why
When
POST /ci/runsfails, the run silently vanishes.postToSiteApilogged a warning, returnednull, and carried on — so the PR comment rendered with no dashboard link and a degraded "empty" look, the Actions check stayed green, and nothing was saved. The only trace was a line buried in the job log.This actually happened: a Postgres-18 schema dump produced a
not_nullconstraint type the Site API's ingest schema didn't accept, so it returned400. 68 queries were analyzed, but the run showed up nowhere and the comment lost its link.What a rejected ingest is
The analyzer did its whole job — captured, analyzed, dumped schema — and only the final persist step was refused. That's data loss, not a soft caveat, so it's an error worth telling the user. But the right severity and recipient depend on why it was rejected:
transientcore.warning, don't blockauthcore.setFailed— CI isn't runningrejectedcore.error(red, loud), non-blocking by defaultBlocking a user's PR over our contract bug would be wrong, so
rejectedstays non-blocking unless a team opts in withFAIL_ON_INGEST_ERROR=true. Auth always fails (it's theirs to fix); transient never does.Changes
postToSiteApireturns a discriminatedPostRunOutcomecarrying HTTPstatus+ a length-capped body, instead of barenull.classifyIngestFailure(status)→transient | auth | rejected;main.tspicks the Actions severity from it and recordsingestError(withkind) on the report context.> [!CAUTION]banner with kind-specific copy (retry / fix token / out-of-sync) and a collapsible Details block (the API's error).FAIL_ON_INGEST_ERRORenv flag (default off) to escalaterejectedto a hard check failure. Flows throughaction.yaml's inherited env; can be promoted to a formal action input later.Tests
classifyIngestFailure: transient (null/5xx), auth (401/403), rejected (400/422).postToSiteApi:okon 2xx;failurewith status + body on non-2xx;status: nullwhen the request never completes.tsc --noEmit+ build green. (5 suites fail to load locally on a missingpg_restorebinary — pre-existing/environment, identical on a clean tree.)Companion
Server side, handled in the Site repo: report all exceptions (incl. handled 4xx) to Sentry so this class of failure pages the operator (Query-Doctor/Site#3426). The constraint-type enum that caused the original
400is a separate root-cause follow-up.🤖 Generated with Claude Code