
feat(eval): add rerun-failed runner pooling #1609

Merged
christso merged 3 commits into main from feat/av-kfik-10-runner on Jul 2, 2026

Conversation

@christso christso commented Jul 2, 2026

Collaborator

Replacement for closed PR #1604. PR #1604 could not be reopened or retargeted because its stacked base branch, feat/av-kfik-5-instance-expansion, had been deleted.

Recovery details:

  • Rebases the runner work from original head 7242c84725f7f4d556f78ba29be2aa04e9a7e2e0 onto origin/main at 24c93648b4f2352ac5e5979b5dfec4d9d8cbb8c1.
  • Preserves the runner commits as rebased commits 5ae72715 and 6af012a5.
  • Local backup ref in the recovery worktree: recovery/av-lm1m-pre-rebase at the original head.

Validation after rebase:

  • bun install
  • bun run --cwd packages/core typecheck
  • bun run --cwd packages/core lint
  • bun test packages/core/test/evaluation/orchestrator.test.ts
  • bun run --cwd packages/core build
  • bun run --cwd apps/cli typecheck
  • bun run --cwd apps/cli lint
  • bun test apps/cli/test/eval.integration.test.ts
  • bun test apps/cli/test/commands/results/serve.test.ts
  • bun run --cwd apps/cli build
  • bun run --cwd apps/web build

Live dogfood was not rerun for this recovery because the rebase only reconciled identity plumbing with the already-merged prompt matrix work; the prior live provider/grader evidence remains at agentv-private:evidence/av-kfik10-runner, commit 9a89276.

Tracker: Beads av-lm1m, parent av-kfik.10.

christso commented Jul 2, 2026

Collaborator Author

Review finding on PR #1609 (compared against origin/main at 24c93648, head 6af012a5):

  • P2 packages/core/src/evaluation/orchestrator.ts:1237 and packages/core/src/evaluation/orchestrator.ts:1365: pooled workspace reset failures can still leak into later cases. In the single-slot path, dispatch always falls back to poolSlot when no queued slot is available, but the finally block only suppresses re-adding a failed slot to availablePoolSlots. For --workers 1, shouldReturnPoolSlot is always false, so a reset failure logs that the slot is left out of reuse but the next case still receives the same dirty poolSlot. In the multi-slot path, once failed slots are drained, later shared-workspace cases can fall through to undefined workspace paths. Since the feature contract is that pooled slots reset between cases, reset failure should fail/skip the affected remaining cases or otherwise mark the slot unusable without silently running dirty/no-workspace cases.

CI is green for PR #1609 (Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, Cloudflare Pages).

@christso christso force-pushed the feat/av-kfik-10-runner branch from 6af012a to 17cf44c on July 2, 2026 16:23

cloudflare-workers-and-pages Bot commented Jul 2, 2026


Deploying agentv with Cloudflare Pages

Latest commit: a25016e
Status: ✅  Deploy successful!
Preview URL: https://08e152c7.agentv.pages.dev
Branch Preview URL: https://feat-av-kfik-10-runner.agentv.pages.dev


christso commented Jul 2, 2026

Collaborator Author

Addressed the pooled workspace reset blocker from #1609 (comment) in 17cf44c.

Fix: pooled slots whose post-case reset fails are now quarantined instead of reused. Shared pooled-workspace cases must acquire a clean slot; when all slots are unavailable, later cases return a setup-stage execution_error with failureReasonCode=workspace_pool_unavailable rather than silently running on a dirty slot or falling through to fresh/no workspace behavior.

Regression coverage added in packages/core/test/evaluation/orchestrator.test.ts for:

  • single-slot reset failure: second case does not invoke the provider on the dirty slot
  • multi-slot exhaustion: later case fails with workspace_pool_unavailable after all pooled slot resets fail

Validation after rebasing onto origin/main 74b961c:

  • bun test packages/core/test/evaluation/orchestrator.test.ts -t "pooled workspace"
  • bun run --cwd packages/core typecheck
  • bun run --cwd packages/core lint
  • bun test packages/core/test/evaluation/orchestrator.test.ts
  • bun run --cwd packages/core build
  • bun run --cwd apps/cli typecheck (rerun sequentially after an initial parallel tsup clean race)
  • bun run --cwd apps/cli lint
  • bun run --cwd apps/cli build
  • bun test apps/cli/test/eval.integration.test.ts (clean rerun passed after initial stale-build race)
  • bun test apps/cli/test/commands/results/serve.test.ts (clean rerun passed after initial pre-build race)

Dogfood rationale: prior live provider/grader dogfood on this PR still covers the normal runner path. This fix is a deterministic reset-failure safety path; the new tests force reset failure by corrupting the pooled checkout and assert no dirty/no-workspace execution occurs.
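For readers following the fix, here is a minimal sketch of the quarantine idea for the sequential, single-worker case. It is not the code in orchestrator.ts: `SlotPool`, `runCase`, and the `status`/`stage` field names are hypothetical and chosen only for illustration. The only detail taken from this thread is that an unavailable pool surfaces as a setup-stage execution_error with failureReasonCode=workspace_pool_unavailable.

```typescript
// Hypothetical sketch: all names are illustrative, not the identifiers in orchestrator.ts.
type CaseResult =
  | { status: "ok"; output: string }
  | {
      status: "execution_error";
      stage: "setup";
      failureReasonCode: "workspace_pool_unavailable";
    };

class SlotPool {
  private readonly clean: string[];
  private readonly quarantined = new Set<string>();

  constructor(slots: string[]) {
    this.clean = [...slots];
  }

  // Hand out a clean slot, or undefined when none is left.
  acquire(): string | undefined {
    return this.clean.shift();
  }

  // Reset the slot after a case. A failed reset quarantines the slot
  // so it is never handed out again.
  release(slot: string, reset: (slot: string) => void): void {
    try {
      reset(slot);
      this.clean.push(slot);
    } catch {
      this.quarantined.add(slot);
    }
  }

  quarantinedCount(): number {
    return this.quarantined.size;
  }
}

// Run one case sequentially. The provider is only invoked when a clean
// slot was acquired; otherwise the case fails at setup.
function runCase(
  pool: SlotPool,
  provider: (workspace: string) => string,
  reset: (slot: string) => void,
): CaseResult {
  const slot = pool.acquire();
  if (slot === undefined) {
    return {
      status: "execution_error",
      stage: "setup",
      failureReasonCode: "workspace_pool_unavailable",
    };
  }
  try {
    return { status: "ok", output: provider(slot) };
  } finally {
    pool.release(slot, reset);
  }
}
```

With a one-slot pool and a reset function that throws, the first `runCase` call returns `ok` and the second returns the setup-stage error without calling the provider, which mirrors the single-slot regression case described above.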

@christso christso force-pushed the feat/av-kfik-10-runner branch from 17cf44c to a25016e on July 2, 2026 16:36

christso commented Jul 2, 2026

Collaborator Author

Rebased feat/av-kfik-10-runner onto current origin/main after PR #1603 merged at b091935743e4494ad615b683e6ee3fbce3c2e248. The rebase applied cleanly and preserves the runner pooling/rerun-failed behavior on top of the merged grading artifact contract.

New head: a25016e466e668cb5ac886abab815d7216dc862a.

Local validation rerun:

  • bun test packages/core/test/evaluation/orchestrator.test.ts -t 'pooled workspace'
  • bun run --cwd packages/core typecheck
  • bun run --cwd packages/core lint
  • bun test packages/core/test/evaluation/orchestrator.test.ts
  • bun run --cwd packages/core build
  • bun run --cwd apps/cli typecheck
  • bun run --cwd apps/cli lint
  • bun run --cwd apps/cli build
  • bun test apps/cli/test/eval.integration.test.ts
  • bun test apps/cli/test/commands/results/serve.test.ts

Fresh PR CI is green: https://github.com/EntityProcess/agentv/actions/runs/28606135105. PR remains draft/open for independent re-review.

christso commented Jul 2, 2026

Collaborator Author

Final review for head a25016e466e668cb5ac886abab815d7216dc862a after the post-#1603 rebase: clean.

Confirmed the prior pooled workspace reset blocker is resolved: reset failures now quarantine the slot, wake waiters only when no clean slots remain, and return a setup-stage execution_error with failureReasonCode=workspace_pool_unavailable instead of reusing a dirty slot or falling through to no/fresh workspace behavior. The regression coverage covers both single-slot reset failure and multi-slot pool exhaustion.
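One reading of "wake waiters only when no clean slots remain" can be sketched as follows for the multi-slot case. This is an illustrative assumption about the behavior, not the orchestrator's actual implementation: `WaitingSlotPool`, `acquire`, `release`, and the callback-style waiter are hypothetical names. A failed reset quarantines the slot, queued cases keep waiting while another slot could still come back clean, and they are all released with workspace_pool_unavailable once every slot is quarantined.

```typescript
// Hypothetical sketch: identifiers are illustrative and not taken from orchestrator.ts.
type Acquired =
  | { ok: true; slot: string }
  | { ok: false; failureReasonCode: "workspace_pool_unavailable" };

type Waiter = (result: Acquired) => void;

const UNAVAILABLE: Acquired = {
  ok: false,
  failureReasonCode: "workspace_pool_unavailable",
};

class WaitingSlotPool {
  private readonly clean: string[];
  private readonly waiters: Waiter[] = [];
  private readonly total: number;
  private quarantined = 0;

  constructor(slots: string[]) {
    this.clean = [...slots];
    this.total = slots.length;
  }

  // Deliver a clean slot immediately if one exists, fail fast if every
  // slot is quarantined, otherwise queue the caller.
  acquire(onReady: Waiter): void {
    if (this.quarantined === this.total) {
      onReady(UNAVAILABLE);
      return;
    }
    const slot = this.clean.shift();
    if (slot !== undefined) {
      onReady({ ok: true, slot });
      return;
    }
    this.waiters.push(onReady);
  }

  // Called after a case finishes and its workspace reset was attempted.
  release(slot: string, resetSucceeded: boolean): void {
    if (resetSucceeded) {
      // Hand the clean slot straight to the next queued case, if any.
      const next = this.waiters.shift();
      if (next) next({ ok: true, slot });
      else this.clean.push(slot);
      return;
    }
    // Quarantine: the slot is never returned to the clean list.
    this.quarantined += 1;
    // Other slots may still be in flight and could come back clean, so
    // waiters are only woken once no clean slot can ever appear again.
    if (this.quarantined === this.total) {
      for (const waiter of this.waiters.splice(0)) waiter(UNAVAILABLE);
    }
  }
}
```

The callback style keeps the sketch synchronous and easy to test; a promise-based `acquire` would follow the same bookkeeping.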

Checked the #1603 integration surface around projection-aware result identities / artifact replacement and did not find a new blocker. Fresh PR CI is green for this head: https://github.com/EntityProcess/agentv/actions/runs/28606135105.

Verdict: no blocking findings. Orchestrator may proceed to ready/merge sequencing.

@christso christso marked this pull request as ready for review July 2, 2026 16:44
@christso christso merged commit 9ce3447 into main Jul 2, 2026
8 checks passed
@christso christso deleted the feat/av-kfik-10-runner branch July 2, 2026 16:44