feat(eval): add rerun-failed runner pooling by christso · Pull Request #1609 · EntityProcess/agentv

christso · 2026-07-02T15:53:16Z

Replacement for closed PR #1604. PR #1604 could not be reopened because its deleted stacked base branch feat/av-kfik-5-instance-expansion prevents reopening/retargeting.

Recovery details:

- Rebases the runner work from original head 7242c84725f7f4d556f78ba29be2aa04e9a7e2e0 onto origin/main at 24c93648b4f2352ac5e5979b5dfec4d9d8cbb8c1.
- Preserves the runner commits as rebased commits 5ae72715 and 6af012a5.
- Local backup ref in the recovery worktree: recovery/av-lm1m-pre-rebase at the original head.

Validation after rebase:

- bun install
- bun run --cwd packages/core typecheck
- bun run --cwd packages/core lint
- bun test packages/core/test/evaluation/orchestrator.test.ts
- bun run --cwd packages/core build
- bun run --cwd apps/cli typecheck
- bun run --cwd apps/cli lint
- bun test apps/cli/test/eval.integration.test.ts
- bun test apps/cli/test/commands/results/serve.test.ts
- bun run --cwd apps/cli build
- bun run --cwd apps/web build

Live dogfood was not rerun for this recovery because the rebase only reconciled identity plumbing with the already-merged prompt matrix work; prior live provider/grader evidence remains at agentv-private:evidence/av-kfik10-runner, commit 9a89276.

Tracker: Beads av-lm1m, parent av-kfik.10.

christso · 2026-07-02T16:02:40Z

Review finding on PR #1609 (compared against origin/main at 24c93648, head 6af012a5):

P2 packages/core/src/evaluation/orchestrator.ts:1237 and packages/core/src/evaluation/orchestrator.ts:1365: pooled workspace reset failures can still leak into later cases. In the single-slot path, dispatch always falls back to poolSlot when no queued slot is available, but the finally block only suppresses re-adding a failed slot to availablePoolSlots. For --workers 1, shouldReturnPoolSlot is always false, so a reset failure logs that the slot is left out of reuse but the next case still receives the same dirty poolSlot. In the multi-slot path, once failed slots are drained, later shared-workspace cases can fall through to undefined workspace paths. Since the feature contract is that pooled slots reset between cases, reset failure should fail/skip the affected remaining cases or otherwise mark the slot unusable without silently running dirty/no-workspace cases.
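
To make the single-slot failure mode above concrete, here is a simplified model of it. This is a sketch, not the code in orchestrator.ts: `runCasesWithFallback`, the `Slot` type, and the `workers > 1` rule for `shouldReturnPoolSlot` are assumptions for illustration; only the names `poolSlot`, `availablePoolSlots`, and `shouldReturnPoolSlot` come from the finding.

```typescript
// Hypothetical, simplified model of the single-slot dispatch path.
type Slot = { path: string; dirty: boolean };

function runCasesWithFallback(
  caseIds: string[],
  poolSlot: Slot,
  resetFails: boolean,
  workers: number,
): string[] {
  const availablePoolSlots: Slot[] = [];
  const log: string[] = [];
  // Assumption: slots are only queued for reuse when more than one worker runs.
  const shouldReturnPoolSlot = workers > 1;

  for (const caseId of caseIds) {
    // Dispatch falls back to poolSlot whenever no queued slot is available.
    const slot = availablePoolSlots.shift() ?? poolSlot;
    log.push(`${caseId}: ${slot.dirty ? "dirty" : "clean"}`);

    slot.dirty = true; // running a case modifies the workspace
    const resetOk = !resetFails;
    if (resetOk) slot.dirty = false;

    // The cleanup step only suppresses re-queuing a failed slot. With one
    // worker nothing is queued anyway, so the fallback above still hands
    // out the same slot on the next iteration.
    if (resetOk && shouldReturnPoolSlot) availablePoolSlots.push(slot);
  }
  return log;
}
```

In this model, running two cases with `workers = 1` and a failing reset records the first case on a clean slot and the second on a dirty one, which is the leak the finding describes.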

CI is green for PR #1609 (Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, Cloudflare Pages).

cloudflare-workers-and-pages · 2026-07-02T16:23:45Z

Deploying agentv with Cloudflare Pages

Latest commit:	`a25016e`
Status:	✅ Deploy successful!
Preview URL:	https://08e152c7.agentv.pages.dev
Branch Preview URL:	https://feat-av-kfik-10-runner.agentv.pages.dev


christso · 2026-07-02T16:23:58Z

Addressed the pooled workspace reset blocker from #1609 (comment) in 17cf44c.

Fix: pooled slots whose post-case reset fails are now quarantined instead of reused. Shared pooled-workspace cases must acquire a clean slot; when all slots are unavailable, later cases return a setup-stage execution_error with failureReasonCode=workspace_pool_unavailable rather than silently running on a dirty slot or falling through to fresh/no workspace behavior.

Regression coverage added in packages/core/test/evaluation/orchestrator.test.ts for:
- single-slot reset failure: second case does not invoke the provider on the dirty slot
- multi-slot exhaustion: later case fails with workspace_pool_unavailable after all pooled slot resets fail

Validation after rebasing onto origin/main 74b961c:
- bun test packages/core/test/evaluation/orchestrator.test.ts -t "pooled workspace"
- bun run --cwd packages/core typecheck
- bun run --cwd packages/core lint
- bun test packages/core/test/evaluation/orchestrator.test.ts
- bun run --cwd packages/core build
- bun run --cwd apps/cli typecheck (rerun sequentially after an initial parallel tsup clean race)
- bun run --cwd apps/cli lint
- bun run --cwd apps/cli build
- bun test apps/cli/test/eval.integration.test.ts (clean rerun passed after initial stale-build race)
- bun test apps/cli/test/commands/results/serve.test.ts (clean rerun passed after initial pre-build race)

Dogfood rationale: prior live provider/grader dogfood on this PR still covers the normal runner path. This fix is a deterministic reset-failure safety path; the new tests force reset failure by corrupting the pooled checkout and assert no dirty/no-workspace execution occurs.
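
For readers following along, the quarantine rule described in this comment can be sketched roughly as follows. This is an illustrative, sequential model (one case at a time), not the implementation in 17cf44c: `QuarantiningPool`, `Acquired`, and the method names are hypothetical, and only `execution_error` and `workspace_pool_unavailable` are taken from the comment.

```typescript
// Hypothetical sketch of slot quarantine for a sequential runner.
type Acquired =
  | { ok: true; slot: string }
  | {
      ok: false;
      status: "execution_error";
      failureReasonCode: "workspace_pool_unavailable";
    };

class QuarantiningPool {
  private readonly clean: string[];
  private readonly quarantined = new Set<string>();

  constructor(slots: string[]) {
    this.clean = [...slots];
  }

  // A case must obtain a clean slot; there is no fallback to a dirty slot
  // or to an undefined workspace path.
  acquire(): Acquired {
    const slot = this.clean.shift();
    if (slot === undefined) {
      return {
        ok: false,
        status: "execution_error",
        failureReasonCode: "workspace_pool_unavailable",
      };
    }
    return { ok: true, slot };
  }

  // After a case finishes, the slot is reusable only if its reset succeeded.
  release(slot: string, resetSucceeded: boolean): void {
    if (resetSucceeded) this.clean.push(slot);
    else this.quarantined.add(slot);
  }

  quarantinedCount(): number {
    return this.quarantined.size;
  }
}
```

With a one-slot pool, releasing the slot after a failed reset means the next `acquire()` returns the `workspace_pool_unavailable` result instead of the dirty slot, which mirrors the single-slot regression test listed above.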

christso · 2026-07-02T16:39:18Z

Rebased feat/av-kfik-10-runner onto current origin/main after PR #1603 merged at b091935743e4494ad615b683e6ee3fbce3c2e248. The rebase applied cleanly and preserves the runner pooling/rerun-failed behavior on top of the merged grading artifact contract.

New head: a25016e466e668cb5ac886abab815d7216dc862a.

Local validation rerun:
- bun test packages/core/test/evaluation/orchestrator.test.ts -t 'pooled workspace'
- bun run --cwd packages/core typecheck
- bun run --cwd packages/core lint
- bun test packages/core/test/evaluation/orchestrator.test.ts
- bun run --cwd packages/core build
- bun run --cwd apps/cli typecheck
- bun run --cwd apps/cli lint
- bun run --cwd apps/cli build
- bun test apps/cli/test/eval.integration.test.ts
- bun test apps/cli/test/commands/results/serve.test.ts

Fresh PR CI is green: https://github.com/EntityProcess/agentv/actions/runs/28606135105. PR remains draft/open for independent re-review.

christso · 2026-07-02T16:42:17Z

Final review for head a25016e466e668cb5ac886abab815d7216dc862a after the post-#1603 rebase: clean.

Confirmed the prior pooled workspace reset blocker is resolved: reset failures now quarantine the slot, wake waiters only when no clean slots remain, and return a setup-stage execution_error with failureReasonCode=workspace_pool_unavailable instead of reusing a dirty slot or falling through to no/fresh workspace behavior. The regression coverage covers both single-slot reset failure and multi-slot pool exhaustion.
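
The "wake waiters only when no clean slots remain" rule confirmed above could look something like the sketch below for the multi-slot case. It is a callback-based approximation written for illustration: `WaitingPool`, `Grant`, and `Waiter` are hypothetical names, and the real orchestrator may structure its waiting differently (for example with promises).

```typescript
// Hypothetical sketch: waiters are failed only once every slot is quarantined.
type Grant =
  | { ok: true; slot: string }
  | { ok: false; failureReasonCode: "workspace_pool_unavailable" };
type Waiter = (grant: Grant) => void;

class WaitingPool {
  private readonly idle: string[];
  private inUse = 0;
  private readonly waiters: Waiter[] = [];

  constructor(slots: string[]) {
    this.idle = [...slots];
  }

  acquire(onGrant: Waiter): void {
    const slot = this.idle.shift();
    if (slot !== undefined) {
      this.inUse += 1;
      onGrant({ ok: true, slot });
    } else if (this.inUse === 0) {
      // Nothing idle and nothing running: every slot has been quarantined.
      onGrant({ ok: false, failureReasonCode: "workspace_pool_unavailable" });
    } else {
      // A clean slot is still in use, so waiting may yet succeed.
      this.waiters.push(onGrant);
    }
  }

  release(slot: string, resetSucceeded: boolean): void {
    this.inUse -= 1;
    if (resetSucceeded) {
      const next = this.waiters.shift();
      if (next) {
        this.inUse += 1;
        next({ ok: true, slot });
      } else {
        this.idle.push(slot);
      }
      return;
    }
    // Failed reset: the slot is quarantined by never re-adding it.
    // Waiters are woken with a failure only when no clean slot is left.
    if (this.idle.length === 0 && this.inUse === 0) {
      for (const waiter of this.waiters.splice(0)) {
        waiter({ ok: false, failureReasonCode: "workspace_pool_unavailable" });
      }
    }
  }
}
```

In this sketch a waiter stays queued while another clean slot is still running a case, since that slot may reset successfully and be handed over; it is failed only after the last slot is quarantined.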

Checked the #1603 integration surface around projection-aware result identities / artifact replacement and did not find a new blocker. Fresh PR CI is green for this head: https://github.com/EntityProcess/agentv/actions/runs/28606135105.

Verdict: no blocking findings. Orchestrator may proceed to ready/merge sequencing.

christso mentioned this pull request Jul 2, 2026

feat(eval): add rerun-failed runner pooling #1604

Closed

christso force-pushed the feat/av-kfik-10-runner branch from 6af012a to 17cf44c July 2, 2026 16:23

christso added 3 commits July 2, 2026 18:32

feat(eval): add rerun-failed runner pooling

6589dc3

fix(eval): constrain rerun-failed identities

71c1530

fix(eval): quarantine failed workspace pool slots

a25016e

christso force-pushed the feat/av-kfik-10-runner branch from 17cf44c to a25016e July 2, 2026 16:36

christso marked this pull request as ready for review July 2, 2026 16:44

christso merged commit 9ce3447 into main Jul 2, 2026
8 checks passed

christso deleted the feat/av-kfik-10-runner branch July 2, 2026 16:44

christso merged 3 commits into main from feat/av-kfik-10-runner
