feat(eval): add rerun-failed runner pooling #1609
Review finding on PR #1609 (compared against
CI is green for PR #1609 (Build, Typecheck, Lint, Test, Check Links, Validate Marketplace, Validate Evals, Cloudflare Pages).
6af012a to 17cf44c
Deploying agentv with Cloudflare Pages

| Item | Details |
| --- | --- |
| Latest commit: | a25016e |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://08e152c7.agentv.pages.dev |
| Branch Preview URL: | https://feat-av-kfik-10-runner.agentv.pages.dev |
Addressed the pooled workspace reset blocker from #1609 (comment) in 17cf44c.

Fix: pooled slots whose post-case reset fails are now quarantined instead of reused. Shared pooled-workspace cases must acquire a clean slot; when all slots are unavailable, later cases return a setup-stage `execution_error` with `failureReasonCode=workspace_pool_unavailable` rather than silently running on a dirty slot or falling through to fresh/no-workspace behavior.

Regression coverage added in `packages/core/test/evaluation/orchestrator.test.ts` for:
- single-slot reset failure: the second case does not invoke the provider on the dirty slot
- multi-slot exhaustion: a later case fails with `workspace_pool_unavailable` after all pooled slot resets fail

Validation after rebasing onto `origin/main` at 74b961c:
- `bun test packages/core/test/evaluation/orchestrator.test.ts -t "pooled workspace"`
- `bun run --cwd packages/core typecheck`
- `bun run --cwd packages/core lint`
- `bun test packages/core/test/evaluation/orchestrator.test.ts`
- `bun run --cwd packages/core build`
- `bun run --cwd apps/cli typecheck` (rerun sequentially after an initial parallel tsup clean race)
- `bun run --cwd apps/cli lint`
- `bun run --cwd apps/cli build`
- `bun test apps/cli/test/eval.integration.test.ts` (clean rerun passed after an initial stale-build race)
- `bun test apps/cli/test/commands/results/serve.test.ts` (clean rerun passed after an initial pre-build race)

Dogfood rationale: prior live provider/grader dogfood on this PR still covers the normal runner path. This fix is a deterministic reset-failure safety path; the new tests force reset failure by corrupting the pooled checkout and assert that no dirty-slot or no-workspace execution occurs.
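The quarantine rule described in this comment can be sketched as a small synchronous model. This is a sketch under stated assumptions, not agentv's orchestrator code: `SlotPool`, `Slot`, `runCase`, and the `CaseResult` shape are hypothetical names; only `execution_error`, the setup stage, and `workspace_pool_unavailable` come from the comment. The usage at the end mirrors the single-slot regression case: the reset fails, so the second case errors out without calling the provider.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not agentv's API.
type SlotState = "clean" | "busy" | "quarantined";

interface Slot {
  id: number;
  state: SlotState;
}

type CaseResult =
  | { status: "ok"; slotId: number; output: string }
  | {
      status: "execution_error";
      stage: "setup";
      failureReasonCode: "workspace_pool_unavailable";
    };

class SlotPool {
  readonly slots: Slot[];

  constructor(size: number) {
    this.slots = Array.from({ length: size }, (_, id) => ({
      id,
      state: "clean" as SlotState,
    }));
  }

  // Hand out a clean slot, or undefined if none exists.
  // Quarantined slots are never handed out again.
  acquire(): Slot | undefined {
    const slot = this.slots.find((s) => s.state === "clean");
    if (slot) slot.state = "busy";
    return slot;
  }

  // A successful post-case reset returns the slot to the pool;
  // a failed reset quarantines it instead of reusing a dirty checkout.
  release(slot: Slot, resetSucceeded: boolean): void {
    slot.state = resetSucceeded ? "clean" : "quarantined";
  }
}

// Run one case on a pooled slot. With no clean slot available, fail at the
// setup stage without calling the provider (no dirty-slot or fresh fallback).
function runCase(
  pool: SlotPool,
  invokeProvider: (slotId: number) => string,
  resetSlot: (slotId: number) => boolean,
): CaseResult {
  const slot = pool.acquire();
  if (!slot) {
    return {
      status: "execution_error",
      stage: "setup",
      failureReasonCode: "workspace_pool_unavailable",
    };
  }
  try {
    return { status: "ok", slotId: slot.id, output: invokeProvider(slot.id) };
  } finally {
    // The reset runs even if the provider throws.
    pool.release(slot, resetSlot(slot.id));
  }
}

// Usage: a single slot whose reset always fails.
const pool = new SlotPool(1);
let providerCalls = 0;
const provider = (id: number): string => {
  providerCalls++;
  return `ran on slot ${id}`;
};
const failingReset = (): boolean => false;

const first = runCase(pool, provider, failingReset); // "ok"; slot 0 is then quarantined
const second = runCase(pool, provider, failingReset); // "execution_error"; provider not called
// providerCalls is now 1
```

Returning a typed setup-stage error, rather than falling back to a fresh or missing workspace, keeps a dirty checkout from silently influencing later case results.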
17cf44c to a25016e
Rebased
Final review for head

Confirmed the prior pooled workspace reset blocker is resolved: reset failures now quarantine the slot, wake waiters only when no clean slots remain, and return a setup-stage

Checked the #1603 integration surface around projection-aware result identities / artifact replacement and did not find a new blocker.

Fresh PR CI is green for this head: https://github.com/EntityProcess/agentv/actions/runs/28606135105.

Verdict: no blocking findings. Orchestrator may proceed to ready/merge sequencing.
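The waiter behavior confirmed in this review can be sketched as an async pool. This assumes a hypothetical `AsyncSlotPool` in which callers park while every usable slot is busy; a failed reset quarantines the slot, and parked callers are woken with `workspace_pool_unavailable` only once no slot is free or busy (so none can come back clean). `AsyncSlotPool`, `Acquired`, and `quarantinedCount` are illustrative names, not agentv's API, and this reading of "no clean slots remain" is itself an assumption.

```typescript
// Hypothetical sketch: AsyncSlotPool and Acquired are illustrative names,
// not agentv's API.
type Acquired =
  | { ok: true; slotId: number }
  | { ok: false; failureReasonCode: "workspace_pool_unavailable" };

const UNAVAILABLE: Acquired = {
  ok: false,
  failureReasonCode: "workspace_pool_unavailable",
};

class AsyncSlotPool {
  private readonly free: number[];
  private readonly busy = new Set<number>();
  private readonly quarantined = new Set<number>();
  private readonly waiters: Array<(result: Acquired) => void> = [];

  constructor(size: number) {
    this.free = Array.from({ length: size }, (_, i) => i);
  }

  get quarantinedCount(): number {
    return this.quarantined.size;
  }

  // Resolve with a clean slot if one is free. If every remaining slot is
  // busy, park the caller until a slot comes back. If no slot is free or
  // busy (all quarantined), fail immediately.
  acquire(): Promise<Acquired> {
    const slotId = this.free.shift();
    if (slotId !== undefined) {
      this.busy.add(slotId);
      return Promise.resolve<Acquired>({ ok: true, slotId });
    }
    if (this.busy.size === 0) {
      return Promise.resolve(UNAVAILABLE);
    }
    return new Promise<Acquired>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  release(slotId: number, resetSucceeded: boolean): void {
    this.busy.delete(slotId);
    if (resetSucceeded) {
      // Hand the clean slot straight to the oldest waiter, if any.
      const next = this.waiters.shift();
      if (next) {
        this.busy.add(slotId);
        next({ ok: true, slotId });
      } else {
        this.free.push(slotId);
      }
      return;
    }
    // Failed reset: quarantine the slot. Waiters are woken only when no
    // clean slot remains or can come back, so they fail instead of hanging.
    this.quarantined.add(slotId);
    if (this.free.length === 0 && this.busy.size === 0) {
      for (const wake of this.waiters.splice(0)) wake(UNAVAILABLE);
    }
  }
}

// Usage: one slot; a second caller waits, then the reset fails.
async function demo(): Promise<Acquired[]> {
  const pool = new AsyncSlotPool(1);
  const first = await pool.acquire(); // { ok: true, slotId: 0 }
  const waiting = pool.acquire(); // parked: slot 0 is busy
  if (first.ok) pool.release(first.slotId, false); // quarantine, then wake waiter
  return [first, await waiting]; // second entry is UNAVAILABLE
}
```

Keeping parked callers waiting while another slot is still busy avoids failing a case that could still run once that slot resets cleanly.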
Replacement for closed PR #1604. PR #1604 cannot be reopened or retargeted because its stacked base branch `feat/av-kfik-5-instance-expansion` was deleted.

Recovery details:
- `7242c84725f7f4d556f78ba29be2aa04e9a7e2e0` onto `origin/main` at `24c93648b4f2352ac5e5979b5dfec4d9d8cbb8c1`.
- `5ae72715` and `6af012a5`.
- `recovery/av-lm1m-pre-rebase` at the original head.

Validation after rebase:
- `bun install`
- `bun run --cwd packages/core typecheck`
- `bun run --cwd packages/core lint`
- `bun test packages/core/test/evaluation/orchestrator.test.ts`
- `bun run --cwd packages/core build`
- `bun run --cwd apps/cli typecheck`
- `bun run --cwd apps/cli lint`
- `bun test apps/cli/test/eval.integration.test.ts`
- `bun test apps/cli/test/commands/results/serve.test.ts`
- `bun run --cwd apps/cli build`
- `bun run --cwd apps/web build`

Live dogfood was not rerun for this recovery because the rebase only reconciled identity plumbing with the already-merged prompt matrix work; prior live provider/grader evidence remains in `agentv-private:evidence/av-kfik10-runner` commit `9a89276`.

Tracker: Beads `av-lm1m`, parent `av-kfik.10`.