feat(evals): add prompt instance expansion #1602
Deploying agentv with
| Latest commit: | 5f041fc |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://7365612f.agentv.pages.dev |
| Branch Preview URL: | https://feat-av-kfik-5-instance-expa.agentv.pages.dev |
CI follow-up pushed as 9cefd81. The Test job failed on `eval-schema-sync` because the generated schema source did not explicitly include the prompt-object fields (`prompt`/`file`/`messages`) that the checked-in schema reference had. I added those fields to the Zod schema with lightweight object-array shapes and regenerated `eval.schema.json`. Local verification: `bun test packages/core/test/evaluation/validation/eval-schema-sync.test.ts`; `bun run --cwd packages/core typecheck`; `bun run lint`; `bun --filter @agentv/core test` (2118 pass, 0 fail).
Code Review Findings
P1 - Function prompt sources are still rejected
Suggested fix: either implement the function/function-file prompt source path and add loader coverage, or explicitly revise the Bead/docs/schema contract before merging. Given the Bead is the source of truth for this PR, I would treat this as blocking. P1 -
Addressed the PR #1602 review blockers in commit e859e3f ( Validation run:
Private evidence:
e859e3f to 5f041fc
Rebased the PR on current Addressed the four blocker findings from the review comment:
Validation run locally on the rebased head:
Live dogfood also passed against the local OpenAI proxy with function prompt expansion and target-id lookup. Private evidence is in CI for new head
Summary
Eval YAML can now use top-level `prompts` as the authored input matrix and combine it with `targets`, `tests`, and `repeat.count` into deterministic execution identity. Result artifacts now expose prompt identity plus `sample_index` and `retry_index`, so repeated samples and future worker-pool reruns do not have to infer pass@k inputs from `run-N` paths.

This keeps the current runner shape intact while making the new contract explicit: legacy `input` still loads only as a warned compatibility path, and mixing `tests[].input` with top-level `prompts` now fails with migration guidance.

Design Notes
- `test_id` plus `prompt_id`/`prompt_label` for comparison.
- `targets` now accept promptfoo-style `id` and `label`, mapped through the existing target selector until av-kfik.6 completes the deeper target-provider locator work.
- `repeat.count` now flows into trial `sample_index`; provider retry remains separate as `retry_index`.

Related: av-kfik.5
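The expansion described in the summary can be sketched roughly as below. This is an illustrative sketch only, not the code in this PR: the type names, the `expandInstances` function, the loop order, and the error wording are all assumptions made for the example. Only the field names (`prompts`, `targets`, `tests`, `repeat.count`, `test_id`, `prompt_id`, `prompt_label`, `sample_index`, `retry_index`) come from the description above.

```typescript
// Hypothetical shapes for illustration; the real AgentV types may differ.
interface PromptSpec { id: string; label?: string }
interface TargetSpec { id: string; label?: string }
interface TestSpec { id: string; input?: unknown }

interface EvalConfig {
  prompts: PromptSpec[];
  targets: TargetSpec[];
  tests: TestSpec[];
  repeat?: { count: number };
}

interface EvalInstance {
  test_id: string;
  prompt_id: string;
  prompt_label?: string;
  target_id: string;
  sample_index: number; // which repeat sample this is (0-based)
  retry_index: number;  // provider retry attempt, tracked separately
}

// Assumed helper name; expands the authored matrix into one row per
// (test, prompt, target, sample) combination.
function expandInstances(config: EvalConfig): EvalInstance[] {
  // Mixing per-test input with top-level prompts is rejected up front.
  // The message text here is made up for the sketch.
  const mixed =
    config.prompts.length > 0 &&
    config.tests.some((t) => t.input !== undefined);
  if (mixed) {
    throw new Error(
      "tests[].input cannot be combined with top-level prompts; " +
        "move the input into prompts",
    );
  }

  const count = config.repeat?.count ?? 1;
  const instances: EvalInstance[] = [];

  // A fixed nesting order means the same config always yields the same
  // sequence of identities.
  for (const test of config.tests) {
    for (const prompt of config.prompts) {
      for (const target of config.targets) {
        for (let sample = 0; sample < count; sample++) {
          instances.push({
            test_id: test.id,
            prompt_id: prompt.id,
            prompt_label: prompt.label,
            target_id: target.id,
            sample_index: sample,
            retry_index: 0, // first attempt; a retry would increment this
          });
        }
      }
    }
  }
  return instances;
}
```

With two prompts, one target, one test, and `repeat.count: 2`, this sketch yields four instances. Because every row carries its own `sample_index`, a consumer could group samples by `test_id` and `prompt_id` directly instead of parsing `run-N` directory names.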
Validation
- `bun test packages/core/test/evaluation/eval-inline-experiment.test.ts apps/cli/test/commands/eval/artifact-writer.test.ts`
- `bun --filter @agentv/core build`
- `bun run typecheck`
- `bun run lint`
- `http://127.0.0.1:10531/v1` with `gpt-5.3-codex-spark`, two prompts, `repeat.count: 2`, and a live LLM grader: PASS, 2/2 rows scored 100%, each row carried `prompt_id` and trials with `sample_index` 0 and 1 plus `retry_index` 0. The temporary target used `api_key: ${{ LOCAL_OPENAI_PROXY_API_KEY }}` with `LOCAL_OPENAI_PROXY_API_KEY=dummy-local-key`; no real API key or copied `.env` was required.

Evidence
Private evidence branch: `EntityProcess/agentv-private:evidence/av-kfik-5-instance-expansion` at `179ece2`.