feat(core): load file-backed datasets #1601
Deploying agentv with
| Latest commit: | 2d66a22 |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://4bc932b0.agentv.pages.dev |
| Branch Preview URL: | https://feat-av-kfik-9-datasets.agentv.pages.dev |
CI follow-up pushed in 079e808. The Test job failed because the GitHub runner did not have …
Independent review: PR #1601

Findings:
I reproduced this with:

```yaml
input: Answer about {{ topic }}
tests: file://cases.csv
```

```csv
id,topic,__expected
case,refund,contains:refund
```
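For reference, the reproduction above exercises a row-to-case mapping: the `topic` column becomes a var, the parent suite's `input` template is filled from it, and `contains:refund` becomes an assertion. The sketch below is a minimal illustration under stated assumptions, not AgentV's loader: `parseSimpleCsv`, `renderTemplate`, `rowToCase`, and the case shape are hypothetical, the CSV split ignores quoting, and rendering the template at load time is an assumption made only to show the substitution.

```typescript
// Hypothetical case shape for illustration; AgentV's real types may differ.
interface Assertion {
  type: string;
  value: string;
}

interface EvalCase {
  id: string;
  input: string;
  vars: Record<string, string>;
  assert: Assertion[];
}

// Naive CSV split: assumes no quoted fields, embedded commas, or embedded newlines.
function parseSimpleCsv(text: string): Record<string, string>[] {
  const [headerLine, ...lines] = text.trim().split(/\r?\n/);
  const headers = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    return Object.fromEntries(
      headers.map((header, i): [string, string] => [header, cells[i] ?? ""]),
    );
  });
}

// Replace {{ name }} placeholders with matching vars; unknown names are left untouched.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match,
  );
}

// `id` names the case, `__expected` ("type:value") becomes an assertion,
// every other column becomes a var, and the suite-level template supplies the input.
function rowToCase(row: Record<string, string>, suiteInput: string): EvalCase {
  const vars: Record<string, string> = {};
  const assert: Assertion[] = [];
  for (const [column, cell] of Object.entries(row)) {
    if (column === "id") continue;
    if (column === "__expected") {
      const sep = cell.indexOf(":");
      if (sep > 0) assert.push({ type: cell.slice(0, sep), value: cell.slice(sep + 1) });
      continue;
    }
    vars[column] = cell;
  }
  return { id: row.id, input: renderTemplate(suiteInput, vars), vars, assert };
}

// The reproduction's row and suite input.
const [exampleRow] = parseSimpleCsv("id,topic,__expected\ncase,refund,contains:refund");
const exampleCase = rowToCase(exampleRow, "Answer about {{ topic }}");
console.log(exampleCase.input); // Answer about refund
```

The point of the sketch is that the CSV row alone has no `input` column, so a loader has to combine the row's vars with the suite-level `input` for the case to be runnable.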
The new CSV parser advertises promptfoo-compatible …

A parser probe using:

```csv
id,input,__expected,__expected2,__expected3,__expected4
case,hello,similar:hello,latency(1000),cost(0.01),file://grader.py
```

loaded assertions only for …

Verification:
Residual risk: live provider dogfood was not run for this review; the findings above are parser/validation contract issues reproduced through local parser probes.
Addressed the two review blockers in commit 1067596 (`fix(core): align CSV dataset validation with runtime`).

Validation run locally:
- `bun test packages/core/test/evaluation/loaders/case-file-loader.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts`: 129 pass, 0 fail
- `bun --filter @agentv/core typecheck`
- `bun --filter @agentv/core build`
- `bun run lint`
- `bun run validate:examples`: 62 valid, 0 invalid
- `bun --filter @agentv/core test`: 2128 pass, 0 fail

Notes:
- CSV rows with vars plus parent suite input now load as runnable tests.
- CSV `__expected` latency/cost/`file://.py` forms now map to runnable AgentV assertions; unsupported forms such as `similar:` now fail validation/load clearly instead of drifting to runtime skips/failures.
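The `__expected` handling described in these notes can be sketched as a parser that maps each recognized form to an assertion and throws on anything it does not support, so a `similar:` cell fails at load time rather than at run time. This is a hypothetical illustration: `parseExpected`, `parseExpectedColumns`, and the assertion type names (`latency`, `cost`, `code-grader`) are not taken from AgentV, and only the forms named in this thread are handled; everything else is rejected.

```typescript
// Hypothetical assertion shapes; the type names are illustrative, not AgentV's schema.
type ParsedAssertion =
  | { type: "contains"; value: string }
  | { type: "latency"; thresholdMs: number }
  | { type: "cost"; threshold: number }
  | { type: "code-grader"; path: string };

// Prefixes this sketch rejects outright; the thread names `similar:` as one example.
const UNSUPPORTED_PREFIXES = ["similar:"];

function parseExpected(raw: string): ParsedAssertion {
  const value = raw.trim();
  for (const prefix of UNSUPPORTED_PREFIXES) {
    if (value.startsWith(prefix)) {
      // Fail at load time instead of producing an assertion that is skipped at run time.
      throw new Error(`Unsupported __expected form: ${value}`);
    }
  }
  const latency = /^latency\((\d+(?:\.\d+)?)\)$/.exec(value);
  if (latency) return { type: "latency", thresholdMs: Number(latency[1]) };
  const cost = /^cost\((\d+(?:\.\d+)?)\)$/.exec(value);
  if (cost) return { type: "cost", threshold: Number(cost[1]) };
  if (value.startsWith("file://") && value.endsWith(".py")) {
    return { type: "code-grader", path: value.slice("file://".length) };
  }
  if (value.startsWith("contains:")) {
    return { type: "contains", value: value.slice("contains:".length) };
  }
  // Anything else is also treated as unsupported in this sketch.
  throw new Error(`Unrecognized __expected form: ${value}`);
}

// Parse every non-empty __expected* column (__expected, __expected2, ...) in a row.
function parseExpectedColumns(row: Record<string, string>): ParsedAssertion[] {
  return Object.keys(row)
    .filter((key) => key.startsWith("__expected") && row[key].trim() !== "")
    .map((key) => parseExpected(row[key]));
}
```

Run against the probe row from the review, `parseExpectedColumns` throws on the `similar:hello` cell instead of silently loading only a subset of the assertions.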
1067596 to aacf627
Rebased and pushed the review-blocker fix in …

Addressed:
Validation on the rebased branch:
aacf627 to 2d66a22
Updated the pushed fix to …

Additional CI blocker fix:
Validation on the updated branch:
Summary
AgentV eval suites can now load raw tests from `file://` CSV, JSON, JSONL, YAML, JavaScript, and Python dataset sources without replacing `imports.tests` or `select` composition. Promptfoo-style CSV rows now become AgentV cases: expected columns create assertions, provider output remains first-class `expected_output`, metadata/config columns map onto case metadata and grader thresholds, and `__metric` survives as assertion names in result scores.

The implementation keeps dataset script execution explicit and bounded: JavaScript is loaded by file URL, Python runs through `uv run python`, and both paths must return JSON arrays of case objects.

Validation
- `bun test packages/core/test/evaluation/loaders/case-file-loader.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts`
- `bun run --cwd packages/core typecheck`
- `bun run --cwd packages/core lint`
- `bun run --cwd packages/core build`
- `bun run lint`
- `bun run typecheck`
- `bun apps/cli/src/cli.ts validate examples/features/external-datasets/evals/dataset.eval.yaml`

Live provider dogfood was not run because this slice changes dataset parsing, validation, docs, and examples only; it does not change provider execution, graders, scoring runtime, or run artifact layout.
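The script contract in the summary (JavaScript loaded by file URL, Python run through `uv run python`, both returning JSON arrays of case objects) could look roughly like the following. It is a sketch, not the PR's implementation: the function names, the default-export convention for JavaScript modules, and the 60-second timeout used to bound the Python process are all assumptions.

```typescript
import { spawnSync } from "node:child_process";
import { pathToFileURL } from "node:url";

// Hypothetical minimal case shape; real dataset cases carry typed fields.
export type DatasetCase = Record<string, unknown>;

// Assumed upper bound on a Python dataset script's run time.
const PYTHON_TIMEOUT_MS = 60_000;

// Enforce the summary's contract: a JSON array whose entries are plain objects.
export function assertCaseArray(value: unknown, source: string): DatasetCase[] {
  if (!Array.isArray(value)) {
    throw new Error(`${source}: expected a JSON array of case objects`);
  }
  value.forEach((entry: unknown, index: number) => {
    if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
      throw new Error(`${source}: entry ${index} is not an object`);
    }
  });
  return value as DatasetCase[];
}

// Python: run the script via `uv run python` and parse its stdout as JSON.
export function loadPythonDataset(scriptPath: string): DatasetCase[] {
  const result = spawnSync("uv", ["run", "python", scriptPath], {
    encoding: "utf8",
    timeout: PYTHON_TIMEOUT_MS,
  });
  if (result.error) throw result.error; // spawn failure or timeout
  if (result.status !== 0) {
    throw new Error(`${scriptPath} exited with status ${result.status}: ${result.stderr}`);
  }
  return assertCaseArray(JSON.parse(result.stdout), scriptPath);
}

// JavaScript: import the module by file URL. Assumes the default export is either
// the case array itself or a (possibly async) function that returns it.
export async function loadJavaScriptDataset(modulePath: string): Promise<DatasetCase[]> {
  const mod = await import(pathToFileURL(modulePath).href);
  const exported = typeof mod.default === "function" ? await mod.default() : mod.default;
  return assertCaseArray(exported, modulePath);
}
```

Routing both loaders through one shape check means a script that prints a single object, or an array containing strings, fails with a message naming the source file rather than surfacing later as a malformed case.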
Related
Related: av-kfik.9