EntityProcess · christso · Jul 2, 2026
diff --git a/apps/cli/test/commands/prepare/prepare.test.ts b/apps/cli/test/commands/prepare/prepare.test.ts
@@ -255,7 +255,7 @@ describe('agentv prepare', () => {
       path.join(tempDir, '.agentv', 'targets.yaml'),
       `
 targets:
-  - name: codex
+  - label: codex
     provider: cli
     command: bun ./scripts/target.ts
 `,

diff --git a/apps/web/src/content/docs/docs/evaluation/eval-files.mdx b/apps/web/src/content/docs/docs/evaluation/eval-files.mdx
@@ -20,8 +20,9 @@ experiment format.
 - A **task suite** is eval YAML that owns task context: `workspace`, shared
   `input`, shared `assertions`, fixtures, graders, and test cases. It can run
   directly or be imported through `imports.suites`.
-- A **raw case file** is a YAML/JSONL array, directory, or glob of cases. Import
-  it with `imports.tests`, `tests: ./cases.yaml`, or string shorthand; parent
+- A **raw case file** is a YAML, JSON, JSONL, CSV, script-backed dataset,
+  directory, or glob of cases. Import it with `imports.tests`,
+  `tests: ./cases.yaml`, `tests: file://cases.csv`, or string shorthand; parent
   suite context applies because raw cases do not carry their own suite context.
 - A **wrapper eval** is eval YAML that imports one or more suites with
   `imports.suites` and binds run controls with top-level `target`, `repeat`,
@@ -373,15 +374,33 @@ tests: ./cases.yaml
 ```
 
 The path is resolved relative to the eval file's directory. The external raw
-case file should contain a YAML array of test objects or a JSONL file with one
-test per line. String entries inside a `tests:` list work the same way and may
-use direct paths, directories, or globs:
+case file can be a YAML or JSON array of test objects, a JSONL file with one
+test per line, a promptfoo-compatible CSV file, or an explicit JavaScript or
+Python dataset function such as `file://generate-tests.mjs:createTests` or
+`file://generate_tests.py:create_tests`. String entries inside a `tests:` list
+work the same way and may use direct paths, `file://` paths, directories, or
+globs:
 
 ```yaml
 tests:
   - ./cases/*.cases.yaml
 ```
 
+CSV datasets support promptfoo-style magic columns. `__expected` and
+`__expectedN` create AgentV assertions using the supported expected-column
+mini-DSL (`contains:*`, `icontains:*`, `contains-any:*`, `contains-all:*`,
+`icontains-any:*`, `icontains-all:*`, `starts-with:*`, `ends-with:*`,
+`regex:*`, `equals:*`, `is-json`, `latency(<ms>)`, `cost(<usd>)`,
+`grade:*`, `llm-rubric:*`, `javascript:*`, `fn:*`, `eval:*`, `python:*`, and
+`file://*.py`; file paths inside CSV cells are resolved relative to the CSV
+file). Unsupported promptfoo assertion forms such as `similar:*` are rejected
+during validation instead of being skipped at runtime.
+`__provider_output` becomes first-class `expected_output`, `__metric` names the
+generated assertions, `__threshold` sets the test threshold,
+`__metadata:<key>` adds metadata, and `__config:__expectedN:threshold` sets an
+assertion `min_score`. Ordinary columns become `vars`, so CSV rows can rely on
+suite-level `input` that interpolates those variables.
+
 String shorthand is raw-case-only. Import reusable task suites through
 `imports.suites`; use `imports.tests` when you want to drop suite context and
 import only raw cases into the parent context:

diff --git a/examples/features/external-datasets/README.md b/examples/features/external-datasets/README.md
@@ -6,6 +6,7 @@ Demonstrates loading raw test cases from external files using `imports.tests`.
 
 - Loading tests from external YAML files (`imports.tests[].path: cases/accuracy.yaml`)
 - Loading tests from external JSONL files (`imports.tests[].path: cases/regression.jsonl`)
+- Loading tests from promptfoo-compatible CSV files (`imports.tests[].path: cases/magic.csv`)
 - Mixing inline `tests` with imported raw test rows
 - Glob patterns for loading multiple files (`imports.tests[].path: cases/**/*.yaml`)
 
@@ -21,6 +22,7 @@ bun agentv eval examples/features/external-datasets/evals/dataset.eval.yaml
 - `evals/dataset.eval.yaml` — Main eval with inline tests and `imports.tests` references
 - `evals/cases/accuracy.yaml` — YAML array of test cases
 - `evals/cases/regression.jsonl` — JSONL test data (one test per line)
+- `evals/cases/magic.csv` — CSV test data with promptfoo-style magic columns
 
 ## Supported Formats
 
@@ -42,6 +44,22 @@ One JSON test object per line:
 {"id": "test-2", "criteria": "Another outcome", "input": "Another input"}
 ```
 
+### CSV (.csv)
+CSV files use ordinary columns for `id`, `input`, and `vars`, plus promptfoo-style magic columns for assertions and metadata:
+
+```csv
+id,input,__expected,__provider_output,__metric,__threshold,__metadata:source,locale
+csv-test,Reply with a greeting,icontains:hello,Hello there,greeting,0.8,csv,en-US
+```
+
+`__expected` and `__expectedN` become AgentV assertions for the supported CSV
+mini-DSL. `latency(<ms>)`, `cost(<usd>)`, and `file://*.py` map to runnable
+AgentV graders, with CSV file paths resolved relative to the CSV file;
+unsupported promptfoo forms such as `similar:*` are rejected during validation.
+`__provider_output` becomes AgentV `expected_output`; ordinary non-magic
+columns such as `locale` become `vars` and can be interpolated by suite-level
+`input`.
+
 ## Glob Patterns
 
 Use glob patterns to load from multiple files:

diff --git a/examples/features/external-datasets/evals/cases/magic.csv b/examples/features/external-datasets/evals/cases/magic.csv
@@ -0,0 +1,2 @@
+id,input,__expected,__provider_output,__metric,__threshold,__metadata:source,locale
+csv-magic-greeting,Reply with a short greeting,icontains:hello,Hello there,greeting,0.8,csv,en-US
diff --git a/examples/features/external-datasets/evals/dataset.eval.yaml b/examples/features/external-datasets/evals/dataset.eval.yaml
@@ -7,6 +7,7 @@ imports:
   tests:
     - path: cases/accuracy.yaml
     - path: cases/regression.jsonl
+    - path: cases/magic.csv
 
 tests:
   - id: inline-test
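
Review note: the magic-column mapping described in the `eval-files.mdx` and README hunks above can be sketched as follows. This is a minimal illustration, not AgentV's loader. `rowToTest`, `SketchTest`, and `SketchAssertion` are hypothetical names. The sketch assumes that `id` and `input` stay top-level, that empty cells are treated as absent, and that expected cells are kept as unparsed mini-DSL strings.

```typescript
// Hypothetical shapes for illustration only; AgentV's real types may differ.
interface SketchAssertion {
  raw: string; // unparsed mini-DSL cell, e.g. "icontains:hello"
  metric?: string; // taken from the __metric column
  min_score?: number; // taken from __config:__expectedN:threshold
}

interface SketchTest {
  id?: string;
  input?: string;
  expected_output?: string;
  threshold?: number;
  assertions: SketchAssertion[];
  metadata: Record<string, string>;
  vars: Record<string, string>;
}

// Map one parsed CSV row (column name -> cell text) onto a test object.
function rowToTest(row: Record<string, string>): SketchTest {
  const test: SketchTest = { assertions: [], metadata: {}, vars: {} };
  const byColumn = new Map<string, SketchAssertion>();
  const minScores: Array<[string, number]> = [];
  let metric: string | undefined;

  for (const [column, value] of Object.entries(row)) {
    if (value === "") continue; // assumption: empty cells are treated as absent
    if (/^__expected\d*$/.test(column)) {
      const assertion: SketchAssertion = { raw: value };
      byColumn.set(column, assertion);
      test.assertions.push(assertion);
    } else if (/^__config:__expected\d*:threshold$/.test(column)) {
      // Keep only the "__expectedN" part between the prefix and the suffix.
      const target = column.slice("__config:".length, -":threshold".length);
      minScores.push([target, Number(value)]);
    } else if (column === "__provider_output") {
      test.expected_output = value;
    } else if (column === "__metric") {
      metric = value;
    } else if (column === "__threshold") {
      test.threshold = Number(value);
    } else if (column.startsWith("__metadata:")) {
      test.metadata[column.slice("__metadata:".length)] = value;
    } else if (column === "id") {
      test.id = value;
    } else if (column === "input") {
      test.input = value;
    } else {
      test.vars[column] = value; // ordinary columns become vars
    }
  }

  // Applied after the loop so that column order in the CSV does not matter.
  for (const assertion of test.assertions) {
    if (metric !== undefined) assertion.metric = metric;
  }
  for (const [column, minScore] of minScores) {
    const target = byColumn.get(column);
    if (target) target.min_score = minScore;
  }
  return test;
}

// The header and row from cases/magic.csv in this PR, as a parsed object.
const example = rowToTest({
  id: "csv-magic-greeting",
  input: "Reply with a short greeting",
  __expected: "icontains:hello",
  __provider_output: "Hello there",
  __metric: "greeting",
  __threshold: "0.8",
  "__metadata:source": "csv",
  locale: "en-US",
});
console.log(JSON.stringify(example, null, 2));
```

The sketch does not validate the mini-DSL. Per the docs in this PR, unsupported forms such as `similar:*` are rejected during validation, which would be a separate step over each `raw` value.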