Merged
2 changes: 1 addition & 1 deletion apps/cli/test/commands/prepare/prepare.test.ts
@@ -255,7 +255,7 @@ describe('agentv prepare', () => {
path.join(tempDir, '.agentv', 'targets.yaml'),
`
targets:
- name: codex
- label: codex
provider: cli
command: bun ./scripts/target.ts
`,
29 changes: 24 additions & 5 deletions apps/web/src/content/docs/docs/evaluation/eval-files.mdx
@@ -20,8 +20,9 @@ experiment format.
- A **task suite** is eval YAML that owns task context: `workspace`, shared
`input`, shared `assertions`, fixtures, graders, and test cases. It can run
directly or be imported through `imports.suites`.
- A **raw case file** is a YAML/JSONL array, directory, or glob of cases. Import
it with `imports.tests`, `tests: ./cases.yaml`, or string shorthand; parent
- A **raw case file** is a YAML, JSON, JSONL, CSV, script-backed dataset,
directory, or glob of cases. Import it with `imports.tests`,
`tests: ./cases.yaml`, `tests: file://cases.csv`, or string shorthand; parent
suite context applies because raw cases do not carry their own suite context.
- A **wrapper eval** is eval YAML that imports one or more suites with
`imports.suites` and binds run controls with top-level `target`, `repeat`,
@@ -373,15 +374,33 @@ tests: ./cases.yaml
```

The path is resolved relative to the eval file's directory. The external raw
case file should contain a YAML array of test objects or a JSONL file with one
test per line. String entries inside a `tests:` list work the same way and may
use direct paths, directories, or globs:
case file can be a YAML or JSON array of test objects, a JSONL file with one
test per line, a promptfoo-compatible CSV file, or an explicit JavaScript or
Python dataset function such as `file://generate-tests.mjs:createTests` or
`file://generate_tests.py:create_tests`. String entries inside a `tests:` list
work the same way and may use direct paths, `file://` paths, directories, or
globs:

```yaml
tests:
- ./cases/*.cases.yaml
```

CSV datasets support promptfoo-style magic columns. `__expected` and
`__expectedN` create AgentV assertions using the supported expected-column
mini-DSL (`contains:*`, `icontains:*`, `contains-any:*`, `contains-all:*`,
`icontains-any:*`, `icontains-all:*`, `starts-with:*`, `ends-with:*`,
`regex:*`, `equals:*`, `is-json`, `latency(<ms>)`, `cost(<usd>)`,
`grade:*`, `llm-rubric:*`, `javascript:*`, `fn:*`, `eval:*`, `python:*`, and
`file://*.py`; file paths inside CSV cells are resolved relative to the CSV
file). Unsupported promptfoo assertion forms such as `similar:*` are rejected
during validation instead of being skipped at runtime.
`__provider_output` becomes first-class `expected_output`, `__metric` names the
generated assertions, `__threshold` sets the test threshold,
`__metadata:<key>` adds metadata, and `__config:__expectedN:threshold` sets an
assertion `min_score`. Ordinary columns become `vars`, so CSV rows can rely on
suite-level `input` that interpolates those variables.
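
As a rough illustration of the expected-column mini-DSL described above, the following TypeScript sketch classifies a single `__expected` cell. It is not AgentV's parser: `parseExpected`, `ParsedExpectation`, and the `python-file` type name are hypothetical, and the only behavior taken from the text is the list of accepted forms and the rule that unsupported forms such as `similar:*` are rejected up front.

```typescript
// Hypothetical result shape; AgentV's real assertion objects may differ.
interface ParsedExpectation {
  type: string;
  value?: string | number;
}

// Prefix forms listed in the docs, written in the cell as `<prefix>:<value>`.
const PREFIX_FORMS: readonly string[] = [
  "contains", "icontains", "contains-any", "contains-all",
  "icontains-any", "icontains-all", "starts-with", "ends-with",
  "regex", "equals", "grade", "llm-rubric",
  "javascript", "fn", "eval", "python",
];

// Call-style forms, written as `latency(<ms>)` or `cost(<usd>)`.
const CALL_FORMS: readonly string[] = ["latency", "cost"];

function parseExpected(cell: string): ParsedExpectation {
  const text = cell.trim();

  if (text === "is-json") return { type: "is-json" };

  for (const name of CALL_FORMS) {
    if (text.startsWith(`${name}(`) && text.endsWith(")")) {
      const raw = text.slice(name.length + 1, -1).trim();
      const limit = Number(raw);
      if (raw === "" || !Number.isFinite(limit)) {
        throw new Error(`Invalid ${name} limit: ${text}`);
      }
      return { type: name, value: limit };
    }
  }

  // `file://*.py` cells; the path would be resolved relative to the CSV file.
  if (text.startsWith("file://") && text.endsWith(".py")) {
    return { type: "python-file", value: text.slice("file://".length) };
  }

  // Split on the first colon only, so values may contain colons themselves.
  const colon = text.indexOf(":");
  const prefix = colon === -1 ? "" : text.slice(0, colon);
  if (PREFIX_FORMS.includes(prefix)) {
    return { type: prefix, value: text.slice(colon + 1) };
  }

  // Fail at validation time instead of silently skipping the cell later.
  throw new Error(`Unsupported expected form: ${text}`);
}

const sampleCells = ["icontains:hello", "latency(500)", "is-json"];
console.log(sampleCells.map(parseExpected));
```

Throwing on unknown forms mirrors the stated design choice: a CSV row with `similar:*` fails loudly when the dataset is loaded, not partway through a run.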

String shorthand is raw-case-only. Import reusable task suites through
`imports.suites`; use `imports.tests` when you want to drop suite context and
import only raw cases into the parent context:
18 changes: 18 additions & 0 deletions examples/features/external-datasets/README.md
@@ -6,6 +6,7 @@ Demonstrates loading raw test cases from external files using `imports.tests`.

- Loading tests from external YAML files (`imports.tests[].path: cases/accuracy.yaml`)
- Loading tests from external JSONL files (`imports.tests[].path: cases/regression.jsonl`)
- Loading tests from promptfoo-compatible CSV files (`imports.tests[].path: cases/magic.csv`)
- Mixing inline `tests` with imported raw test rows
- Glob patterns for loading multiple files (`imports.tests[].path: cases/**/*.yaml`)

@@ -21,6 +22,7 @@ bun agentv eval examples/features/external-datasets/evals/dataset.eval.yaml
- `evals/dataset.eval.yaml` — Main eval with inline tests and `imports.tests` references
- `evals/cases/accuracy.yaml` — YAML array of test cases
- `evals/cases/regression.jsonl` — JSONL test data (one test per line)
- `evals/cases/magic.csv` — CSV test data with promptfoo-style magic columns

## Supported Formats

@@ -42,6 +44,22 @@ One JSON test object per line:
{"id": "test-2", "criteria": "Another outcome", "input": "Another input"}
```

### CSV (.csv)
CSV files use ordinary columns for `id`, `input`, and `vars`, plus promptfoo-style magic columns for assertions and metadata:

```csv
id,input,__expected,__provider_output,__metric,__threshold,__metadata:source,locale
csv-test,Reply with a greeting,icontains:hello,Hello there,greeting,0.8,csv,en-US
```

`__expected` and `__expectedN` become AgentV assertions for the supported CSV
mini-DSL. `latency(<ms>)`, `cost(<usd>)`, and `file://*.py` map to runnable
AgentV graders, with CSV file paths resolved relative to the CSV file;
unsupported promptfoo forms such as `similar:*` are rejected during validation.
`__provider_output` becomes AgentV `expected_output`; ordinary non-magic
columns such as `locale` become `vars` and can be interpolated by suite-level
`input`.
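
To make the column mapping above concrete, here is a minimal TypeScript sketch of how one already-parsed CSV row might be turned into a test object. It is an assumption-laden illustration, not AgentV's loader: `rowToTestCase` and `SketchTestCase` are hypothetical names, `id` and `input` are assumed to be copied to top-level fields, empty cells are assumed to be ignored, and `__config:*` columns are left out for brevity.

```typescript
// Hypothetical test shape for illustration only.
interface SketchTestCase {
  id?: string;
  input?: string;
  expected_output?: string;
  metric?: string;
  threshold?: number;
  assertions: string[];             // raw `__expected` cells, still unparsed
  metadata: Record<string, string>;
  vars: Record<string, string>;
}

function rowToTestCase(row: Record<string, string>): SketchTestCase {
  const test: SketchTestCase = { assertions: [], metadata: {}, vars: {} };

  for (const [column, value] of Object.entries(row)) {
    if (value === "") continue; // assumption: empty cells carry no meaning

    if (/^__expected\d*$/.test(column)) {
      test.assertions.push(value);                 // __expected, __expected1, ...
    } else if (column === "__provider_output") {
      test.expected_output = value;
    } else if (column === "__metric") {
      test.metric = value;
    } else if (column === "__threshold") {
      test.threshold = Number(value);
    } else if (column.startsWith("__metadata:")) {
      test.metadata[column.slice("__metadata:".length)] = value;
    } else if (column === "id") {
      test.id = value;
    } else if (column === "input") {
      test.input = value;
    } else if (!column.startsWith("__")) {
      test.vars[column] = value;                   // ordinary columns become vars
    }
    // Other magic columns (for example `__config:*`) are skipped in this sketch.
  }
  return test;
}

// The example row from the CSV snippet above, as a header-to-cell map.
const sampleRow = rowToTestCase({
  id: "csv-test",
  input: "Reply with a greeting",
  __expected: "icontains:hello",
  __provider_output: "Hello there",
  __metric: "greeting",
  __threshold: "0.8",
  "__metadata:source": "csv",
  locale: "en-US",
});
console.log(JSON.stringify(sampleRow, null, 2));
```

With this mapping, `locale` ends up under `vars`, which is what lets a suite-level `input` template interpolate it.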

## Glob Patterns

Use glob patterns to load from multiple files:
2 changes: 2 additions & 0 deletions examples/features/external-datasets/evals/cases/magic.csv
@@ -0,0 +1,2 @@
id,input,__expected,__provider_output,__metric,__threshold,__metadata:source,locale
csv-magic-greeting,Reply with a short greeting,icontains:hello,Hello there,greeting,0.8,csv,en-US
@@ -7,6 +7,7 @@ imports:
tests:
- path: cases/accuracy.yaml
- path: cases/regression.jsonl
- path: cases/magic.csv

tests:
- id: inline-test