Canonicalize targets around labels #1598

Merged: christso merged 3 commits into main from feat/av-kfik-6-targets on Jul 2, 2026

christso (Collaborator) commented Jul 2, 2026

Summary

  • Canonicalizes AgentV target authoring around label as the AgentV target/comparison name and optional promptfoo id as backend/provider metadata.
  • Parses inline target/targets objects and targets.yaml through the promptfoo-shaped object fields (label, id, config, prompts, transform, delay, env) while preserving AgentV extensions.
  • Rejects live top-level providers aliases and serializes task/eval bundles back to canonical label/config target YAML.
  • Updates SDK types, generated eval schema, target docs, and eval-writer skill guidance.
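
The authoring shape in the first two bullets can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not AgentV's actual parser: the `TargetObject` type, the `extras` bucket, and the `normalizeTarget` function are hypothetical names. Only the field names (label, id, config, prompts, transform, delay, env) and the removal of `name` come from this PR.

```typescript
// Hypothetical sketch: type and function names are assumptions for illustration.
interface TargetObject {
  label: string;                                 // AgentV target/comparison name
  id?: string | undefined;                       // optional promptfoo-style backend/provider metadata
  config?: Record<string, unknown> | undefined;  // nested provider configuration
  extras: Record<string, unknown>;               // prompts, transform, delay, env, AgentV extensions
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function normalizeTarget(raw: Record<string, unknown>): TargetObject {
  // The removed `name` key is rejected instead of being silently remapped.
  if ("name" in raw) {
    throw new Error("Target key 'name' is no longer accepted; use 'label'.");
  }
  const { label, id, config, ...extras } = raw;
  if (typeof label !== "string" || label.trim() === "") {
    throw new Error("Target requires a non-empty string 'label'.");
  }
  let targetId: string | undefined;
  if (id !== undefined) {
    if (typeof id !== "string") {
      throw new Error("Target 'id' must be a string when present.");
    }
    targetId = id;
  }
  let targetConfig: Record<string, unknown> | undefined;
  if (config !== undefined) {
    if (!isRecord(config)) {
      throw new Error("Target 'config' must be an object when present.");
    }
    targetConfig = config;
  }
  // Remaining keys are passed through untouched so extensions survive.
  return { label, id: targetId, config: targetConfig, extras };
}
```

Keeping unknown keys in a separate bucket is one way to preserve extensions without widening the core type; the real implementation may well keep them on the same object.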

Validation

  • bun test packages/core/test/evaluation/providers/targets.test.ts packages/core/test/evaluation/providers/targets-file.test.ts packages/core/test/evaluation/validation/targets-validator.test.ts packages/core/test/evaluation/loaders/config-loader.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts packages/sdk/test/eval-authoring.test.ts apps/cli/test/commands/eval/bundle.test.ts apps/cli/test/commands/runs/rerun.test.ts apps/cli/test/eval.integration.test.ts
  • bun --filter @agentv/core generate:schema
  • bun test packages/core/test/evaluation/validation/eval-schema-sync.test.ts
  • bun run typecheck
  • bun run lint
  • bun --filter @agentv/dashboard build
  • Live dogfood: Gemini live target + Gemini live grader using label target names, optional id backend metadata, and nested config; result PASS, 1/1, mean 50%.

Private evidence: agentv-private:evidence/av-kfik-6-targets commit 77085b9.

Notes

OpenAI live dogfood reached the provider path but the copied OPENAI_API_KEY is a dummy, so Gemini is the successful live provider/grader evidence. Broad migration/codemod work remains with av-kfik.15.

cloudflare-workers-and-pages (Bot) commented Jul 2, 2026

Deploying agentv with Cloudflare Pages

Latest commit: 60e8c7f
Status: ✅  Deploy successful!
Preview URL: https://4eb31d41.agentv.pages.dev
Branch Preview URL: https://feat-av-kfik-6-targets.agentv.pages.dev

christso (Collaborator, Author) commented Jul 2, 2026

Follow-up for CI + local OpenAI-compatible dogfood on b126b1bb:

  • CI root cause: the prepared-attempt CLI test fixtures still authored .agentv/targets.yaml entries with removed name: codex. The branch correctly validates target identity as Promptfoo-style label, so CI failed before the prepare/grade assertions could run.
  • Fix: updated the two stale fixtures in apps/cli/test/commands/prepare/prepare.test.ts and apps/cli/test/commands/grade/grade-prepared.test.ts from name: codex to label: codex. No runtime code changed.
  • Local validation:
    • bun test apps/cli/test/commands/prepare/prepare.test.ts apps/cli/test/commands/grade/grade-prepared.test.ts -> 9 pass, 0 fail.
    • bun --filter agentv test -> 745 pass, 0 fail.
    • bun run test -> core 2124 pass, sdk 92 pass, agentv 745 pass, dashboard 145 pass; all 0 fail.
  • Supplemental local OpenAI-compatible dogfood:
    • Endpoint: http://127.0.0.1:10531/v1 with LOCAL_OPENAI_PROXY_API_KEY=dummy.
    • Model used: gpt-5.3-codex-spark. /v1/models advertised this model.
    • Command: AGENTV_NO_UPDATE_CHECK=1 LOCAL_OPENAI_PROXY_BASE_URL=http://127.0.0.1:10531/v1 LOCAL_OPENAI_PROXY_API_KEY=dummy LOCAL_OPENAI_PROXY_MODEL=gpt-5.3-codex-spark bun --no-env-file apps/cli/src/cli.ts eval /tmp/av-kfik-6-local-openai-20260702102235/evals/local-openai-label-target.eval.yaml --targets /tmp/av-kfik-6-local-openai-20260702102235/.agentv/targets.yaml --target local-openai-target --output .agentv/results/av-kfik-6-local-openai-20260702102235 --threshold 0 --no-cache.
    • Result: PASS, 1/1, mean 100%; target local-openai-target; LLM grader target local-openai-grader; semantic-local-grader:llm-grader score 1.
    • Evidence: private branch agentv-private:evidence/av-kfik-6-targets-openai, commit 928d122; run bundle run-bundle/av-kfik-6-local-openai-20260702102235/.
  • Alternate model spelling: gpt-codex-5.3-spark was not run as canonical evidence and was not listed by /v1/models; canonical smoke used and passed with gpt-5.3-codex-spark.
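
The "/v1/models advertised this model" check above can be reproduced with a small helper. This is a sketch that assumes the proxy returns the usual OpenAI-compatible list shape (`{ data: [{ id: ... }] }`); `isModelAdvertised` and `checkModel` are hypothetical names, not part of AgentV.

```typescript
// Hypothetical helper: returns true when an OpenAI-compatible /models payload lists `model`.
function isModelAdvertised(payload: unknown, model: string): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const data = (payload as { data?: unknown }).data;
  if (!Array.isArray(data)) return false;
  return data.some(
    (entry) =>
      typeof entry === "object" && entry !== null && (entry as { id?: unknown }).id === model,
  );
}

// Assumes `baseUrl` already ends in /v1, as LOCAL_OPENAI_PROXY_BASE_URL does in the command above.
async function checkModel(baseUrl: string, apiKey: string, model: string): Promise<boolean> {
  const response = await fetch(`${baseUrl}/models`, {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!response.ok) {
    throw new Error(`GET ${baseUrl}/models failed with HTTP ${response.status}`);
  }
  return isModelAdvertised(await response.json(), model);
}
```

Running such a check before the eval makes a misspelled model name fail fast instead of surfacing as a provider error mid-run.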

christso (Collaborator, Author) commented Jul 2, 2026

CI follow-up: all checks are now green on b126b1bb. The formerly failing Test job passed in 1m54s:

https://github.com/EntityProcess/agentv/actions/runs/28583074773/job/84747840855

christso (Collaborator, Author) commented Jul 2, 2026

Review findings for PR #1598:

  1. P1 - Runtime loading still accepts top-level providers instead of rejecting it.

packages/core/src/evaluation/yaml-parser.ts:1490 rejects other removed runtime containers (execution, model, runs, etc.), but it does not check suite.providers. The schema and agentv validate path reject providers, but the runtime loader used by eval execution/bundling can still parse a YAML file with top-level providers and silently ignore that field. I verified this directly with loadTestSuite() on an eval containing only providers: plus tests; it returned successfully with one test. That violates the Bead contract that targets is the only live SUT key and providers remapping is conversion/codemod-only. Please reject providers in the parser/runtime path as well, with a loader/eval-run regression test rather than only a validator test.

  2. P2 - agentv eval bundle drops inline targets object definitions.

packages/core/src/evaluation/loaders/config-loader.ts:407 returns EvalTargetRef.definition for object entries under top-level targets, but apps/cli/src/commands/eval/commands/bundle.ts:93 only adds synthetic definitions for ref.use_target. For a canonical inline target such as targets: [{ label: candidate, provider: mock, config: ... }], suite.targets includes candidate, but the bundle command builds its definitions only from .agentv/targets.yaml, then ensureTargetGraph() reports Target 'candidate' not found or omits the inline definition. This breaks the new live targets authoring surface for portable bundles. Please mirror the selectMultipleTargets() behavior by adding ref.definition before the use_target shim, and cover it with an eval bundle test.
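
The P1 finding comes down to a missing key check in the runtime path. Here is a rough sketch of such a guard, assuming a hypothetical `assertNoRemovedTopLevelKeys` helper and a removed-key list built only from the names mentioned in this review (`execution`, `model`, `runs`, `providers`). The real parser's list and error wording may differ.

```typescript
// Hypothetical guard: key list and message text are assumptions based on this review.
const REMOVED_TOP_LEVEL_KEYS = ["execution", "model", "runs", "providers"] as const;

function assertNoRemovedTopLevelKeys(suite: Record<string, unknown>, filePath: string): void {
  const found = REMOVED_TOP_LEVEL_KEYS.filter((key) => key in suite);
  if (found.length === 0) return;
  // `providers` gets explicit guidance because `targets` is the only live SUT key.
  const hint = found.includes("providers")
    ? " Use top-level 'targets' instead of 'providers'."
    : "";
  throw new Error(`${filePath}: removed top-level key(s): ${found.join(", ")}.${hint}`);
}
```

Calling the same guard from both the validator and the runtime loader would keep the two paths from drifting apart, which is the gap the finding describes.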

CI and the recorded live dogfood evidence look good, but I would not merge until these contract gaps are fixed.

christso (Collaborator, Author) commented Jul 2, 2026

Addressed both review blockers in commit 2326e7d.

Changes:

  • Runtime eval suite loading now rejects top-level providers with the same targets-contract guidance as validation.
  • agentv eval bundle now preserves inline eval target object definitions in the target graph before falling back to use_target synthetic refs.
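
The bundle change can be pictured as an ordering rule when collecting target definitions. The sketch below is illustrative only: `EvalTargetRef`, `definition`, and `use_target` are named in the review, but their shapes here, the `collectBundleDefinitions` function, the synthetic definition's fields, and the choice to let an inline definition override a file entry with the same label are all assumptions.

```typescript
// Hypothetical shapes: only the names EvalTargetRef, definition, and use_target come from the review.
interface TargetDefinition {
  label: string;
  [key: string]: unknown;
}

interface EvalTargetRef {
  label: string;
  definition?: TargetDefinition; // inline object entry under top-level `targets`
  use_target?: string;           // reference to another target by label
}

function collectBundleDefinitions(
  fileDefinitions: TargetDefinition[],
  refs: EvalTargetRef[],
): Map<string, TargetDefinition> {
  const graph = new Map<string, TargetDefinition>();
  // Start from the definitions found in .agentv/targets.yaml.
  for (const def of fileDefinitions) {
    graph.set(def.label, def);
  }
  for (const ref of refs) {
    if (ref.definition) {
      // Inline definitions are added first so they are never dropped from the bundle.
      graph.set(ref.label, ref.definition);
    } else if (ref.use_target && !graph.has(ref.label)) {
      // Fall back to a synthetic definition that only points at another target.
      graph.set(ref.label, { label: ref.label, use_target: ref.use_target });
    }
  }
  return graph;
}
```

With this ordering, an eval that declares `candidate` inline resolves even when `.agentv/targets.yaml` has no entry for it.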

Validation run locally:

  • cd packages/core && bun test test/evaluation/eval-inline-experiment.test.ts
  • bun --filter @agentv/core build && bun --filter @agentv/sdk build
  • cd apps/cli && bun test test/commands/eval/bundle.test.ts
  • bun --filter @agentv/core typecheck
  • bun --filter @agentv/core lint
  • bun --filter agentv lint
  • cd apps/cli && bun run typecheck
  • cd apps/cli && bun run build
  • git diff --check

Current PR state after push: mergeable, still draft, Cloudflare Pages check in progress.

christso force-pushed the feat/av-kfik-6-targets branch from 2326e7d to 60e8c7f on July 2, 2026 14:26

christso (Collaborator, Author) commented Jul 2, 2026

Follow-up after CI tested the synthetic merge with current main:

  • Rebased feat/av-kfik-6-targets onto origin/main and force-pushed with lease.
  • Current head: 60e8c7f.
  • Fixed the newly merged interpolation test fixture to use canonical eval-local target label instead of removed name.

Additional local validation after rebase:

  • bun install after the new main dependency change.
  • cd packages/core && bun test test/evaluation/eval-inline-experiment.test.ts test/evaluation/interpolation-integration.test.ts test/runtime/exec.test.ts
  • bun --filter @agentv/core build && bun --filter @agentv/sdk build && cd apps/cli && bun test test/commands/eval/bundle.test.ts
  • bun --filter @agentv/core typecheck
  • bun --filter @agentv/core lint
  • bun --filter agentv lint
  • cd apps/cli && bun run typecheck
  • cd apps/cli && bun run build
  • git diff --check

CI is running again on the rebased head. I will not mark ready or merge unless it goes green.

christso marked this pull request as ready for review on July 2, 2026 14:28
christso merged commit 8e544be into main on Jul 2, 2026 (8 checks passed)
christso deleted the feat/av-kfik-6-targets branch on July 2, 2026 14:28