docs: plan CLI command surface alignment by christso · Pull Request #1606 · EntityProcess/agentv

christso · 2026-07-02T13:07:39Z

Summary

Adds docs/plans/2026-07-02-cli-command-surface-alignment.md, a research report and staged plan for making AgentV's CLI command surface more intuitive while preserving AgentV's repo-native artifact and product boundaries.

This PR is linked to Bead av-ap2w.

Findings

Promptfoo optimizes around a broad, task-oriented promptfoo eval surface plus view, share, retry, cache, generate, list, and export commands.
Margin evals keeps the CLI small and explicit: margin run --suite --agent-config --eval, with resource-specific init subcommands and bundle-based resume.
DeepEval uses a pytest-first deepeval test run workflow, plus TUI/cloud viewing and many provider setup commands; AgentV should avoid copying that provider-command sprawl.
Vercel agent-eval has the shallowest happy path: init, default/no-arg run, single experiment shorthand, run-all, playground, fingerprint reuse, and --force/--dry/--smoke controls.
AgentV already has the right primitives, but the post-run surface is split across dashboard, results, inspect, compare, trend, and runs. The plan recommends reframing the common path first, then consolidating overlapping post-run inspection commands around results and dashboard.

Source SHAs and DeepWiki Usage

Local clones inspected:

Promptfoo: /home/entity/projects/promptfoo/promptfoo at 6bfc5a0c7f16f9c4717ac731d276b578e63d0769
Margin evals: /home/entity/projects/Margin-Lab/evals at 53fb2fd080689efaf7934573d8759d14fc1043e4
DeepEval: /home/entity/projects/confident-ai/deepeval at 324355e8982bf9ee52c192215b06b4267aafa58e
Vercel agent-eval: /home/entity/projects/vercel-labs/agent-eval at a9dcc9a8c53dbc22ececc967ded7ab3963f18e67 (main...origin/main [behind 8])
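The bracketed tracking note on the Vercel row has the same shape as the branch line printed by `git status --short --branch`. As a rough sketch only (not part of the plan or of AgentV), a small helper could turn that note into ahead/behind counts so a stale snapshot is easy to flag. `TrackingStatus` and `parse_tracking` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple, Optional


class TrackingStatus(NamedTuple):
    branch: str
    upstream: Optional[str]
    ahead: int
    behind: int


# Matches e.g. "## main...origin/main [behind 8]"; the "## " prefix is optional
# so the note can be pasted as it appears in the snapshot list.
_BRANCH_LINE = re.compile(
    r"^(?:## )?(?P<branch>\S+?)(?:\.\.\.(?P<upstream>\S+))?(?: \[(?P<counts>[^\]]+)\])?$"
)


def parse_tracking(line: str) -> TrackingStatus:
    """Parse a branch tracking line into branch, upstream, ahead, and behind."""
    match = _BRANCH_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized branch line: {line!r}")
    ahead = behind = 0
    # The bracket holds entries such as "ahead 1, behind 2" or just "behind 8".
    for part in (match.group("counts") or "").split(", "):
        word, _, number = part.partition(" ")
        if word == "ahead":
            ahead = int(number)
        elif word == "behind":
            behind = int(number)
    return TrackingStatus(match.group("branch"), match.group("upstream"), ahead, behind)


status = parse_tracking("main...origin/main [behind 8]")
if status.behind:
    print(f"snapshot is {status.behind} commits behind {status.upstream}")
```

A non-zero `behind` value only reflects the last fetch, so the clone would still need `git fetch origin` before the count means anything.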

DeepWiki MCP was used for repo-level command-surface orientation across the four peer repos and for a focused Vercel agent-eval question. Local source was treated as authoritative; the report notes that DeepWiki's Vercel summary was stale relative to the local clone.

Scope

No command implementation is included. This is a research/plan PR only.

Validation

Ran git diff --check
Inspected peer repositories with git, rg, and file reads only
Did not run peer installs, builds, tests, or evals

cloudflare-workers-and-pages · 2026-07-02T13:08:10Z

Deploying agentv with Cloudflare Pages

Latest commit:	`478d3c8`
Status:	✅ Deploy successful!
Preview URL:	https://cdd0eaae.agentv.pages.dev
Branch Preview URL:	https://research-av-ap2w-cli-command.agentv.pages.dev


christso · 2026-07-02T13:45:11Z

Review finding:

[P1] Refresh stale peer-source conclusions before merging this as research source of truth.

The report says the Vercel agent-eval baseline is a9dcc9a8... and already notes it was behind, then uses that snapshot to conclude that Vercel's happy path is default/no-arg run, single-experiment shorthand, and explicit run-all (docs/plans/2026-07-02-cli-command-surface-alignment.md lines 33, 55, 60, 85, 88, 95).

After git fetch origin in /home/entity/projects/vercel-labs/agent-eval, current origin/main is 6ebfe82 and includes b35873c [agent-eval] Incremental eval staleness: status / run + content-aware fingerprints. packages/agent-eval/src/cli.ts now replaces the run-all command with explicit run, adds status and refingerprint, and changes bare agent-eval to show status and let TTY users pick experiments rather than auto-running everything. README also documents agent-eval status, agent-eval run <experiment...>, content fingerprints, and refingerprint. That directly invalidates the Vercel taxonomy and parts of the AgentV takeaway around shallow happy path, dry/smoke/preflight, and hidden fingerprint reuse.

I also found that fetching the other peer clones makes the source-status table stale: Promptfoo is now main...origin/main [behind 213] and DeepEval is main...origin/main [behind 126]; DeepEval's current CLI adds gate and test run --official.

Before this plan is mergeable as a research artifact, please either refresh the peer research against fetched/current refs (or official docs where public contract matters) or clearly reframe every peer conclusion as a pinned historical snapshot that follow-up implementation must not treat as current source of truth.

christso · 2026-07-02T14:05:07Z

Addressed the review blocker in 964a2113.

What changed:

refreshed the peer-source snapshot rows to fetched origin/main SHAs for Promptfoo, DeepEval, and Vercel agent-eval
corrected the Vercel command-surface notes to current run, status, refingerprint, playground, and TTY status/pick behavior
updated the recommendation so compare and trend are canonical under agentv results compare / agentv results trend
removed agentv compare / agentv trend from the stable target top-level table; any existing top-level forms are documented only as hidden/deprecated compatibility aliases
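To make the hidden/deprecated alias idea concrete, here is a minimal sketch of how a top-level form could be rewritten to its canonical agentv results form before dispatch. It is an illustration under assumptions, not AgentV's implementation: `DEPRECATED_ALIASES` and `resolve_alias` are hypothetical names, and the warning text is invented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: old top-level command -> canonical subcommand path.
DEPRECATED_ALIASES: Dict[str, List[str]] = {
    "compare": ["results", "compare"],
    "trend": ["results", "trend"],
}


def resolve_alias(argv: List[str]) -> Tuple[List[str], Optional[str]]:
    """Rewrite a deprecated top-level command to its canonical form.

    argv excludes the program name. Returns the rewritten arguments and an
    optional deprecation warning to print on stderr.
    """
    if argv and argv[0] in DEPRECATED_ALIASES:
        canonical = DEPRECATED_ALIASES[argv[0]]
        target = " ".join(canonical)
        warning = f"'agentv {argv[0]}' is deprecated; use 'agentv {target}'"
        return canonical + argv[1:], warning
    # Anything else passes through unchanged.
    return list(argv), None


args, warning = resolve_alias(["compare", "run-a", "run-b"])
print(args)     # ['results', 'compare', 'run-a', 'run-b']
print(warning)
```

Keeping the table in one place means the aliases can stay out of help output and be removed later without touching the canonical commands.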

Validation: git diff --check for the doc-only plan update.

christso · 2026-07-02T14:14:52Z

Verification blocker remains after re-fetching peer sources.

I refreshed the peer clones during verification. Promptfoo, Margin evals, and DeepEval match the SHAs recorded in the plan, but /home/entity/projects/vercel-labs/agent-eval now has origin/main at 1d1f4f60b290d1ca3fac00afe07f3db231f00afd, while the plan still records Vercel as 6ebfe82f39dddb9614add9bebf14a843658ef058 (origin/main). The delta from 6ebfe82 to current origin/main only touches release/changelog files (packages/agent-eval/package.json, packages/agent-eval/CHANGELOG.md, changesets), and the current origin/main:packages/agent-eval/src/cli.ts still supports the corrected run, status, refingerprint, playground, and bare TTY status/pick behavior. But the acceptance criterion asked for peer-source snapshots refreshed against fetched local origin/main SHAs, so I cannot strictly verify that point while the table labels the older Vercel SHA as origin/main.
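The "delta only touches release/changelog files" check can be repeated mechanically. The sketch below assumes the changed paths come from `git diff --name-only 6ebfe82..origin/main` and simply reports whether any inspected command file appears among them. `INSPECTED_FILES` and `inspected_files_touched` are hypothetical names, and the changeset filename in the example is made up.

```python
from typing import Iterable, List

# Command-surface file named in this review; extend if more files were inspected.
INSPECTED_FILES = ("packages/agent-eval/src/cli.ts",)


def inspected_files_touched(
    name_only_output: str, inspected: Iterable[str] = INSPECTED_FILES
) -> List[str]:
    """Return the inspected files that appear in `git diff --name-only` output."""
    changed = {line.strip() for line in name_only_output.splitlines() if line.strip()}
    return sorted(changed.intersection(inspected))


# Example delta shaped like the one described above (changeset filename is illustrative).
example_delta = """\
packages/agent-eval/package.json
packages/agent-eval/CHANGELOG.md
.changeset/example.md
"""

touched = inspected_files_touched(example_delta)
print("command files touched:", touched)  # command files touched: []
```

An empty result supports keeping the earlier command-surface notes, but the recorded SHA still has to match the fetched ref for the snapshot row to be accurate.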

Other validation passed:

git diff --check origin/main...HEAD is clean.
PR CI check rollup is green.
compare / trend are canonical as agentv results compare / agentv results trend in the plan.
top-level compare / trend are absent from the stable target table and are described only as hidden/deprecated compatibility aliases.

Leaving the PR draft and unmerged until the Vercel snapshot row is refreshed to current fetched origin/main or explicitly reframed as a pinned source snapshot.

christso · 2026-07-02T14:42:59Z

Updated the Vercel snapshot row in 6a22715c so the plan now records fetched /home/entity/projects/vercel-labs/agent-eval origin/main as 1d1f4f60b290d1ca3fac00afe07f3db231f00afd, with a note that the post-6ebfe82 delta only touched release/changelog/package metadata and not the inspected command files.

Validation for the original blocker:

git diff --check passed.
Vercel source consistency passed: origin/main:packages/agent-eval/src/cli.ts still exposes playground, run, status, refingerprint, and the bare TTY status/pick behavior.
compare / trend remain canonical as agentv results compare / agentv results trend in the plan.
stable top-level agentv compare / agentv trend references are absent.

Remaining merge blocker: CI is not green. The Test job failed twice on the PR merge ref, including one rerun. The repeated failure is:

agentv prepare > remaps prepared extension context paths into the output workspace

The logged command failed because the generated /tmp/.../.agentv/targets.yaml uses the removed target name field and is missing the required label field:

[targets[0].label] Missing or invalid 'label' field (must be a non-empty string)
[targets[0].name] The target 'name' field has been removed. Use 'label' for the AgentV target name.
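For anyone reproducing this, the two messages imply a rule of roughly the following shape. This is a sketch inferred from the log output alone, not AgentV's validator: it assumes targets.yaml parses to a list of mappings under a targets key, and `validate_targets` is a hypothetical name.

```python
from typing import Any, Dict, List


def validate_targets(targets: List[Dict[str, Any]]) -> List[str]:
    """Return error messages shaped like the ones in the failing test log."""
    errors: List[str] = []
    for index, target in enumerate(targets):
        label = target.get("label")
        # 'label' must be present and a non-empty string.
        if not isinstance(label, str) or not label:
            errors.append(
                f"[targets[{index}].label] Missing or invalid 'label' field "
                "(must be a non-empty string)"
            )
        # 'name' is rejected outright, even when 'label' is also present.
        if "name" in target:
            errors.append(
                f"[targets[{index}].name] The target 'name' field has been removed. "
                "Use 'label' for the AgentV target name."
            )
    return errors


# A fixture that still writes the old field reproduces both messages.
for message in validate_targets([{"name": "default"}]):
    print(message)
```

Under this reading, the generated fixture would pass once it writes label instead of name.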

All other checks passed after the rerun. Leaving the PR draft and unmerged until the test failure is resolved or CI is otherwise green.

christso added 3 commits July 2, 2026 16:52

docs: plan CLI command surface alignment

e3e2cc8

docs(cli): move compare and trend under results

50acc32

docs(cli): refresh Vercel agent-eval snapshot

478d3c8

christso force-pushed the research/av-ap2w-cli-command-surface branch from 6a22715 to 478d3c8 July 2, 2026 14:54

christso marked this pull request as ready for review July 2, 2026 14:57

christso merged commit 10e3472 into main Jul 2, 2026
8 checks passed

christso deleted the research/av-ap2w-cli-command-surface branch July 2, 2026 14:57


