Skip to content

docs: plan CLI command surface alignment#1606

Merged
christso merged 3 commits into
mainfrom
research/av-ap2w-cli-command-surface
Jul 2, 2026
Merged

docs: plan CLI command surface alignment#1606
christso merged 3 commits into
mainfrom
research/av-ap2w-cli-command-surface

Conversation

@christso

@christso christso commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds docs/plans/2026-07-02-cli-command-surface-alignment.md, a research report and staged plan for making AgentV's CLI command surface more intuitive while preserving AgentV's repo-native artifact and product boundaries.

This PR is linked to Bead av-ap2w.

Findings

  • Promptfoo optimizes around a broad, task-oriented promptfoo eval surface plus view, share, retry, cache, generate, list, and export commands.
  • Margin evals keeps the CLI small and explicit: margin run --suite --agent-config --eval, with resource-specific init subcommands and bundle-based resume.
  • DeepEval uses a pytest-first deepeval test run workflow, plus TUI/cloud viewing and many provider setup commands; AgentV should avoid copying that provider-command sprawl.
  • Vercel agent-eval has the shallowest happy path: init, default/no-arg run, single experiment shorthand, run-all, playground, fingerprint reuse, and --force/--dry/--smoke controls.
  • AgentV already has the right primitives, but the post-run surface is split across dashboard, results, inspect, compare, trend, and runs. The plan recommends reframing the common path first, then consolidating overlapping post-run inspection commands around results and dashboard.

Source SHAs and DeepWiki Usage

Local clones inspected:

  • Promptfoo: /home/entity/projects/promptfoo/promptfoo at 6bfc5a0c7f16f9c4717ac731d276b578e63d0769
  • Margin evals: /home/entity/projects/Margin-Lab/evals at 53fb2fd080689efaf7934573d8759d14fc1043e4
  • DeepEval: /home/entity/projects/confident-ai/deepeval at 324355e8982bf9ee52c192215b06b4267aafa58e
  • Vercel agent-eval: /home/entity/projects/vercel-labs/agent-eval at a9dcc9a8c53dbc22ececc967ded7ab3963f18e67 (main...origin/main [behind 8])

DeepWiki MCP was used for repo-level command-surface orientation across the four peer repos and for a focused Vercel agent-eval question. Local source was treated as authoritative; the report notes that DeepWiki's Vercel summary was stale relative to the local clone.

Scope

No command implementation is included. This is a research/plan PR only.

Validation

  • Ran git diff --check
  • Inspected peer repositories with git, rg, and file reads only
  • Did not run peer installs, builds, tests, or evals

@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jul 2, 2026

Copy link
Copy Markdown

Deploying agentv with  Cloudflare Pages  Cloudflare Pages

Latest commit: 478d3c8
Status: ✅  Deploy successful!
Preview URL: https://cdd0eaae.agentv.pages.dev
Branch Preview URL: https://research-av-ap2w-cli-command.agentv.pages.dev

View logs

@christso

christso commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Review finding:

  • [P1] Refresh stale peer-source conclusions before merging this as research source of truth. The report says the Vercel agent-eval baseline is a9dcc9a8... and already notes it was behind, then uses that snapshot to conclude that Vercel's happy path is default/no-arg run, single-experiment shorthand, and explicit run-all (docs/plans/2026-07-02-cli-command-surface-alignment.md lines 33, 55, 60, 85, 88, 95). After git fetch origin in /home/entity/projects/vercel-labs/agent-eval, current origin/main is 6ebfe82 and includes b35873c [agent-eval] Incremental eval staleness: status / run + content-aware fingerprints; packages/agent-eval/src/cli.ts now replaces the run-all command with explicit run, adds status and refingerprint, and changes bare agent-eval to show status and let TTY users pick experiments rather than auto-running everything. README also documents agent-eval status, agent-eval run <experiment...>, content fingerprints, and refingerprint. That directly invalidates the Vercel taxonomy and parts of the AgentV takeaway around shallow happy path, dry/smoke/preflight, and hidden fingerprint reuse. I also found that fetching the other peer clones makes the source-status table stale: Promptfoo is now main...origin/main [behind 213] and DeepEval is main...origin/main [behind 126]; DeepEval's current CLI adds gate and test run --official. Before this plan is mergeable as a research artifact, please either refresh the peer research against fetched/current refs (or official docs where public contract matters) or clearly reframe every peer conclusion as a pinned historical snapshot that follow-up implementation must not treat as current source of truth.

@christso

christso commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the review blocker in 964a2113.

What changed:

  • refreshed the peer-source snapshot rows to fetched origin/main SHAs for Promptfoo, DeepEval, and Vercel agent-eval
  • corrected the Vercel command-surface notes to current run, status, refingerprint, playground, and TTY status/pick behavior
  • updated the recommendation so compare and trend are canonical under agentv results compare / agentv results trend
  • removed agentv compare / agentv trend from the stable target top-level table; any existing top-level forms are documented only as hidden/deprecated compatibility aliases

Validation: git diff --check for the doc-only plan update.

@christso

christso commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Verification blocker remains after re-fetching peer sources.

I refreshed the peer clones during verification. Promptfoo, Margin evals, and DeepEval match the SHAs recorded in the plan, but /home/entity/projects/vercel-labs/agent-eval now has origin/main at 1d1f4f60b290d1ca3fac00afe07f3db231f00afd, while the plan still records Vercel as 6ebfe82f39dddb9614add9bebf14a843658ef058 (origin/main). The delta from 6ebfe82 to current origin/main only touches release/changelog files (packages/agent-eval/package.json, packages/agent-eval/CHANGELOG.md, changesets), and the current origin/main:packages/agent-eval/src/cli.ts still supports the corrected run, status, refingerprint, playground, and bare TTY status/pick behavior. But the acceptance criterion asked for peer-source snapshots refreshed against fetched local origin/main SHAs, so I cannot strictly verify that point while the table labels the older Vercel SHA as origin/main.

Other validation passed:

  • git diff --check origin/main...HEAD is clean.
  • PR CI check rollup is green.
  • compare / trend are canonical as agentv results compare / agentv results trend in the plan.
  • top-level compare / trend are absent from the stable target table and are described only as hidden/deprecated compatibility aliases.

Leaving the PR draft and unmerged until the Vercel snapshot row is refreshed to current fetched origin/main or explicitly reframed as a pinned source snapshot.

@christso

christso commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Updated the Vercel snapshot row in 6a22715c so the plan now records fetched /home/entity/projects/vercel-labs/agent-eval origin/main as 1d1f4f60b290d1ca3fac00afe07f3db231f00afd, with a note that the post-6ebfe82 delta only touched release/changelog/package metadata and not the inspected command files.

Validation for the original blocker:

  • git diff --check passed.
  • Vercel source consistency passed: origin/main:packages/agent-eval/src/cli.ts still exposes playground, run, status, refingerprint, and the bare TTY status/pick behavior.
  • compare / trend remain canonical as agentv results compare / agentv results trend in the plan.
  • stable top-level agentv compare / agentv trend references are absent.

Remaining merge blocker: CI is not green. The Test job failed twice on the PR merge ref, including one rerun. The repeated failure is:

agentv prepare > remaps prepared extension context paths into the output workspace

The logged command failed because the generated /tmp/.../.agentv/targets.yaml uses the removed target name field and is missing required label:

[targets[0].label] Missing or invalid 'label' field (must be a non-empty string)
[targets[0].name] The target 'name' field has been removed. Use 'label' for the AgentV target name.

All other checks passed after the rerun. Leaving the PR draft and unmerged until the test failure is resolved or CI is otherwise green.

@christso christso force-pushed the research/av-ap2w-cli-command-surface branch from 6a22715 to 478d3c8 Compare July 2, 2026 14:54
@christso christso marked this pull request as ready for review July 2, 2026 14:57
@christso christso merged commit 10e3472 into main Jul 2, 2026
8 checks passed
@christso christso deleted the research/av-ap2w-cli-command-surface branch July 2, 2026 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant