
feat(eval): add lifecycle extensions and agent rules#1605

Closed
christso wants to merge 2 commits into feat/av-kfik-4-templating from feat/av-kfik-14-extensions

christso (Collaborator) commented Jul 2, 2026

Summary

  • Adds promptfoo-compatible AgentV lifecycle extensions: beforeAll, beforeEach, afterEach, and afterAll.
  • Adds file://...:<hook> extension execution and the built-in agentv:agent-rules staging extension.
  • Keeps workspace.repos first-class: repos are materialized before extension hooks, and repo acquisition is not moved into extension semantics.
  • Exposes staged agent_rules_paths through provider context, result metadata, prepare manifests, generated schema, docs, and examples.
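To illustrate the promptfoo-style contract, a lifecycle extension could be a module whose exported function receives the hook name and a context object. This is a minimal sketch under stated assumptions: the export name extensionHook, the HookContext fields, and the convention that returning undefined keeps the previous state are hypothetical and are not taken from this PR's code.

```typescript
// Hypothetical lifecycle extension module. The hook names match this PR;
// the context shape and return convention are assumptions modeled on promptfoo.
type HookName = "beforeAll" | "beforeEach" | "afterEach" | "afterAll";

interface HookContext {
  workspacePath?: string; // assumed: root of the materialized workspace
  testId?: string; // assumed: set for per-case hooks
  state?: Record<string, unknown>; // assumed: state carried between hooks
}

// Records each invocation so the sketch can be exercised without AgentV.
const calls: string[] = [];

export async function extensionHook(
  hookName: HookName,
  context: HookContext,
): Promise<HookContext | undefined> {
  calls.push(context.testId ? `${hookName}:${context.testId}` : hookName);
  if (hookName === "beforeAll") {
    // Return updated state so later hooks can read it.
    return {
      ...context,
      state: { ...context.state, seededWorkspace: context.workspacePath },
    };
  }
  // beforeEach/afterEach/afterAll: side effects only. Returning undefined
  // is assumed to leave the previous state unchanged.
  return undefined;
}
```

Returning state from beforeAll while letting beforeEach return nothing mirrors the two cases the review findings distinguish: state-returning hooks and side-effect-only hooks.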

Validation

  • bun test packages/core/test/evaluation/extensions.test.ts packages/core/test/evaluation/validation/eval-file-schema.test.ts packages/core/test/evaluation/validation/eval-validator.test.ts
  • bun test packages/core/test/evaluation/extensions.test.ts apps/cli/test/commands/prepare/prepare.test.ts
  • bun run --cwd packages/core typecheck
  • bun run --cwd packages/core lint
  • bun run --cwd packages/core build
  • bun run lint
  • bun run typecheck
  • bun run --cwd apps/cli build
  • Live dogfood run against a local OpenAI-compatible target with a live LLM grader via http://127.0.0.1:10531/v1: PASS, 1/1, mean score 100%.

Private evidence:

  • agentv-private branch: evidence/av-kfik-14-extensions
  • evidence commit: 36424ef

Scope Notes

  • Legacy workspace.hooks command execution remains available for existing suites and reset policy; new docs and migrated examples route executable setup through top-level extensions.
  • Hard removal of remaining legacy command hooks is left out of this slice because examples and tests still exercise existing behavior.

Post-Deploy Monitoring & Validation

  • Watch CI for eval schema generation, core/CLI typecheck, and CLI prepare tests.
  • For early adopters, inspect run index.jsonl metadata for agent_rules_paths and provider request metadata in failed evals involving extensions.
  • Healthy signal: evals using workspace.repos plus agentv:agent-rules show staged paths under .agentv/agent-rules/** after workspace materialization.
  • Failure signal: setup failures reporting "agentv:agent-rules requires a materialized workspace", stale prepared agent_rules_paths, or missing file:// export functions.
  • Rollback trigger: extension-enabled evals fail before provider invocation despite valid workspace materialization.
  • Validation window: first CI run and first manual extension-enabled eval after merge; owner: AgentV maintainers.
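The healthy signal can be checked mechanically against a run's index.jsonl. A sketch, assuming each line is a JSON object with the staged paths at metadata.agent_rules_paths and an optional test_id; the real record layout may differ, and findSuspectRecords is a hypothetical helper.

```typescript
// Hypothetical check: every agent_rules_paths entry should sit under
// .agentv/agent-rules/. The record shape below is an assumption.
interface IndexRecord {
  test_id?: string;
  metadata?: { agent_rules_paths?: unknown };
}

const RULES_PREFIX = ".agentv/agent-rules/";

export function findSuspectRecords(jsonl: string): string[] {
  const problems: string[] = [];
  jsonl.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // skip blank lines
    const record = JSON.parse(line) as IndexRecord;
    const paths = record.metadata?.agent_rules_paths;
    if (paths === undefined) return; // eval did not stage agent rules
    const label = record.test_id ?? `line ${index + 1}`;
    if (!Array.isArray(paths)) {
      problems.push(`${label}: agent_rules_paths is not an array`);
      return;
    }
    for (const p of paths) {
      // Normalize Windows separators before checking the prefix.
      const normalized = String(p).replace(/\\/g, "/");
      if (!normalized.includes(RULES_PREFIX)) {
        problems.push(`${label}: unexpected path ${normalized}`);
      }
    }
  });
  return problems;
}
```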

cloudflare-workers-and-pages (Bot) commented Jul 2, 2026

Deploying agentv with Cloudflare Pages

Latest commit: b7af94a
Status: ✅  Deploy successful!
Preview URL: https://6a236599.agentv.pages.dev
Branch Preview URL: https://feat-av-kfik-14-extensions.agentv.pages.dev


christso (Collaborator, Author) commented Jul 2, 2026

Manual CI was dispatched because this stacked PR targets feat/av-kfik-4-templating, and pull_request CI only runs for PRs whose base is main. Green run: https://github.com/EntityProcess/agentv/actions/runs/28590245378

christso (Collaborator, Author) commented Jul 2, 2026

Review findings for PR #1605:

  1. P1 - Conversation-mode cases skip extension teardown and result metadata. runEvalCase returns conversationResult immediately for conversation tests at packages/core/src/evaluation/orchestrator.ts:1919 / packages/core/src/evaluation/orchestrator.ts:1953, before the normal runAfterEachHooks() path at packages/core/src/evaluation/orchestrator.ts:2122 and before the result metadata/output merge at packages/core/src/evaluation/orchestrator.ts:2287. runConversationMode() itself returns no metadata at packages/core/src/evaluation/orchestrator.ts:3386. As a result, evals using mode: conversation with agentv:agent-rules pass agent_rules_paths to provider calls, but the result metadata does not carry them, and afterEach extensions never run. This contradicts the new contract that lifecycle hooks and agent_rules_paths work through provider context and result metadata.

  2. P1 - Multi-slot pooled workspaces do not run beforeAll extensions unless a legacy before_all hook exists, and the state is not slot-scoped. In prepareSharedWorkspaceSetup, the multi-slot pool loop that runs beforeAll extensions is gated by suiteHooksEnabled && hasHookCommand(suiteBeforeAllHook) at packages/core/src/evaluation/workspace/setup.ts:730, so a pooled eval with only extensions: [agentv:agent-rules] and no legacy workspace.hooks.before_all never stages/exposes rules for any slot. If a legacy hook is present, the code merges extension state across every slot at packages/core/src/evaluation/workspace/setup.ts:735, then each case receives the single sharedExtensionState at packages/core/src/evaluation/orchestrator.ts:1271, so agent_rules_paths can contain paths for other pool slots instead of the selected slot. This breaks the workspace.repos + pooled-workers path where repos materialize before extension hooks and each worker should see its own staged paths.

  3. P2 - beforeEach extensions that mutate the workspace without returning state are included in file-change baselines. prepareEvalCaseWorkspace only marks beforeEachNeedsFreshBaseline when the returned state object identity changes at packages/core/src/evaluation/workspace/setup.ts:1259 / packages/core/src/evaluation/workspace/setup.ts:1276. A normal promptfoo-style beforeEach hook can perform filesystem setup and return nothing, in which case runExtensionsForHook returns the previous state and AgentV keeps the old baseline. Later captureWorkspaceFileChanges() diffs from that stale baseline, so extension-created setup files can be reported as agent file changes. Legacy workspace.hooks.before_each always refreshes the baseline; lifecycle extensions should do the same whenever a matching beforeEach hook ran.
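Finding 3 comes down to which condition triggers a fresh baseline. The sketch below contrasts the state-identity check with a "did a matching hook run" check; BeforeEachOutcome, runBeforeEach, and both predicates are hypothetical names, not the code in setup.ts.

```typescript
type ExtensionState = Record<string, unknown>;

interface BeforeEachOutcome {
  hooksRan: number; // number of matching beforeEach extensions that ran
  previousState: ExtensionState;
  nextState: ExtensionState; // returned state, or previousState if none
}

// Buggy variant: refreshes only when a hook returned a new state object.
function needsFreshBaselineByIdentity(o: BeforeEachOutcome): boolean {
  return o.nextState !== o.previousState;
}

// Fixed variant: any matching hook may have touched the filesystem.
function needsFreshBaseline(o: BeforeEachOutcome): boolean {
  return o.hooksRan > 0;
}

// Stand-in runner: a hook that returns undefined keeps the prior state.
function runBeforeEach(
  hooks: Array<(s: ExtensionState) => ExtensionState | undefined>,
  state: ExtensionState,
): BeforeEachOutcome {
  let current = state;
  for (const hook of hooks) {
    current = hook(current) ?? current;
  }
  return { hooksRan: hooks.length, previousState: state, nextState: current };
}
```

A hook that only writes setup files returns undefined, so the identity check sees no change and the stale baseline would attribute those files to the agent.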

christso (Collaborator, Author) commented Jul 2, 2026

Fixed the lifecycle extension review findings in b7af94a6 (fix(eval): complete lifecycle extension teardown) and pushed feat/av-kfik-14-extensions.

What changed:

  • Conversation-mode cases now run the same afterEach lifecycle teardown path before returning and carry extension result metadata/output (agent_rules_paths, beforeAllOutput, beforeEachOutput, afterEachOutput) like the normal path.
  • Multi-slot pooled workspaces now run suite beforeAll extensions for every acquired slot even when no legacy workspace.hooks.before_all command exists. Extension state is stored per pool slot and each case receives only the state for its selected slot.
  • beforeEach extensions now force a fresh workspace baseline whenever a matching hook ran, even if the hook mutates files but returns no state.
  • Added focused regressions for conversation teardown/metadata, pooled slot-scoped agentv:agent-rules, and no-state beforeEach baseline refresh.
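The slot-scoped fix can be pictured as a map from pool slot to extension state, filled by running beforeAll for every acquired slot. A sketch under assumptions: SlotState, prepareSlots, stateForCase, and the staged file name AGENTS.md are hypothetical, and the real code may key slots differently.

```typescript
interface SlotState {
  agentRulesPaths: string[];
}

// Stand-in for the agentv:agent-rules beforeAll hook: stages rules inside
// the slot's own workspace. The AGENTS.md file name is an assumption.
async function runBeforeAllForSlot(slotPath: string): Promise<SlotState> {
  return { agentRulesPaths: [`${slotPath}/.agentv/agent-rules/AGENTS.md`] };
}

// Runs beforeAll for every acquired slot, with no dependency on a legacy
// before_all command, and stores the result under that slot's index.
async function prepareSlots(slotPaths: string[]): Promise<Map<number, SlotState>> {
  const bySlot = new Map<number, SlotState>();
  let index = 0;
  for (const slotPath of slotPaths) {
    bySlot.set(index, await runBeforeAllForSlot(slotPath));
    index++;
  }
  return bySlot;
}

// Each case reads only the state for the slot it was assigned, so
// agent_rules_paths never include another slot's staged files.
function stateForCase(bySlot: Map<number, SlotState>, slotIndex: number): SlotState {
  const state = bySlot.get(slotIndex);
  if (!state) throw new Error(`no extension state for slot ${slotIndex}`);
  return state;
}
```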

Validation:

  • bun test packages/core/test/evaluation/extensions.test.ts
  • bun test packages/core/test/evaluation/extensions.test.ts packages/core/test/evaluation/orchestrator.test.ts packages/core/test/evaluation/workspace/setup.test.ts packages/core/test/evaluation/workspace/file-changes.test.ts packages/core/test/evaluation/suite-level-input.test.ts (150 passing)
  • bun run --cwd packages/core typecheck
  • bun run --cwd packages/core lint
  • bun run --cwd packages/core build
  • bun run lint
  • bun run typecheck
  • Local extension dogfood against http://127.0.0.1:10531/v1 with model gpt-5.3-codex-spark: PASS, 1/1, mean 100%.

Private evidence: agentv-private branch evidence/av-kfik-14-extensions, commit 1c16dc4, folder review-fix-dogfood/.

christso (Collaborator, Author) commented Jul 2, 2026

Manual CI for pushed commit b7af94a is green: https://github.com/EntityProcess/agentv/actions/runs/28592955569

christso deleted the branch feat/av-kfik-4-templating Jul 2, 2026 14:14
christso closed this Jul 2, 2026

christso (Collaborator, Author) commented Jul 2, 2026

Recovery update: GitHub would not reopen this PR because the deleted base branch feat/av-kfik-4-templating prevents changing PR state. The branch was restacked onto origin/main and force-pushed with lease to ca0019330e76101e94ea422d4a5fcca2a32f64e8. Replacement draft PR: #1607.

christso added a commit that referenced this pull request Jul 2, 2026
Recover PR #1605 on top of main after its stacked base branch was deleted. Preserves lifecycle extension support, agent-rules staging, and the b7af94a review fixes.