Make code-exec-harness the Dogfood Parity 1 exec gate #405

@shiny-code-bot

Finish Line

tools/code-exec-harness is the first deterministic no-live-token gate for Dogfood Parity 1, running against the dev-fast code binary.

Current Status

State: First gate landed in PR #406.

Merged:

  • just harness-smoke now runs the deterministic no-live-token fake /v1/responses smoke suite.
  • The runner targets code-rs/target/dev-fast/code by default and gives a clear error if the binary is missing.
  • exec-basic-smoke.json proves basic code exec --json startup, final assistant message emission, event count shape, request capture, and exit 0.
  • The harness now injects the current Codex-base fake-server config with -c openai_base_url=... and skips the removed legacy --max-seconds flag when the binary does not support it. It also stages scenario-local .code/skills fixtures into an isolated CODE_HOME/skills, stores raw events in summary artifacts, and quotes fake gh shim paths.
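
The default-binary behavior described above can be sketched roughly as follows. This is a minimal illustration, not the merged runner: the function name, the override parameter, and the wording of the error are hypothetical, and the hint that ./build-fast.sh produces the binary is an assumption. Only the default path comes from this issue.

```python
from pathlib import Path
from typing import Optional

# Default path taken from the issue text; everything else is illustrative.
DEFAULT_BINARY = Path("code-rs/target/dev-fast/code")


def resolve_code_binary(repo_root: Path, override: Optional[Path] = None) -> Path:
    """Return the binary to test, or raise with an actionable message."""
    candidate = override if override is not None else repo_root / DEFAULT_BINARY
    if not candidate.is_file():
        # Assumption: ./build-fast.sh is what produces the dev-fast binary.
        raise FileNotFoundError(
            f"code binary not found at {candidate}; "
            "run ./build-fast.sh first or pass an explicit binary path"
        )
    return candidate
```

Failing before any scenario starts keeps a missing build from surfacing as a confusing per-scenario failure.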

Validation:

  • ./build-fast.sh passed cleanly before merge.
  • just harness-smoke passed all six deterministic scenarios before merge.
  • GitHub blob-size policy passed on PR #406 (Make exec harness a dogfood gate).
  • Focused review agents inspected the branch. Two findings were fixed before merge: the event-count schema needed to expect item.agent_message instead of the legacy msg.agent_message, and the fake gh shim path needed shell quoting.
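
For illustration, the shell-quoting finding is the kind of problem that `shlex.quote` addresses when a shim script is generated from Python. The sketch below is hypothetical: the function name, the two path parameters, and the shim's behavior (log the arguments, print canned output) are assumptions, not the harness's actual shim.

```python
import shlex
from pathlib import Path


def render_gh_shim(log_path: Path, canned_output_path: Path) -> str:
    """Build a POSIX sh script that logs its arguments and prints canned output.

    Both paths are quoted so that spaces or quote characters in a temp
    directory name cannot split or break the generated command lines.
    """
    log = shlex.quote(str(log_path))
    canned = shlex.quote(str(canned_output_path))
    return (
        "#!/bin/sh\n"
        f"printf '%s\\n' \"$*\" >> {log}\n"
        f"cat {canned}\n"
    )
```

Without quoting, a path such as `/tmp/my dir/gh.log` would be split by the shell into two words and the redirect would target the wrong file.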

Remaining parity questions discovered by this gate:

  • The old config-disabled manual-skill explicit invocation scenario no longer matches current Codex-base behavior and is excluded from the first smoke gate pending a product decision.
  • The stderr markers that the old context-ledger assertions checked no longer exist in the current code, so those assertions were replaced by request-shape assertions.
  • The image replay omission text has changed; the current gate instead asserts that no raw data:image/ payload is replayed and that the generated-image path is carried forward.
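
A request-shape assertion like the image one above might look something like this. It is a sketch under stated assumptions: captured requests are taken to be JSON strings, and the function names and failure messages are hypothetical. The check walks every string in each body rather than relying on a particular request schema.

```python
import json
from typing import Any, Iterable, List


def find_strings(value: Any) -> Iterable[str]:
    """Yield every string value nested anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from find_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_strings(item)


def check_image_replay(request_bodies: List[str], image_path: str) -> List[str]:
    """Return a list of failures; an empty list means both assertions hold."""
    failures = []
    path_seen = False
    for index, body in enumerate(request_bodies):
        for text in find_strings(json.loads(body)):
            if "data:image/" in text:
                failures.append(f"request {index} replays a raw data:image/ payload")
            if image_path in text:
                path_seen = True
    if not path_seen:
        failures.append(f"no request carries forward the image path {image_path}")
    return failures
```

Asserting on the captured request rather than on human-readable omission text avoids breaking the gate when that wording changes again.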

Acceptance Criteria

  • Add a deterministic runner script for all fake /v1/responses scenarios.
  • The runner uses code-rs/target/dev-fast/code by default and errors with a clear message when the binary is missing.
  • Add a basic exec smoke scenario that expects code exec --json to return a final assistant message and exit 0.
  • Update the harness README and local AGENTS.md so future agents know this is the Dogfood Parity 1 P0 gate.
  • Update repo workflow metadata to include the harness command as a quality gate.
  • Validate with ./build-fast.sh and the deterministic harness command.
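
The basic exec smoke criterion can be pictured as a check over the JSON-lines output of code exec --json plus the process exit code. The event shape used here (an object with an "item" whose "type" is "agent_message" and whose "text" holds the message) is an assumption for illustration, as are the function names; the real scenario schema may differ.

```python
import json
from typing import List, Optional


def final_agent_message(stdout_text: str) -> Optional[str]:
    """Return the text of the last agent_message item, or None if absent."""
    last = None
    for line in stdout_text.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Assumed shape: {"type": ..., "item": {"type": "agent_message", "text": ...}}
        item = event.get("item") if isinstance(event, dict) else None
        if isinstance(item, dict) and item.get("type") == "agent_message":
            last = item.get("text")
    return last


def check_exec_smoke(stdout_text: str, exit_code: int, expected: str) -> List[str]:
    """Return a list of failures; an empty list means the smoke check passed."""
    failures = []
    if exit_code != 0:
        failures.append(f"expected exit 0, got {exit_code}")
    message = final_agent_message(stdout_text)
    if message != expected:
        failures.append(f"expected final message {expected!r}, got {message!r}")
    return failures
```

Collecting failures in a list, instead of stopping at the first one, lets a single run report both a bad exit code and a missing message.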

Out Of Scope

  • Live-model GitHub planning smoke in CI or release gates.
  • Auto Drive, Code Bridge/browser, Auto Review, and multi-agent feature restoration.
  • Large direct code copies from the pre-pivot Every Code branch.

Metadata

Labels

  • plan (Durable planning issue)
  • plan:done (Plan completed or superseded)
