bench(parity): cg HTTP and cg-mcp share the same 8-verb surface by DvirDukhan · Pull Request #696 · FalkorDB/code-graph

DvirDukhan · 2026-05-29T18:03:51Z

Summary

Pairs with #api-v2 (the /api/v2/* MCP-parity endpoints). With those endpoints in place, the SWE-bench harness can now run the HTTP-transport sibling (cg) on the same verb surface as the stdio-MCP sibling (cg-mcp), so a head-to-head benchmark measures transport overhead rather than API-surface differences.

Changes

bench/agents/code_graph_adapter.py — add v2 client methods on CodeGraphClient that POST to the new /api/v2/* endpoints (search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path_v2, ask_v2). Existing UI-shaped methods kept for back-compat with tests/test_cli.py.
bench/cli/cg.py — rewrite to expose the 8 MCP-style verbs (index_repo, search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path, ask) alongside the legacy UI verbs. Mirrors cg_mcp.py's _compact_list / _strip_worktree_prefix helpers so token compaction is byte-identical between transports.
bench/runners/mini_runner.py — INSTANCE_TEMPLATE_CODE_GRAPH now documents the new verb surface. The cg track exports PROJECT_NAME + BRANCH like the MCP track, and indexes via /api/analyze_folder with explicit branch=_default so both tracks share the code:<project>:<branch> graph namespace.
bench/tools/code_graph/system_preamble.md — rewritten to mirror bench/tools/code_graph_mcp/system_preamble.md verb-for-verb.
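
As a rough illustration of the adapter change, a v2 client method could reduce to building a JSON POST against an /api/v2/ path. This sketch is not the code in bench/agents/code_graph_adapter.py. The helper names, the one-route-per-verb URL layout, the payload keys, and the base URL are all assumptions. Only the verb names, the /api/v2/ prefix, and the code:<project>:<branch> namespace come from the description above.

```python
import json
import urllib.request

# Verb names come from the PR description; everything else here is hypothetical.
V2_VERBS = (
    "search_code", "get_callers", "get_callees",
    "get_dependencies", "impact_analysis", "find_path", "ask",
)


def graph_namespace(project: str, branch: str = "_default") -> str:
    """Return the shared graph key in the code:<project>:<branch> form."""
    return f"code:{project}:{branch}"


def build_v2_request(base_url: str, verb: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for one v2 verb.

    Assumes one route per verb at /api/v2/<verb>; the real routes may differ.
    """
    if verb not in V2_VERBS:
        raise ValueError(f"unknown v2 verb: {verb}")
    # sort_keys keeps the request body stable across runs.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v2/{verb}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Separating request construction from sending keeps the sketch testable without a running server; passing the result to `urllib.request.urlopen` would perform the call.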

Validation

Parity verified byte-for-byte on a pre-indexed pytest-6202 graph: cg search_code/get_callers/get_callees/impact_analysis returns identical output to the cg-mcp equivalents (1 KB payload diff'd). All 27 existing bench + CLI tests still pass.
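
A byte-for-byte parity check of this kind can be sketched as below. This is a hypothetical helper, not the procedure actually used for the validation above: `first_diff_offset` and `parity_report` are illustrative names, and capturing the raw output of each transport is left to the caller.

```python
import hashlib
from typing import Optional


def first_diff_offset(a: bytes, b: bytes) -> Optional[int]:
    """Return the index of the first differing byte, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # One input is a strict prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def parity_report(http_out: bytes, mcp_out: bytes) -> dict:
    """Summarize whether two captured tool outputs are byte-identical."""
    offset = first_diff_offset(http_out, mcp_out)
    return {
        "identical": offset is None,
        "first_diff_offset": offset,
        "http_sha256": hashlib.sha256(http_out).hexdigest(),
        "mcp_sha256": hashlib.sha256(mcp_out).hexdigest(),
    }
```

Reporting the first differing offset alongside the digests makes a mismatch easier to localize than a bare hash comparison.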

Stacked

Base: dvirdukhan/api-v2-mcp-parity (needs the v2 endpoints).

Pairs with #api-v2 (api/v2/* MCP-parity endpoints). With those endpoints in place, the bench harness can now run the HTTP-transport sibling (cg) on the same verb surface as the stdio-MCP sibling (cg-mcp), so a head-to-head benchmark measures *transport overhead* rather than API-surface differences.

Changes:

* bench/agents/code_graph_adapter.py — add v2 client methods on CodeGraphClient that POST to the new /api/v2/* endpoints (search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path_v2, ask_v2). Existing UI-shaped methods (graph_entities, get_neighbors, find_paths, ...) kept for back-compat with tests/test_cli.py.
* bench/cli/cg.py — rewrite to expose the 8 MCP-style verbs (index_repo, search_code, get_callers, get_callees, get_dependencies, impact_analysis, find_path, ask) alongside the legacy UI verbs. Mirrors cg_mcp.py's _compact_list / _strip_worktree_prefix helpers so token compaction is byte-identical between transports.
* bench/runners/mini_runner.py — INSTANCE_TEMPLATE_CODE_GRAPH now documents the new verb surface. The cg track exports PROJECT_NAME + BRANCH like the MCP track, and indexes via /api/analyze_folder with explicit branch=_default so both tracks share the code:<project>:<branch> graph namespace.
* bench/tools/code_graph/system_preamble.md — rewritten to mirror bench/tools/code_graph_mcp/system_preamble.md verb-for-verb.

Parity verified byte-for-byte on a pre-indexed pytest-6202 graph: cg search_code/get_callers/get_callees/impact_analysis returns identical output to the cg-mcp equivalents (1 KB payload diff'd). All 27 existing bench + CLI tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

coderabbitai · 2026-05-29T18:04:00Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Iter3 root-cause: with the verb surfaces and tool outputs now byte-identical between the HTTP (cg) and MCP (cg-mcp) tracks, the remaining token gap traced entirely to reading strategy. On 2/10 instances the agent fell into a 19x full-file `cat` loop instead of reading the bounded span the graph already pointed at, inflating input tokens 3-4x on those instances.

Both preambles now explicitly forbid `cat`-ing a whole source file and require `sed -n 'START,ENDp'` anchored on the graph's line number. This attacks the actual token driver and applies equally to both transports so a head-to-head stays apples-to-apples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
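
The bounded read that the preambles ask for can be expressed in Python as well. The sketch below is only an analogue of `sed -n 'START,ENDp'`; the function names and the default context width of 20 lines are assumptions, not part of the bench tooling.

```python
from itertools import islice
from typing import List, Tuple


def read_span(path: str, start: int, end: int) -> List[str]:
    """Return lines start..end (1-indexed, inclusive), like sed -n 'start,endp'."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    with open(path, encoding="utf-8", errors="replace") as fh:
        # islice stops consuming the file once line `end` has been read.
        return list(islice(fh, start - 1, end))


def span_around(line: int, context: int = 20) -> Tuple[int, int]:
    """Pick a window centered on a graph-reported line number."""
    return max(1, line - context), line + context
```

Because `islice` stops at `end`, the cost of a read scales with the span's end line rather than with file size, which is the property the preamble change relies on.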

sample_instances() was called with only `stage` (size from STAGE_SIZES), then the result was sliced `[:limit]`. That let --limit shrink the sample below the stage size but never grow it, so `--stage calibration --limit 40` silently ran just 10 instances.

Pass n=args.limit straight into sample_instances so the limit sets the exact sample size (falling back to the stage size when unset). Because random.sample is prefix-stable for our seed, the n=10 calibration set stays a subset of the n=40 set, so existing trajectories/indexed graphs still resume-skip cleanly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
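
The difference between slicing after sampling and passing the size into the sampler can be sketched as follows. This is a minimal stand-in, not the harness's sample_instances: the signature, the seed default, and the single STAGE_SIZES entry are assumptions (only a calibration size of 10 is implied above). Prefix stability of random.sample is, as far as I know, an implementation property rather than a documented guarantee, so the sketch does not depend on it.

```python
import random
from typing import List, Optional, Sequence

# Hypothetical table; only calibration=10 is implied by the commit message.
STAGE_SIZES = {"calibration": 10}


def sample_instances(
    pool: Sequence[str],
    stage: str,
    n: Optional[int] = None,
    seed: int = 0,
) -> List[str]:
    """Draw a seeded sample of exactly n items, or the stage size if n is None."""
    size = STAGE_SIZES[stage] if n is None else n
    size = min(size, len(pool))  # never ask for more than the pool holds
    return random.Random(seed).sample(list(pool), size)


pool = [f"instance-{i}" for i in range(500)]

# Old behaviour: sample at the stage size, then slice. The limit cannot grow it.
old_style = sample_instances(pool, "calibration")[:40]

# New behaviour: the limit is the sample size.
new_style = sample_instances(pool, "calibration", n=40)
```

Here `old_style` holds 10 items and `new_style` holds 40. Whether the 10-item draw is a prefix of the 40-item draw depends on the interpreter's sampling algorithm and the pool size, so a harness that relies on it may want to assert it at startup.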

DvirDukhan and others added 2 commits May 30, 2026 07:58

DvirDukhan wants to merge 3 commits into dvirdukhan/api-v2-mcp-parity from dvirdukhan/bench-mcp-parity
