Rebuild codeanalyzer-backend on schema v2 (additive CPG) #9
Open
rahlk wants to merge 5 commits into
Rewrite canonical-schema.md as the v2 keystone — the single source of truth:
one scale-free node + single-parent containment tree + typed edge overlays
(a CPG), emitted as two projections (analysis.json + Neo4j). Named-map
hierarchy (application -> symbol_table{module} -> types/functions ->
callables -> body); can:// path ids with an app segment; the additive level
model (L1 tree->callable, L2 call_graph, L3 intraprocedural cfg/cdg/ddg
syntactic, L4 interprocedural param vertices + param_in/out/summary + ddg
semantic); split per-type edge lists placed at their LCA; source once per
module with byte-offset spans (all text slices off it); monotonic
L1<=..<=L4 with the syntactic/semantic DDG split visible in prov; no-null;
max_level authoritative. Recast schema-reference.md as the per-kind field +
edge appendix and the language-expansion rubric.
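The byte-offset rule is easy to get wrong in languages whose strings index by code point rather than by byte. A minimal sketch of slicing text off the module source, assuming spans are half-open UTF-8 byte ranges; `Span`, `start_byte`, `end_byte`, and `slice_source` are illustrative names, not the keystone's actual field names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span record: half-open byte range into the module source."""
    start_byte: int
    end_byte: int


def slice_source(source: str, span: Span) -> str:
    """Return the text a span covers by slicing the UTF-8 bytes, not the str."""
    data = source.encode("utf-8")
    return data[span.start_byte:span.end_byte].decode("utf-8")


# Module-level source stored once; the comment holds a two-byte character.
module_source = "# café\ndef greet():\n    return 'hi'\n"
callable_text = "def greet():\n    return 'hi'"

# Build the callable's span from byte positions in the encoded source.
start = module_source.encode("utf-8").index(b"def greet")
greet_span = Span(start, start + len(callable_text.encode("utf-8")))

print(slice_source(module_source, greet_span))
```

Slicing the `str` directly with the same offsets would be shifted by one character here, because the `é` in the comment occupies two bytes.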
Rewrite the backend SKILL.md spine around schema v2: two entry paths (new language vs existing-analyzer major-release migration); JSON + Neo4j as co-primary projections; the workflow re-anchored as grow-the-tree-level-by-level (L1 symbol table -> L2 call graph -> L3 intraprocedural cfg/cdg/ddg -> L4 interprocedural SDG), each additive and independently gated; the CLAUDE.md agent guide must describe the schema model; hand-off to the frontend as a separate SDK major release. New references/schema-migration.md: field-by-field old->v2 mapping (flat symbol_table -> tree, rich edges -> identity, per-callable code -> module source slicing, is_* booleans -> kind, (sig,node) -> can:// ids, Java no-heap|full -> ddg ssa|points-to), level-by-level order, release coordination.
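One row of that mapping, `is_*` booleans -> `kind`, can be sketched as follows. The flag names, kind values, and the `class` fallback below are assumptions for illustration only; the authoritative names are whatever schema-migration.md lists:

```python
# Hypothetical old-schema flags and the kind each one maps to. The real flag
# and kind names come from schema-migration.md, not from this sketch.
FLAG_TO_KIND = (
    ("is_interface", "interface"),
    ("is_enum", "enum"),
    ("is_record", "record"),
)
DEFAULT_KIND = "class"  # assumed fallback when no flag is set


def migrate_type_entry(old: dict) -> dict:
    """Replace is_* booleans on an old type entry with a single kind field."""
    kinds = [kind for flag, kind in FLAG_TO_KIND if old.get(flag)]
    if len(kinds) > 1:
        # Several true flags cannot collapse into one kind; surface the conflict.
        raise ValueError(f"conflicting is_* flags: {kinds}")
    migrated = {k: v for k, v in old.items() if not k.startswith("is_")}
    migrated["kind"] = kinds[0] if kinds else DEFAULT_KIND
    return migrated


print(migrate_type_entry({"name": "Shape", "is_interface": True, "is_enum": False}))
```

Raising on conflicting flags, instead of silently picking one, keeps the migration from hiding inconsistencies that the old flat schema allowed.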
L1 now builds the named-map tree to callable depth with can:// ids, spans with byte offsets, and module-level source; call sites are recorded as call nodes in body with callee:null (so get_call_sites stays an L1 accessor), resolved at L2. Update the keystone level table and schema-reference body kinds to match (call = L1, statements/entry/exit = L3).
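The L1 -> L2 step described here can be sketched as an additive backfill: call nodes already exist at L1 with a null callee, and L2 only fills that field and emits call_graph edges. The dict shapes, the `can://demo/...` ids, and the `resolved` lookup are hypothetical stand-ins, not the schema's actual layout:

```python
def get_call_sites(body: list) -> list:
    """L1 accessor: every call node in a callable body, resolved or not."""
    return [node for node in body if node["kind"] == "call"]


def backfill_callees(caller_id: str, body: list, resolved: dict):
    """L2 step: fill callee on resolved call nodes and derive call_graph edges.

    Returns a new body (the L1 body is left untouched) and a list of
    (caller, callee) pairs without duplicates.
    """
    new_body, edges = [], []
    for node in body:
        if node["kind"] == "call" and node["id"] in resolved:
            node = {**node, "callee": resolved[node["id"]]}
            edge = (caller_id, node["callee"])
            if edge not in edges:
                edges.append(edge)
        new_body.append(node)
    return new_body, edges


CALLER = "can://demo/mod/main"  # hypothetical id, for illustration only
l1_body = [
    {"id": CALLER + "@3:4", "kind": "call", "callee": None},
    {"id": CALLER + "@4:4", "kind": "call", "callee": None},
]
# Assumed output of some L2 resolver: call-node id -> callee id.
resolved = {CALLER + "@3:4": "can://demo/mod/helper"}

l2_body, call_graph = backfill_callees(CALLER, l1_body, resolved)
print(len(get_call_sites(l1_body)), len(get_call_sites(l2_body)), call_graph)
```

In this sketch `get_call_sites` returns the same number of nodes before and after the backfill, which is the sense in which it remains an L1 accessor.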
Re-anchor the remaining backend refs on the keystone: backend-recipe steps 2/5/6
(v2 tree, module source, call nodes with callee:null, call_graph list);
testing-and-validation (Application vocab, plus new monotonicity superset gate
and two-tier identity gate); dataflow-graphs emission note superseding the
standalone program_graphs shape (dataflow now lives in the tree body{} + split
edge lists); neo4j-projection reframed as co-primary with containment-as-HAS_*
edges and v2 node kinds; schema-design-loop anchored on the keystone (confirm
language expansion, not re-design); analyzer-architecture lists neo4j/ as a
non-optional subpackage.
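A monotonicity superset gate of the kind named above could look roughly like this. It assumes each level's output has already been flattened into node ids and typed edges; the `level`, `nodes`, and `edges` keys are a hypothetical flattened shape, not the tree plus split edge lists that the schema actually emits:

```python
def facts(analysis: dict):
    """Flatten one level's output into comparable sets of node ids and edges."""
    node_ids = set(analysis["nodes"])
    edges = {(e["type"], e["src"], e["dst"]) for e in analysis["edges"]}
    return node_ids, edges


def monotonicity_violations(levels: list) -> list:
    """Report anything present at level k but missing at level k+1."""
    problems = []
    for lower, higher in zip(levels, levels[1:]):
        low_nodes, low_edges = facts(lower)
        high_nodes, high_edges = facts(higher)
        name = f"L{lower['level']}->L{higher['level']}"
        for missing in sorted(low_nodes - high_nodes):
            problems.append(f"{name}: node dropped: {missing}")
        for missing in sorted(low_edges - high_edges):
            problems.append(f"{name}: edge dropped: {missing}")
    return problems


l1 = {"level": 1, "nodes": ["a", "b"], "edges": []}
l2 = {"level": 2, "nodes": ["a", "b"],
      "edges": [{"type": "call_graph", "src": "a", "dst": "b"}]}
bad_l3 = {"level": 3, "nodes": ["a"], "edges": []}  # drops node b and the edge

print(monotonicity_violations([l1, l2]))
print(monotonicity_violations([l1, l2, bad_l3]))
```

An empty list means each level is a superset of the one before it; any entry names the level pair and the node or edge that went missing.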
Add a superseding note pointing the frontend SDK contract at the backend v2 keystone; mapping the Pydantic models to v2 while keeping the same public API is a separate major SDK release (backend hand-off point c) and the next rebuild of the frontend skill.
Summary
Ground-up rebuild of the codeanalyzer-backend skill around schema v2: the additive node-tree-plus-typed-edges model (a CPG) worked out in design discussion. One structure, two projections (analysis.json + Neo4j), grown level by level.

The keystone: canonical-schema.md (rewritten)
application → symbol_table{module} → types/functions → callables → body.
can:// path ids with an app segment (disambiguates apps in one language); durable ids ≥ callable, …@line:col / …@tag ordinals below.
L1 tree→callable (call nodes, get_call_sites stays L1) · L2 call_graph (callable→callable, callee backfill) · L3 intraprocedural cfg/cdg/ddg (syntactic) · L4 interprocedural param vertices + param_in/param_out/summary + ddg (semantic).
L1 ⊆ L2 ⊆ L3 ⊆ L4, monotone.
source once per module, spans carry byte offsets, all text (incl. get_method_body) slices off it.
HAS_*/DECLARES edges.

The skill spine: SKILL.md (rebuilt)
CLAUDE.md/AGENTS.md agent guide must describe the schema model for maintainability.

New + reconciled references
schema-migration.md: field-by-field old→v2 (flat symbol_table → tree, rich edges → identity, per-callable code → module source slicing, is_* booleans → kind, (sig,node) → can:// ids, Java no-heap|full → ddg ssa|points-to), level-by-level order, release coordination.
schema-reference.md as the per-kind field + edge appendix.
symbol-table-construction, backend-recipe, testing-and-validation (+ new monotonicity superset gate and two-tier identity gate), dataflow-graphs (emission superseded; dataflow lives in the tree), neo4j-projection (co-primary), schema-design-loop, analyzer-architecture (neo4j/ non-optional).

Scope / next
This is the backend skill (points a, b, d). The frontend SDK v2 migration (point c: remap Pydantic to v2, keep the public API) is flagged in cldk-sdk-frontend/references/schema-contract.md and is the next rebuild.

Test plan
Docs-only. All reference links resolve; keystone worked example is self-consistent L1→L4.