
Rebuild codeanalyzer-backend on schema v2 (additive CPG) #9

Open
rahlk wants to merge 5 commits into main from feat/schema-v2-rebuild

Conversation

rahlk commented Jul 2, 2026
Contributor

Summary

Ground-up rebuild of the codeanalyzer-backend skill around schema v2 — the additive
node-tree-plus-typed-edges model (a CPG) worked out in design discussion. One structure, two
projections (analysis.json + Neo4j), grown level by level.

The keystone — canonical-schema.md (rewritten)

Codeanalyzer is an additive analysis paradigm: each analysis level is the same tree grown one
layer deeper, plus one edge family over the new layer.

  • One scale-free node + single-parent containment tree + typed edge overlays = a CPG; every "section" is a projection.
  • Named-map hierarchy: application → symbol_table{module} → types/functions → callables → body.
  • can:// path ids with an app segment (disambiguates apps in one language); durable ids ≥ callable, …@line:col / …@tag ordinals below.
  • Additive levels: L1 tree→callable (+ call nodes, get_call_sites stays L1) · L2 call_graph (callable→callable, callee backfill) · L3 intraprocedural cfg/cdg/ddg(syntactic) · L4 interprocedural param vertices + param_in/param_out/summary + ddg(semantic). L1 ⊆ L2 ⊆ L3 ⊆ L4, monotone.
  • Split edge lists at their LCA (intra on the callable, cross on the application); the list name is the type.
  • source once per module, spans carry byte offsets, all text (incl. get_method_body) slices off it.
  • Neo4j is co-primary and full-depth; containment renders as HAS_*/DECLARES edges.
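The source-slicing and LCA bullets above can be sketched roughly as follows. This is an illustration only, not the shape defined in canonical-schema.md: the `Node` fields, the helper names `slice_source` and `lca`, and the exact `can://` id layout (including the `/@line:col` segment) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical scale-free node: the same shape at every tree depth."""
    id: str                          # can:// path id (layout assumed here)
    kind: str                        # e.g. "module", "callable", "call"
    span: tuple[int, int] = (0, 0)   # [start, end) byte offsets into module source
    children: dict[str, "Node"] = field(default_factory=dict)


def slice_source(source: bytes, node: Node) -> str:
    """Recover a node's text by slicing the single per-module source buffer."""
    start, end = node.span
    return source[start:end].decode("utf-8")


def lca(id_a: str, id_b: str) -> str:
    """Lowest common ancestor of two path ids: the longest shared segment prefix."""
    scheme = "can://"
    a = id_a[len(scheme):].split("/")
    b = id_b[len(scheme):].split("/")
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return scheme + "/".join(shared)


# One module source, stored once; every node only carries offsets into it.
source = b"def f():\n    g()\n\ndef g():\n    pass\n"
f = Node("can://shop/a.py/f", "callable", (0, 16))
call = Node("can://shop/a.py/f/@2:5", "call", (13, 16))
f.children["@2:5"] = call

print(slice_source(source, f))     # text of the callable, like get_method_body
print(slice_source(source, call))  # text of the call site

# An intra-callable edge (e.g. cfg) would live on the callable ...
print(lca("can://shop/a.py/f/@2:5", "can://shop/a.py/f/@3:1"))
# ... while an edge between callables in different modules lands on the application.
print(lca("can://shop/a.py/f", "can://shop/b.py/g"))
```

Storing only offsets keeps node records small and guarantees that every text accessor agrees with the module source, at the cost of needing the source buffer whenever text is requested.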

The skill spine — SKILL.md (rebuilt)

  • Two entry paths: new language (scaffold) vs existing analyzer (major-release migration to v2).
  • Workflow re-anchored as grow the tree level by level, each additive and independently gated, emitting both projections.
  • The CLAUDE.md/AGENTS.md agent guide must describe the schema model for maintainability.
  • Hand-off to the frontend as a separate SDK major release.

New + reconciled references

  • New schema-migration.md: field-by-field old→v2 mapping (flat symbol_table→tree, rich edges→identity, per-callable code→module source slicing, is_* booleans→kind, (sig,node)→can:// ids, Java no-heap|full→ddg ssa|points-to), level-by-level order, release coordination.
  • Recast schema-reference.md as the per-kind field + edge appendix.
  • Reconciled symbol-table-construction, backend-recipe, testing-and-validation (+ new monotonicity superset gate and two-tier identity gate), dataflow-graphs (emission superseded — dataflow lives in the tree), neo4j-projection (co-primary), schema-design-loop, analyzer-architecture (neo4j/ non-optional).
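For intuition, the monotonicity superset gate (L1 ⊆ L2 ⊆ L3 ⊆ L4) could look something like the sketch below. It is not the gate defined in testing-and-validation: the function name `check_monotone`, the per-level `{"nodes", "edges"}` dictionaries, and the choice to compare identities only (so that value changes such as the L2 callee backfill do not count as drops) are assumptions for illustration.

```python
def check_monotone(levels: list[dict[str, set]]) -> list[str]:
    """Report every node id or edge that a deeper level dropped.

    levels[i] is the (hypothetical) identity summary of the output at L(i+1):
    {"nodes": {node ids}, "edges": {(edge_type, src_id, dst_id)}}.
    An empty result means each level is a superset of the one before it.
    """
    violations = []
    for i in range(len(levels) - 1):
        lower, higher = levels[i], levels[i + 1]
        for key in ("nodes", "edges"):
            missing = lower[key] - higher[key]
            if missing:
                violations.append(
                    f"L{i + 1} -> L{i + 2}: {key} dropped {sorted(missing)}"
                )
    return violations


# L1: tree down to callables plus call nodes, no edges yet.
l1 = {"nodes": {"f", "g", "f/@2:5"}, "edges": set()}
# L2: same nodes, plus the call_graph edge family.
l2 = {"nodes": set(l1["nodes"]), "edges": {("call_graph", "f", "g")}}
# A broken L2 that lost nodes on the way.
bad_l2 = {"nodes": {"f"}, "edges": set()}

print(check_monotone([l1, l2]))      # no violations expected
print(check_monotone([l1, bad_l2]))  # one violation, for the dropped nodes
```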

Scope / next

This is the backend skill (points a, b, d). The frontend SDK v2 migration (point c — remap Pydantic to v2, keep the public API) is flagged in cldk-sdk-frontend/references/schema-contract.md and is the next rebuild.

Test plan

Docs-only. All reference links resolve; keystone worked example is self-consistent L1→L4.

rahlk added 5 commits July 2, 2026 19:06
Rewrite canonical-schema.md as the v2 keystone — the single source of truth:
one scale-free node + single-parent containment tree + typed edge overlays
(a CPG), emitted as two projections (analysis.json + Neo4j). Named-map
hierarchy (application -> symbol_table{module} -> types/functions ->
callables -> body); can:// path ids with an app segment; the additive level
model (L1 tree->callable, L2 call_graph, L3 intraprocedural cfg/cdg/ddg
syntactic, L4 interprocedural param vertices + param_in/out/summary + ddg
semantic); split per-type edge lists placed at their LCA; source once per
module with byte-offset spans (all text slices off it); monotonic
L1<=..<=L4 with the syntactic/semantic DDG split visible in prov; no-null;
max_level authoritative. Recast schema-reference.md as the per-kind field +
edge appendix and the language-expansion rubric.

Rewrite the backend SKILL.md spine around schema v2: two entry paths (new
language vs existing-analyzer major-release migration); JSON + Neo4j as
co-primary projections; the workflow re-anchored as grow-the-tree-level-by-
level (L1 symbol table -> L2 call graph -> L3 intraprocedural cfg/cdg/ddg ->
L4 interprocedural SDG), each additive and independently gated; the CLAUDE.md
agent guide must describe the schema model; hand-off to the frontend as a
separate SDK major release. New references/schema-migration.md: field-by-field
old->v2 mapping (flat symbol_table -> tree, rich edges -> identity, per-callable
code -> module source slicing, is_* booleans -> kind, (sig,node) -> can:// ids,
Java no-heap|full -> ddg ssa|points-to), level-by-level order, release
coordination.

L1 now builds the named-map tree to callable depth with can:// ids, spans
with byte offsets, and module-level source; call sites are recorded as call
nodes in body with callee:null (so get_call_sites stays an L1 accessor),
resolved at L2. Update the keystone level table and schema-reference body
kinds to match (call = L1, statements/entry/exit = L3).

Re-anchor the remaining backend refs on the keystone: backend-recipe steps 2/5/6
(v2 tree, module source, call nodes with callee:null, call_graph list);
testing-and-validation (Application vocab, plus new monotonicity superset gate
and two-tier identity gate); dataflow-graphs emission note superseding the
standalone program_graphs shape (dataflow now lives in the tree body{} + split
edge lists); neo4j-projection reframed as co-primary with containment-as-HAS_*
edges and v2 node kinds; schema-design-loop anchored on the keystone (confirm
language expansion, not re-design); analyzer-architecture lists neo4j/ as a
non-optional subpackage.

Add a superseding note pointing the frontend SDK contract at the backend v2
keystone; mapping the Pydantic models to v2 while keeping the same public API
is a separate major SDK release (backend hand-off point c) and the next
rebuild of the frontend skill.