Rebuild codeanalyzer-backend on schema v2 (additive CPG) by rahlk · Pull Request #9 · codellm-devkit/cldk-forge

rahlk · 2026-07-02T23:17:41Z

Summary

Ground-up rebuild of the codeanalyzer-backend skill around schema v2: the additive
node-tree-plus-typed-edges model (a CPG, i.e. a code property graph) worked out in design discussion.
There is one structure with two projections (analysis.json + Neo4j), and it is grown level by level.

The keystone — `canonical-schema.md` (rewritten)

Codeanalyzer is an additive analysis paradigm: each analysis level is the same tree grown one
layer deeper, plus one edge family over the new layer.

One scale-free node + single-parent containment tree + typed edge overlays = a CPG; every "section" is a projection.
Named-map hierarchy: application → symbol_table{module} → types/functions → callables → body.
can:// path ids with an app segment (disambiguates apps in one language); durable ids ≥ callable, …@line:col / …@tag ordinals below.
Additive levels: L1 tree→callable (+ call nodes, get_call_sites stays L1) · L2 call_graph (callable→callable, callee backfill) · L3 intraprocedural cfg/cdg/ddg(syntactic) · L4 interprocedural param vertices + param_in/param_out/summary + ddg(semantic). L1 ⊆ L2 ⊆ L3 ⊆ L4, monotone.
Split edge lists at their LCA (intra on the callable, cross on the application); the list name is the type.
source stored once per module; spans carry byte offsets; all text (incl. get_method_body) slices off it.
Neo4j is co-primary and full-depth; containment renders as HAS_*/DECLARES edges.
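
The "source once per module" rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Span` and `Node` classes, their field names, and the example `can://` id are hypothetical and are not taken from `canonical-schema.md`. The one point it is meant to show is that spans index into the encoded bytes of the module source, not into the string, so any non-ASCII character before a node shifts the offsets.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Span:
    # Hypothetical field names; the real schema may name these differently.
    start_byte: int
    end_byte: int


@dataclass
class Node:
    id: str
    kind: str
    span: Span
    # Named-map children, mirroring the "named-map hierarchy" idea.
    children: Dict[str, "Node"] = field(default_factory=dict)


def slice_text(module_source: str, span: Span) -> str:
    """Recover a node's text by slicing the module source by byte offsets."""
    raw = module_source.encode("utf-8")
    return raw[span.start_byte:span.end_byte].decode("utf-8")


# The module source is held once; nodes carry only spans into it.
source = "# café\ndef greet():\n    return 'hi'\n"
raw = source.encode("utf-8")
start = raw.index(b"def greet")

greet = Node(
    id="can://demo-app/demo.py/greet",  # illustrative id, not the real grammar
    kind="function",
    span=Span(start, len(raw) - 1),  # stop before the trailing newline
)

body = slice_text(source, greet.span)
```

Because `é` occupies two bytes in UTF-8, the byte offset of `def greet` is one larger than its character index, which is why the slice is taken on the encoded bytes.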

The skill spine — `SKILL.md` (rebuilt)

Two entry paths: new language (scaffold) vs existing analyzer (major-release migration to v2).
Workflow re-anchored as grow the tree level by level, each additive and independently gated, emitting both projections.
The CLAUDE.md/AGENTS.md agent guide must describe the schema model for maintainability.
Hand-off to the frontend as a separate SDK major release.

New + reconciled references

New schema-migration.md — field-by-field old→v2 (flat symbol_table→tree, rich edges→identity, per-callable code→module source slicing, is_* booleans→kind, (sig,node)→can:// ids, Java no-heap|full→ddg ssa|points-to), level-by-level order, release coordination.
Recast schema-reference.md as the per-kind field + edge appendix.
Reconciled symbol-table-construction, backend-recipe, testing-and-validation (+ new monotonicity superset gate and two-tier identity gate), dataflow-graphs (emission superseded — dataflow lives in the tree), neo4j-projection (co-primary), schema-design-loop, analyzer-architecture (neo4j/ non-optional).
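
The monotonicity superset gate mentioned above could look roughly like the following sketch. The document shape used here (a `tree` of nodes with `id` and `children`, plus an `edges` map keyed by list name) is a simplified assumption, not the actual v2 layout, and `node_ids`, `edge_set` and `is_superset` are hypothetical helper names.

```python
from typing import Set, Tuple


def node_ids(node: dict) -> Set[str]:
    """Collect the id of a node and of every descendant in the containment tree."""
    ids = {node["id"]}
    for child in node.get("children", {}).values():
        ids |= node_ids(child)
    return ids


def edge_set(analysis: dict) -> Set[Tuple[str, str, str]]:
    """Flatten the per-type edge lists into (type, source, target) triples."""
    edges = set()
    for edge_type, pairs in analysis.get("edges", {}).items():
        for src, dst in pairs:
            edges.add((edge_type, src, dst))
    return edges


def is_superset(lower: dict, higher: dict) -> bool:
    """True when the higher level keeps every node and edge of the lower level."""
    return (
        node_ids(lower["tree"]) <= node_ids(higher["tree"])
        and edge_set(lower) <= edge_set(higher)
    )


# Assumed L1 output: tree down to callables, with one call node, no edges yet.
l1 = {
    "tree": {
        "id": "can://app",
        "children": {
            "f": {
                "id": "can://app/m/f",
                "children": {"call@2:4": {"id": "can://app/m/f/call@2:4"}},
            },
            "g": {"id": "can://app/m/g"},
        },
    },
    "edges": {},
}

# Assumed L2 output: same tree, plus a call_graph edge list.
l2 = {
    "tree": l1["tree"],
    "edges": {"call_graph": [("can://app/m/f", "can://app/m/g")]},
}

forward_ok = is_superset(l1, l2)
backward_ok = is_superset(l2, l1)
```

A gate of this kind would run pairwise over consecutive levels; a real implementation would probably also report which ids or edges went missing instead of returning a bare boolean.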

Scope / next

This is the backend skill (points a, b, d). The frontend SDK v2 migration (point c — remap Pydantic to v2, keep the public API) is flagged in cldk-sdk-frontend/references/schema-contract.md and is the next rebuild.

Test plan

Docs-only. All reference links resolve; keystone worked example is self-consistent L1→L4.

Rewrite canonical-schema.md as the v2 keystone — the single source of truth: one scale-free node + single-parent containment tree + typed edge overlays (a CPG), emitted as two projections (analysis.json + Neo4j). Named-map hierarchy (application -> symbol_table{module} -> types/functions -> callables -> body); can:// path ids with an app segment; the additive level model (L1 tree->callable, L2 call_graph, L3 intraprocedural cfg/cdg/ddg syntactic, L4 interprocedural param vertices + param_in/out/summary + ddg semantic); split per-type edge lists placed at their LCA; source once per module with byte-offset spans (all text slices off it); monotonic L1<=..<=L4 with the syntactic/semantic DDG split visible in prov; no-null; max_level authoritative. Recast schema-reference.md as the per-kind field + edge appendix and the language-expansion rubric.

Rewrite the backend SKILL.md spine around schema v2: two entry paths (new language vs existing-analyzer major-release migration); JSON + Neo4j as co-primary projections; the workflow re-anchored as grow-the-tree-level-by-level (L1 symbol table -> L2 call graph -> L3 intraprocedural cfg/cdg/ddg -> L4 interprocedural SDG), each additive and independently gated; the CLAUDE.md agent guide must describe the schema model; hand-off to the frontend as a separate SDK major release. New references/schema-migration.md: field-by-field old->v2 mapping (flat symbol_table -> tree, rich edges -> identity, per-callable code -> module source slicing, is_* booleans -> kind, (sig,node) -> can:// ids, Java no-heap|full -> ddg ssa|points-to), level-by-level order, release coordination.

L1 now builds the named-map tree to callable depth with can:// ids, spans with byte offsets, and module-level source; call sites are recorded as call nodes in body with callee:null (so get_call_sites stays an L1 accessor), resolved at L2. Update the keystone level table and schema-reference body kinds to match (call = L1, statements/entry/exit = L3).
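
The L1/L2 split for call sites described in this commit can be sketched as follows. The dictionary layout, the `name` field, the name-based resolver and the function names `get_call_sites` (as written here) and `backfill_callees` are assumptions for illustration only; the real accessor and resolution logic may differ.

```python
from typing import Callable, Dict, List, Optional, Tuple


def get_call_sites(callable_node: dict) -> List[dict]:
    """L1 accessor: return call nodes from the body, resolved or not."""
    return [n for n in callable_node["body"].values() if n["kind"] == "call"]


def backfill_callees(
    callables: Dict[str, dict],
    resolve: Callable[[str], Optional[str]],
) -> List[Tuple[str, str]]:
    """L2 step: fill in callee ids in place and emit callable->callable edges."""
    call_graph = []
    for caller_id, callable_node in callables.items():
        for call in get_call_sites(callable_node):
            target = resolve(call["name"])
            if target is not None:
                call["callee"] = target
                call_graph.append((caller_id, target))
    return call_graph


# Assumed L1 output: the call node exists, but its callee is still unresolved.
callables = {
    "can://app/m/main": {
        "body": {
            "call@3:4": {"kind": "call", "name": "helper", "callee": None},
        }
    },
    "can://app/m/helper": {"body": {}},
}

sites_at_l1 = get_call_sites(callables["can://app/m/main"])

# A deliberately naive resolver: look up the callee by its last path segment.
by_name = {cid.rsplit("/", 1)[-1]: cid for cid in callables}
call_graph = backfill_callees(callables, by_name.get)
```

The call node is present before and after resolution, so an accessor over call sites needs nothing from L2; L2 only adds the callee value and the `call_graph` list.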

Re-anchor the remaining backend refs on the keystone: backend-recipe steps 2/5/6 (v2 tree, module source, call nodes with callee:null, call_graph list); testing-and-validation (Application vocab, plus new monotonicity superset gate and two-tier identity gate); dataflow-graphs emission note superseding the standalone program_graphs shape (dataflow now lives in the tree body{} + split edge lists); neo4j-projection reframed as co-primary with containment-as-HAS_* edges and v2 node kinds; schema-design-loop anchored on the keystone (confirm language expansion, not re-design); analyzer-architecture lists neo4j/ as a non-optional subpackage.

Add a superseding note pointing the frontend SDK contract at the backend v2 keystone; mapping the Pydantic models to v2 while keeping the same public API is a separate major SDK release (backend hand-off point c) and the next rebuild of the frontend skill.

rahlk added 5 commits July 2, 2026 19:06

rahlk wants to merge 5 commits into main from feat/schema-v2-rebuild
