diff --git a/skills/cldk-sdk-frontend/references/schema-contract.md b/skills/cldk-sdk-frontend/references/schema-contract.md
index 07d2f9d..acb5511 100644
--- a/skills/cldk-sdk-frontend/references/schema-contract.md
+++ b/skills/cldk-sdk-frontend/references/schema-contract.md
@@ -1,7 +1,16 @@
 # The analysis.json contract the SDK models must satisfy
 
+> **Schema v2 — this file predates it.** The analyzer contract is now the backend skill's v2
+> keystone (`codeanalyzer-backend/references/canonical-schema.md`): one additive node-tree + typed
+> edges (a CPG), `can://` ids, `application → symbol_table{module} → types/functions → callables →
+> body`, split edge lists, `source` per module, `max_level`. Mapping the SDK's Pydantic models to
+> v2 while **keeping the same public API** (`CLDK.<lang>(...)`, the same accessors) is a **major SDK
+> release** (the backend hand-off's `§ c`) and is the next rebuild of *this* skill. Until that lands,
+> read the sections below as the *old* (v1) contract; the authoritative shape is the v2 keystone and
+> the analyzer's real sample `analysis.json`.
+
 The analyzer (built by the **codeanalyzer-backend** skill) emits a single `analysis.json`. Your job
-in this skill is to encode SDK-side `<L>` models that **load and validate that JSON**, plus a facade
+in this skill is to encode SDK-side models that **load and validate that JSON**, plus a facade
 that queries it. This file states the **invariant contract** the models must satisfy. It is *not*
 the exhaustive field catalog — **the authoritative, complete field list is whatever the analyzer's
 sample `analysis.json` actually contains** (plus the node kinds recorded in the backend's
diff --git a/skills/codeanalyzer-backend/SKILL.md b/skills/codeanalyzer-backend/SKILL.md
index 00bd09b..6456aa7 100644
--- a/skills/codeanalyzer-backend/SKILL.md
+++ b/skills/codeanalyzer-backend/SKILL.md
@@ -1,516 +1,238 @@
 ---
 name: codeanalyzer-backend
 description: >-
-  Build the BACKEND language analyzer for CodeLLM-DevKit (CLDK): a
-  `codeanalyzer-<lang>` that parses a NEW programming language and emits the canonical
-  `analysis.json` (symbol table + resolver-based call graph), then packages and releases it as a
-  thin `codeanalyzer-<lang>` PyPI distribution. Use this whenever a CLDK maintainer wants to "add a
-  language", "build a codeanalyzer for <X>", "write a CLDK backend/analyzer for <X>", or
-  "support <X> in CLDK" at the analyzer level — even if they don't say the word "skill". The core
-  move is a guided, informed decision about the analyzer's backend tooling (parser, resolver,
-  enrichment, packaging) for the target language, then scaffolding a MODULAR analyzer to a working,
-  validated level-1 analysis and shipping it via tag-triggered release automation. This skill stops
-  at the analyzer; wiring the analyzer into a CLDK SDK (Python/TS/…) is the companion
-  **cldk-sdk-frontend** skill. Do NOT use this for adding an extension/contribution point to an
-  EXISTING analyzer (that's codeanalyzer-extension-builder), or for merely *using* CLDK to analyze
-  code.
+  Build or migrate the BACKEND language analyzer for CodeLLM-DevKit (CLDK): a `codeanalyzer-<lang>`
+  that parses a programming language and emits the **canonical schema v2** — one additive
+  node-tree-plus-typed-edges (a CPG) — in BOTH `analysis.json` and Neo4j, then packages and
+  releases it. Use this whenever a CLDK maintainer wants to "add a language", "build a
+  codeanalyzer for <X>", "migrate <X>'s analyzer to the new schema", "emit CFG/PDG/SDG/dataflow
+  for <X>", or "support <X> in CLDK" at the analyzer level — even if they don't say "skill". Two
+  entry paths: a NEW language (scaffold the analyzer from scratch) or an EXISTING analyzer (a
+  major release that adapts it to emit schema v2). The core move is designing/confirming the
+  canonical schema for the language, then building it up **level by level** (L1 symbol table → L2
+  call graph → L3 intraprocedural dataflow → L4 interprocedural SDG), each an additive layer,
+  shipped via tag-triggered release automation with a CLAUDE.md agent guide. This skill stops at
+  the analyzer; wiring it into a CLDK SDK is the companion **cldk-sdk-frontend** skill. Do NOT use
+  this for adding a contribution point to an existing analyzer (codeanalyzer-extension-builder),
+  or for merely *using* CLDK to analyze code.
 ---
 
 # CLDK analyzer backend
 
-Build a new language's **backend analyzer** `codeanalyzer-<lang>`: it parses the language and emits
-the canonical `analysis.json` (symbol table + call graph), then ships as a thin
-`codeanalyzer-<lang>` PyPI distribution. This skill owns **one surface** — the analyzer and its
-distribution. Wiring that analyzer into a CLDK **frontend SDK** (`CLDK(language="<lang>")
-.analysis(...)` in the Python SDK, and later the TS/Rust/Go/Java SDKs) is the separate
-**cldk-sdk-frontend** skill, which consumes this skill's output. Keep that boundary: here you
-produce a validated, released analyzer + its `analysis.json` contract; the frontend skill binds it.
+Build a `codeanalyzer-<lang>` for a new language, or migrate an existing one, so that it emits the
+**canonical schema v2** (`references/canonical-schema.md`; read it first, it is the keystone).
+The schema is **one additive structure** — a tree of code nodes with typed edge overlays, a CPG —
+emitted in **two projections**: `analysis.json` and a Neo4j graph. Both are first-class
+deliverables. This skill owns that analyzer and its distribution; wiring it into a CLDK **frontend
+SDK** is the separate **cldk-sdk-frontend** skill, which consumes this skill's output.
 
-The skill's defining move is **not** picking a template — it's running a guided, informed decision
-about *how to build the backend* for this specific language, then scaffolding from that decision. A
-new language's analyzer must live in that language's own ecosystem to reach its best tooling, so the
-tooling choices genuinely differ per language and the user owns them.
+The organizing principle is the schema's own:
 
-## Before you start: orient
-
-- Confirm the **target language** and locate the CLDK reference repos — you anchor the schema
-  and construction on the **already-implemented** analyzers. They normally sit as siblings:
-  `codeanalyzer-java/`, `codeanalyzer-python/` (analyzer templates), `codeanalyzer-ts/` (a
-  **cautionary** reference — see below), and `python-sdk/` (which also contains the **C** analyzer
-  under `cldk/analysis/c/` — the procedural, non-class anchor — and is the model SDK Pydantic
-  schema the analyzer's output must validate against). **If any of these is not present locally,
-  clone it into `/tmp` and anchor on that copy** (read-only — never push to these):
-  ```
-  for r in codeanalyzer-java codeanalyzer-python codeanalyzer-ts python-sdk; do
-    [ -d "/tmp/$r" ] || git clone --depth 1 https://github.com/codellm-devkit/$r.git "/tmp/$r"
-  done
-  ```
-  Prefer a local sibling checkout if one exists (it may be ahead of `main`); fall back to the
-  `/tmp` clone. Don't invent locations, and don't proceed to schema design without at least the
-  Java and Python analyzers plus `python-sdk` available to read.
-- Skim the analyzer references to ground yourself: **`codeanalyzer-python` is the model to
-  replicate** — the modern, pluggable, cleanly-separated template (tree-sitter + Jedi);
-  `codeanalyzer-java` is the heavyweight WALA one. Most new languages follow the *structure* of
-  the Python one but in their own ecosystem. **`codeanalyzer-ts` is a cautionary reference: it
-  runs and validates, but it was generated as a flat monolith** (a 968-line grab-bag of free
-  functions, a `core` that inlines everything and hardcodes `entrypoints: {}`, and **no
-  pluggable pass/registry/finder layer at all**). Read it to learn the anti-patterns to avoid —
-  not the structure to copy. **Producing a modular package, not a working monolith, is a
-  first-class success criterion of this skill** (see `references/analyzer-architecture.md`).
-- Read these reference files now — they are the spec the scaffolding must satisfy:
-  - `references/analyzer-architecture.md` — **the modular package skeleton the analyzer must
-    have** (anchored on `codeanalyzer-python`, with `codeanalyzer-ts` as the anti-example).
-    Read it before scaffolding: the seams are laid up front, not retrofitted.
-  - `references/canonical-schema.md` — the `analysis.json` contract and its invariants. **Read first.**
-  - `references/schema-reference.md` — the exhaustive, field-by-field schema derived from the
-    SDK Pydantic models. This is what the analyzer must mirror **comprehensively** (every
-    field, not a subset), and the basis for the validation success criterion.
-  - `references/schema-design-loop.md` — **the method** for *Schema Design*: design the schema node by
-    node by anchoring on Java + Python and **bringing every divergence to the user as a
-    decision**.
-  - `references/project-materialization.md` — *Project Materialization*: the build/dependency phase that must run
-    **before parsing** (Java downloads deps for the SymbolSolver classpath; Python builds a
-    venv for Jedi) so the resolver can populate types.
-  - `references/symbol-table-construction.md` — *Symbol Table Construction*: how to walk files and populate the
-    table, modeled on how Java (`SymbolTable.extractAll`) and Python (`core.py` rglob loop)
-    actually do it.
-  - `references/backend-recipe.md` — the 9-step methodology for building the analyzer.
-  - `references/tooling-menu.md` — the per-language decision you'll walk the user through.
-  - `references/cli-contract.md` — the CLI flags the analyzer must expose (the contract the
-    frontend SDKs depend on; owned here).
-  - `references/neo4j-projection.md` — the **optional second output surface**: projecting the
-    same IR into a Neo4j graph via `--emit neo4j` (Cypher snapshot + live Bolt push). Every
-    mature analyzer ships it; add it once level-1 JSON is solid.
-  - `references/dataflow-graphs.md` — the **levels 3–4 contract**: native intraprocedural
-    CFG/DFG/PDG (L3) and interprocedural SDG + clients (L4), the CPG projection, node identity,
-    `program_graphs` emission, parity clause, and verification gates. Read before any dataflow work.
-  - `references/dataflow-construction.md` — the **method**: the stage-by-stage algorithm ladder,
-    split at the L3/L4 seam (Stages 1–4 intraprocedural / AST-only; Stages 5–8 interprocedural /
-    oracle-backed), per-language lowering checklists, gates, and fixture minimums.
-  - `references/dataflow-substrate-menu.md` — the dataflow counterpart of the tooling menu:
-    per-language CFG / def-use / points-to substrate decisions (the points-to slot is the L4 gate).
-  - `references/dataflow-issue-template.md` — the planning template a language instantiates
-    (goals, locked substrate decisions, staged PRs, caveats) before building levels 3–4.
-  - `references/testing-and-validation.md` — **all analyzer-side verification criteria, fixture
-    design rules, and definitions of done.** Read before writing any tests. (SDK-side testing
-    is the frontend skill's `references/sdk-testing.md`.)
-  - `references/packaging-and-release.md` — **the distribution layer**: cross-compile the binary,
-    ship it as a thin `codeanalyzer-<lang>` PyPI package (+ raw binaries as GitHub Release assets +
-    a `brew install codeanalyzer-<lang>` formula pushed to the shared `codellm-devkit/homebrew-tap`),
-    and cut tag-triggered releases. Standing up `packaging/python/` + `packaging/homebrew/` +
-    `release.yml` is a first-class deliverable.
-
-## Workflow
-
-Work the steps below in order, and **don't design the whole thing up front**. Design the schema,
-**scaffold the modular package skeleton**, materialize the project's dependencies, construct the
-symbol table file by file, then build the cheap resolver-based call graph. *Symbol Table Construction* + *Call Graph Construction* together
-are **level 1 — the cheap, resolver-based analysis** (symbol table *and* call graph, both from
-the same Tier-1 resolver). The heavy **level 2 — framework-based** analysis (WALA/Joern/
-SVF) is optional and comes later. Each step models itself on what the mature reference analyzers
-(Java + Python) do.
-
-### Orient & choose the backend tooling
-The developer's real first move: *what backend am I using?* Walk the user through the tooling
-menu (`references/tooling-menu.md`). **Pre-fill a recommendation for each slot** (runtime,
-structural parser, resolver, optional enrichment, build/dep materialization, packaging) and ask
-for confirmation — don't silently choose, don't ask an open-ended "what do you want?". Use
-`AskUserQuestion` for the load-bearing slots, especially *is the structural tool also the
-resolver, or are they separate?* — that reshapes everything downstream. Note what the chosen
-resolver needs materialized (Jedi→venv, TS checker→`tsconfig`+`node_modules`, `go/types`→`go mod
-download`).
-
-Also ask the **analysis depth** they want (`AskUserQuestion`):
-- **Rapid — level 1 (default):** symbol table + the cheap resolver-based call graph. The
-  framework backend is left stubbed.
-- **Deep — level 2:** also stand up the framework-based backend (Joern/SVF/WALA),
-  flipping the *Level 2: framework-based analysis* step from stubbed to implemented.
-
-Default to **rapid (level 1)** — level 1 is always built (it's the floor; level 2 builds on it),
-and deep is opt-in. Record the agreed choices — including the depth **and the packaging build
-strategy** (single-host cross-compile vs native-runner matrix; `packaging-and-release.md`) — under
-an **"Architecture & Tooling"** heading in the analyzer's own `codeanalyzer-<lang>/README.md`. This
-is deliberately a public, top-level doc: it documents for human readers *which backend tooling
-was chosen and why*, and it doubles as the guide any later session (you included, or the
-**cldk-sdk-frontend** skill) reads to recover the locked decisions without re-litigating them.
-Capture each load-bearing slot (runtime, structural parser, resolver, optional enrichment,
-build/dep materialization, packaging, depth, extra node kinds) and a one-line rationale per
-non-default choice. Keep the *Schema Design* `SCHEMA_DECISIONS.md` under the analyzer's `.claude/`
-folder (create it if needed); only these tooling decisions are promoted into the README.
+> **Codeanalyzer is an additive analysis paradigm: each analysis level is the same tree grown one
+> layer deeper, plus one edge family over the new layer.**
 
-**Then check the toolchain is installed, before building anything.** The chosen tooling has hard
-prerequisites (Node + the analyzer's deps for ts-morph; the Go toolchain for `go/types`; the
-Rust toolchain + rust-analyzer; clang/libclang for C++; plus any framework backend like
-Joern if *deep*) **and the packaging/release toolchain that cross-compiles and publishes the
-`codeanalyzer-<lang>` package** (e.g. Bun for `bun build --compile --target=...`, GraalVM
-`native-image` for JVM, the cross-compile target for Go/Rust; plus Python `build`/`wheel`/`twine`
-+ `auditwheel` for the platform wheels — see `references/packaging-and-release.md`). Probe for
-them (e.g. `node --version`, `go version`, `rustc --version`, `clang --version`, `bun --version`).
-**If anything required is missing, stop and instruct the user to install it**
-— give the exact install commands for their platform and what each is for — and **wait** until
-they confirm it's available. Do **not** proceed to scaffold-and-leave-unverified: an analyzer you
-can't run is an analyzer you can't validate against the schema, which is the whole success
-criterion. Only continue once the toolchain is present.
+So you don't "build an analyzer" and then bolt on features — you **grow one structure, level by
+level**, and each level is independently shippable.
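+
+The additive rule can be checked mechanically. The sketch below makes two assumptions. First, each level's edge families are the ones named in the level sections of this file: `call_graph` at L2, `cfg`/`cdg`/`ddg` at L3, and `param_in`/`param_out`/`summary` at L4. Second, the checker receives the set of edge-list names that an output carries. `LEVEL_EDGES`, `expected_edges` and `check_additive` are hypothetical helpers, not a CLDK API.
+
+```python
+# Hypothetical sketch: which edge families each analysis level adds.
+# Names follow this skill's prose; the real layout lives in canonical-schema.md.
+LEVEL_EDGES = {
+    1: frozenset(),                                       # L1: the tree only
+    2: frozenset({"call_graph"}),                         # L2: callable -> callable
+    3: frozenset({"cfg", "cdg", "ddg"}),                  # L3: intra-callable dataflow
+    4: frozenset({"param_in", "param_out", "summary"}),   # L4: interprocedural SDG
+}
+
+
+def expected_edges(max_level: int) -> frozenset:
+    """Edge families an output at `max_level` must carry: the union of levels 1..max_level."""
+    if max_level not in LEVEL_EDGES:
+        raise ValueError(f"max_level must be 1-4, got {max_level}")
+    out = set()
+    for level in range(1, max_level + 1):
+        out |= LEVEL_EDGES[level]
+    return frozenset(out)
+
+
+def check_additive(present: set, max_level: int) -> list:
+    """Problems found: families missing for this level, or families from a higher level."""
+    want = expected_edges(max_level)
+    every = expected_edges(4)
+    problems = [f"missing {name}" for name in sorted(want - present)]
+    problems += [
+        f"unexpected {name} above level {max_level}"
+        for name in sorted((present & every) - want)
+    ]
+    return problems
+```
+
+At `max_level` 2, `check_additive({"call_graph"}, 2)` returns an empty list. A stray `cfg` list in the same output is reported as belonging above level 2.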
 
-### Schema Design (interactive, node by node)
-Design the canonical schema once — it is the **contract** the analyzer's `analysis.json` emits, and
-the contract the **cldk-sdk-frontend** skill later encodes as SDK models. Here you produce **two
-things in lockstep: the analyzer-side types AND the contract** (`canonical-schema.md` /
-`schema-reference.md`); the per-SDK `cldk/models/<lang>/` Pydantic models (and TS types) are built
-later by the frontend skill against this same approved contract. Run the loop in
-`references/schema-design-loop.md` per node (spine first: `Module` →
-`Class` → `Callable` → `Callsite` → `CallEdge`, then language-native kinds):
+## Two paths
 
-1. **Anchor** — read the node in **Java** (`cldk/models/java/models.py`) and **Python**
-   (`py_schema.py`) side by side. Catalog the shared spine and **every place they diverge**.
-2. **Differentiate** — ask *"how is the `<lang>` language structurally different here?"*
-   (language semantics, not domain) and note each genuinely new concept.
-3. **Decide each open point WITH the user.** This is the rule: for every divergence and every
-   new concept, **don't choose silently — ask** (`AskUserQuestion`). Present it as *"Java did X,
-   Python did Y; for `<lang>`, concept Z, how do you want to model it?"* with explained options
-   and a recommended default. (E.g. *Java annotations are flat strings, Python uses structured
-   `PyDecorator`; for TS decorators that carry args, option 1: structured `TSDecorator`
-   (recommended) …*.) Record each answer in `.claude/SCHEMA_DECISIONS.md`.
-4. **Define** — encode the decisions into the analyzer-side type and update the contract;
-   snake_case, optional-with-defaults, spine untouched, identity-only edges. (These same decisions
-   drive the SDK models the frontend skill builds — `SCHEMA_DECISIONS.md` is its input.)
+Decide which up front (`AskUserQuestion` if unclear); the rest of the workflow branches lightly on
+it:
 
-No files are walked yet. Output: a complete, user-approved schema contract + the analyzer types +
-`SCHEMA_DECISIONS.md`.
+- **(A) New language.** No analyzer exists. Choose the backend tooling, scaffold a modular
+  analyzer, and build the schema up level by level. Most of this file.
+- **(B) Existing analyzer → schema v2.** A `codeanalyzer-<lang>` exists on the **old** schema
+  (flat `symbol_table` + rich or identity edges). This is a **major release**: keep the
+  analyzer's parsing/resolution guts, and adapt its *emission* to schema v2 (both JSON and Neo4j),
+  level by level. Follow `references/schema-migration.md`; the level structure below still governs
+  the order you migrate in.
 
-### Scaffold the modular skeleton (seams first, before filling phases)
-Before writing any analysis logic, lay out the analyzer as a **modular package that mirrors
-`codeanalyzer-python`'s structure** — one subpackage per phase plus the pluggable pass layer —
-following `references/analyzer-architecture.md`. Create the boxes empty-but-wired: a thin CLI
-entry; a `core` **orchestrator that only delegates** (no inlined parsing, and never a hardcoded
-`entrypoints: {}`); `syntactic_analysis/`, `semantic_analysis/` (with the framework backend in
-its **own subpackage**, seams scaffolded even when stubbed); and the extensibility layer —
-`analysis/` (the `AnalysisPass` base + a registry that discovers, topo-orders by
-`requires`/`provides`, and runs a `run_pipeline`) and `frameworks/` (the entrypoint-finder base).
-The built-in pass list and concrete finders may start empty, but **the seams and entry-point
-discovery must exist now** — that is exactly the layer the generated TS analyzer was missing, and
-where `codeanalyzer-extension-builder` later plugs in. Retrofitting modularity into a monolith is
-the failure this step prevents.
+Either way the target is identical: an analyzer whose output validates against
+`canonical-schema.md` at its implemented `max_level`, in both projections.
 
-### Project Materialization (build & dependency resolution)
-Before parsing, materialize the target project's dependencies so the resolver can populate
-types — this is a real phase with its own failure modes. Follow
-`references/project-materialization.md`, modeled on Java
-(`BuildProject.downloadLibraryDependencies` runs *before* the symbol table, for the
-SymbolSolver classpath) and Python (`core.py` builds a **venv** + `pip install` and passes it
-to the symbol-table builder, because Jedi needs it). For the new language: detect the manifest
-(`tsconfig.json`+`package.json`, `go.mod`), run the ecosystem installer (`npm ci` →
-`node_modules`; `go mod download`), **cache** it under `cache_dir`, **degrade gracefully** to
-partial types on failure (never crash), and honor `--no-build`/`--eager`. Source-level
-resolvers (TS checker, `go/types`, Jedi) need deps **present**, not a full compile; defer any
-heavier compile to just before *Call Graph Construction* if your call-graph backend needs build artifacts.
-
-### Symbol Table Construction (file by file)
-Now populate the schema. Follow `references/symbol-table-construction.md`, which is built by
-**studying how Java (`SymbolTable.extractAll` → `symbolTable.put(path, ...)`) and Python
-(`core.py`'s `rglob` loop → `build_pymodule_from_file` → `symbol_table[file_key] = module`)
-iterate over files** — then doing the same for the new language: discover source files (skip
-vendored/test trees), compute stable relative `file_key`s, per-file cache-check then build the
-`Module` (filling classes/functions/native kinds + **unresolved** call sites with
-`callee_signature` null + cache metadata), and assemble `symbol_table: Dict[path, Module]`.
-Support whole-project, `-t` target-files, and (optional) single-source modes. This stage
-records call sites but doesn't resolve them into edges yet — the cheap resolution is the very
-next stage (still level 1).
-
-**Path predicate pitfall — apply filters to the relative path, never the absolute path.**
-Every file-skip predicate (`IsVendored`, `IsTestFile`, and any custom equivalent) must be
-evaluated against the path *relative to the project root* — not the absolute path. Absolute
-paths carry segments from the analyzer's own directory layout (`testdata`, `vendor`, `.git`,
-etc.) that falsely trigger the filter and silently empty the symbol table. Resolve the project
-root to an absolute path at the top of the analysis entry point, then derive all relative keys
-as `rel(projectRoot, absFilePath)`. Using the process's working directory as the base
-(e.g. `rel(".", absPath)`) is a separate trap: it produces the right answer only when the
-process happens to run from the project root, which is never the case in tests.
-
-**Cross-file type/method attachment — check whether your language requires a two-pass build.**
-In some languages a type and its method bodies can be spread across multiple files of the same
-unit (Go packages, C# partial classes and extension methods, Kotlin extension functions, Ruby
-open classes). A single-pass, file-by-file builder that resolves receiver types only within
-the current file silently drops every method defined in a sibling file. Identify whether the
-target language has this property before writing the builder. If it does, use a two-pass
-approach: pass 1 collects all type declarations from every file and builds a
-`(unit, typeName) → ownerFile` index; pass 2 attaches methods using that index. Retrofitting
-this after the fact is costly — the fix lives in the core iteration loop.
-
-**Symbol-table gate (verify):** Run the analyzer on the fixture and confirm the criteria in
-`references/testing-and-validation.md § 2` (symbol-table gate). Don't proceed until this passes.
-
-### Call Graph Construction (resolver-based, cheap — completes level 1)
-This is **cheap and part of level 1**, not a heavy pass: the same Tier-1 resolver that typed the
-symbol table (Jedi/tsc/rust-analyzer/clang) is already loaded, so resolving call sites into edges
-is inexpensive. For each recorded call site: resolve the callee → **backfill `callee_signature`
-in place** → emit an identity-only edge `source_sig → target_sig` with `provenance` = your
-resolver. Handle constructors/`new`, receiver-type dispatch, and an explicit unresolved fallback
-(record the site, skip the edge, never crash). Don't mutate the symbol table beyond filling
-`callee_signature`.
+## Before you start: orient
 
-**Its precision is a decision the references disagree on — so ask.** Don't frame the tiers as
-"whole-program vs not" — once deps are materialized the resolver resolves across the
-whole program too; the axis is the *engine* (`tooling-menu.md` § "Call-graph tiers"). Python's
-cheap `jedi` call graph lives here at level 1 and **drops** unresolved sites; **Java is the
-outlier** — it has no cheap resolver call graph, so its call graph *is* the heavy Tier-2 WALA
-pass (`makeRTABuilder` → **RTA**), which for a new resolver-capable language belongs in the
-*Level 2: framework-based analysis* step. For the chosen resolver, surface the dispatch choice
-(declared-type only ≈ CHA, + instantiated subtypes ≈ RTA-style); heavier framework-based
-precision (WALA/Joern/SVF) belongs to that level-2 step, not here.
+- Confirm the **target language** and locate the CLDK reference repos (read-only; prefer a local
+  sibling checkout, else clone into `/tmp` from `github.com/codellm-devkit/<repo>`):
+  `codeanalyzer-java` (WALA — already ships L3/L4 via its slicer, the worked example of the full
+  ladder), `codeanalyzer-python`, `codeanalyzer-typescript`, and `python-sdk` (the SDK your output
+  must validate against). For an existing-analyzer migration, its own repo is the primary anchor.
+- **Read the keystone first**, then the rest:
+  - `references/canonical-schema.md` — **the v2 model.** The tree, the id grammar, the additive
+    levels, the two projections. Everything else serves this.
+  - `references/schema-reference.md` — the per-kind field/edge appendix.
+  - `references/schema-design-loop.md` — **the method** for confirming the language's schema node
+    by node (which kinds/fields it adds), anchored on the keystone + Java/Python.
+  - `references/schema-migration.md` — path (B): old schema → v2, field-by-field, as a major
+    release.
+  - `references/analyzer-architecture.md` — the **modular package skeleton** (delegating `core`,
+    per-phase subpackages, pluggable pass layer). Producing a *modular* analyzer is a success
+    criterion, not a nicety.
+  - `references/tooling-menu.md` — the L1/L2 backend-tooling decision (parser, resolver).
+  - `references/dataflow-substrate-menu.md` — the L3/L4 substrate decision (CFG source, def-use,
+    points-to oracle). The points-to slot is the L4 gate.
+  - `references/dataflow-graphs.md` + `references/dataflow-construction.md` — the L3/L4 contract
+    and construction method (CFG → dominance → def-use → PDG → summaries → SDG).
+  - `references/cli-contract.md` — the CLI flags (`-a 1|2|3|4`, `--emit`, `--graphs`).
+  - `references/neo4j-projection.md` — the co-primary graph projection (always full-depth).
+  - `references/project-materialization.md`, `references/testing-and-validation.md`,
+    `references/packaging-and-release.md` — build/deps, gates, distribution.
+
+## Workflow — grow the tree, level by level
+
+Work in order. Design the schema, scaffold the modular skeleton, materialize dependencies, then
+**build the structure one level at a time**, each additive and gated. Every level emits **both**
+projections (JSON + Neo4j). Levels 1–2 are the floor (always built); levels 3–4 are the dataflow
+tier (opt-in, added when asked and when the substrate is chosen).
 
-**Verify:** confirm the criteria in `references/testing-and-validation.md § 2` (call-graph
-gate) — every edge endpoint matches a real signature (no dangling nodes) and output still
-validates. (`backend-recipe.md` step 6.)
+### Orient & choose the backend tooling
+Walk the user through `references/tooling-menu.md` (runtime, structural parser, resolver,
+build/dep materialization, packaging) and — **if L3/L4 are in scope** —
+`references/dataflow-substrate-menu.md` (CFG source, def-use source, points-to oracle). Pre-fill a
+recommendation per slot and confirm (`AskUserQuestion` for load-bearing ones). Ask the **target
+depth** (`max_level`): L1–2 (symbol table + call graph, the default floor), L3 (intraprocedural
+dataflow), or L4 (interprocedural SDG + taint). Record the locked decisions under an
+**Architecture & Tooling** heading in the analyzer's `README.md`, and keep schema decisions in
+`.claude/SCHEMA_DECISIONS.md`. **Then verify the toolchain is installed** (parser, resolver, the points-to
+oracle if L4, plus the packaging/release toolchain) — if anything required is missing, stop and
+give exact install commands, and wait. An analyzer you can't run is one you can't validate.
+
+### Schema design (confirm the language's shape against the keystone)
+The schema is already designed — it's `canonical-schema.md`. Here you **confirm the
+language-specific expansion**: which type kinds, callable kinds, body-node kinds, CFG-edge kinds,
+and typed fields this language adds to the shared spine (`references/schema-design-loop.md`). Run
+it node by node, anchoring on the keystone and on how Java/Python model the same concept, and
+**bring every genuine divergence to the user** (`AskUserQuestion`) — *"the spine has `type` with a
+`kind`; Go needs `struct` + a receiver on methods; model receiver as X?"*. Record each answer in
+`.claude/SCHEMA_DECISIONS.md`. Output: the confirmed per-language kind/field set, still the same
+tree. (Path B: this is where you map old fields → v2 kinds; see `schema-migration.md`.)
+
+### Scaffold the modular skeleton (seams first)
+Lay out the analyzer as a **modular package** mirroring `codeanalyzer-python`
+(`references/analyzer-architecture.md`): a thin CLI; a `core` **orchestrator that only delegates**;
+`syntactic_analysis/` (the tree builder), `semantic_analysis/` (call graph + the dataflow passes,
+framework backend isolated in its own subpackage), a `neo4j/` projection subpackage, and the
+pluggable `analysis/` pass layer + `frameworks/` finder layer. Create the boxes empty-but-wired.
+Retrofitting modularity into a monolith is the failure this prevents (`codeanalyzer-ts`'s original
+flat build is the anti-example).
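+
+The pass layer is what keeps `core` a pure delegator. The sketch below assumes that passes declare the artifacts they `requires` and `provides`. A registry orders them topologically with the standard-library `graphlib` before `run_pipeline` executes them. The class and field names are illustrative; the real base class is whatever `analyzer-architecture.md` specifies.
+
+```python
+# Hypothetical sketch of the pluggable pass layer (names are illustrative).
+from dataclasses import dataclass, field
+from graphlib import TopologicalSorter
+from typing import Callable
+
+
+@dataclass
+class AnalysisPass:
+    name: str
+    requires: frozenset = frozenset()  # artifact keys this pass reads
+    provides: frozenset = frozenset()  # artifact keys this pass writes
+    run: Callable[[dict], None] = lambda ctx: None
+
+
+@dataclass
+class PassRegistry:
+    passes: list = field(default_factory=list)
+
+    def register(self, p: AnalysisPass) -> None:
+        self.passes.append(p)
+
+    def ordered(self) -> list:
+        """Order passes so every required key is provided by an earlier pass."""
+        provider = {key: p.name for p in self.passes for key in p.provides}
+        graph = {}
+        for p in self.passes:
+            missing = set(p.requires) - set(provider)
+            if missing:
+                raise LookupError(f"{p.name} requires unprovided {sorted(missing)}")
+            graph[p.name] = {provider[key] for key in p.requires}
+        by_name = {p.name: p for p in self.passes}
+        # static_order() yields each pass after all of its predecessors.
+        return [by_name[n] for n in TopologicalSorter(graph).static_order()]
+
+    def run_pipeline(self, ctx: dict) -> dict:
+        for p in self.ordered():
+            p.run(ctx)
+        return ctx
+```
+
+Registration order does not matter. A call-graph pass registered before the symbol-table pass it depends on still runs after it, and a cycle surfaces as `graphlib.CycleError` instead of a silent misordering.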
+
+### Project materialization (build & dependency resolution)
+Before parsing, materialize the target project's dependencies so the resolver can populate types
+(`references/project-materialization.md`) — Java downloads deps for the classpath, Python builds a
+venv for Jedi, Go runs `go mod download` for `go/packages`. Cache under `cache_dir`, degrade
+gracefully to partial types on failure, honor `--no-build`/`--eager`.
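+
+The sketch below shows the cache-then-degrade contract. It stands in a marker file keyed by a hash of the manifest for the real cache layout. `install_cmd` is the ecosystem installer (for example `["go", "mod", "download"]`). `materialize` is a hypothetical helper, and `--eager` is not modeled.
+
+```python
+# Hypothetical sketch: materialize dependencies once per manifest, cache the
+# result, and degrade to partial types instead of crashing.
+import hashlib
+import subprocess
+from pathlib import Path
+
+
+def materialize(project: Path, manifest: str, install_cmd: list,
+                cache_dir: Path, no_build: bool = False) -> bool:
+    """True if dependencies are available; False means continue with partial types."""
+    if no_build:
+        return False  # --no-build: skip materialization entirely
+    manifest_path = project / manifest
+    if not manifest_path.is_file():
+        return False  # nothing to install; the resolver works on sources alone
+    digest = hashlib.sha256(manifest_path.read_bytes()).hexdigest()
+    marker = cache_dir / f"deps-{digest}.ok"
+    if marker.exists():
+        return True  # same manifest already materialized: reuse the cache
+    try:
+        subprocess.run(install_cmd, cwd=project, check=True,
+                       capture_output=True, timeout=600)
+    except (OSError, subprocess.SubprocessError):
+        return False  # installer missing, failed, or timed out: degrade, never crash
+    cache_dir.mkdir(parents=True, exist_ok=True)
+    marker.touch()
+    return True
+```
+
+A `False` return is not an error. The caller proceeds with whatever types the resolver can recover from sources alone.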
+
+### L1 — build the tree (symbol table)
+Grow the containment tree to **callable depth**: `application → symbol_table{module} →
+types{}/functions{} → callables{}`, each node with its `can://` id, `kind`, `span` (with byte
+offsets), and the module's `source` stored once (`references/symbol-table-construction.md`).
+Populate the language-native kinds/fields confirmed in schema design. This is the floor;
+everything hangs off it. **Emit both projections** (JSON tree + Neo4j nodes/`HAS_*` edges).
+**Gate:** output validates against the SDK `Application` model; `symbol_table` keys are relative
+paths (no absolute, no `..`); `get_method_body` slices `module.source` correctly; re-run reuses
+cache. (`references/testing-and-validation.md` § symbol-table gate.)
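+
+Two of these gate checks fit in a few lines. The sketch below assumes that span offsets are UTF-8 byte offsets into `module.source`; tree-sitter, for instance, reports byte offsets. The helper names are hypothetical.
+
+```python
+# Hypothetical sketch of two L1 gate checks.
+from pathlib import PurePosixPath, PureWindowsPath
+
+
+def bad_module_keys(symbol_table: dict) -> list:
+    """symbol_table keys that are absolute or climb out of the project with '..'."""
+    bad = []
+    for key in symbol_table:
+        absolute = PurePosixPath(key).is_absolute() or PureWindowsPath(key).is_absolute()
+        climbs = ".." in PureWindowsPath(key).parts  # splits on both '/' and '\\'
+        if absolute or climbs:
+            bad.append(key)
+    return bad
+
+
+def slice_body(source: str, start_byte: int, end_byte: int) -> str:
+    """Cut a callable's text out of module.source using UTF-8 byte offsets."""
+    return source.encode("utf-8")[start_byte:end_byte].decode("utf-8")
+```
+
+Slicing the decoded string with byte offsets is the classic bug here. Any non-ASCII character earlier in the file shifts every later body.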
+
+### L2 — call graph
+Add the **`call_graph`** edge list at the application scope: resolve each call into a
+`callable → callable` edge with `prov` and `weight`, using the Tier-1 resolver
+(`references/dataflow-graphs.md` § levels). Backfill the `callee` refinement slot on call nodes
+(`null → id`) — the one sanctioned mutation. Call edges are **immutable once written** (never
+re-anchored to a statement at L3). Framework enrichment (Joern/WALA) merges *into this same list*
+with added provenance — it's the orthogonal precision axis, not a level. **Gate:** every edge
+endpoint is a real callable id (no dangling); output still validates.
+
+### L3 — intraprocedural dataflow (optional; the first dataflow level)
+Grow the tree **below the callable**: populate each callable's `body` with statement nodes, and
+add the intra-callable edge lists `cfg`, `cdg`, `ddg` (**syntactic** — name-equality, no points-to
+oracle needed). Build stage by stage per `references/dataflow-construction.md` (CFG → dominance →
+def-use → PDG). AST-only and **per-callable parallel** (`-j`). This is a complete, shippable
+capability (`-a 3`). **Gate:** the intraprocedural backward-slice on the fixture equals the
+hand-computed node set, exactly.
+
+### L4 — interprocedural dataflow (optional; needs the points-to oracle)
+Add the **synthetic parameter vertices** (`formal_in/out`, `actual_in/out`) to `body`, the
+cross-function `param_in`/`param_out` edge lists, the intra-caller `summary` edges, and the
+**semantic** (alias-aware) `ddg` edges (`prov:["points-to"]`) — the whole-program SDG. Needs the
+points-to oracle from the substrate menu + the summary fixpoint (stages 5–8 of
+`dataflow-construction.md`); enabled with `-a 4`. **Gate:** no dangling SDG endpoints; a known
+source→sink taint flow is found, and its sanitized variant is reported as sanitized.
+
+### Neo4j projection (co-primary, always full-depth)
+The Neo4j graph is not an afterthought — it's the **second required projection**
+(`references/neo4j-projection.md`). Build it as the modular `neo4j/` subpackage (pure
+`project() → GraphRows → cypher/bolt writers` + a declarative schema catalog). Containment renders
+as typed `HAS_*`/`DECLARES` edges; every overlay edge renders as a typed relationship; nodes carry
+their `can://` id. `--emit neo4j` always runs at **maximum implemented depth** — analysis levels
+gate the JSON path only; combining `-a`/`--graphs` with `--emit neo4j` is an explicit error. Keep
+the graph schema versioned and in lockstep with the JSON schema (same kinds → labels).
 
 ### CLI, caching/incremental, packaging & release
-Add the CLI family surface (`cli-contract.md`) with `analysis.json` as the only facade-visible
-output. **Validate all flag values** — unrecognized or unimplemented values (e.g. `--format
-msgpack` before msgpack is implemented) must return a non-zero exit with a clear message, never
-silently fall back (`cli-contract.md § Flag validation requirements`).
-
-**Caching has three independent layers — implement and test each explicitly:**
-
-1. **Materialization cache** — memoizes the dependency-fetch step (`go mod download`, `npm ci`,
-   venv build) by hashing the manifest (`go.sum`, `package-lock.json`, `requirements.txt`).
-   Stored in `cache_dir`. Bypassed by `--eager`.
-2. **Per-run output cache** (`analysis_cache.json`) — written to `cache_dir` after every
-   successful `Analyze()` call. Always rewritten; gives the SDK something to read without
-   re-invoking the binary. `--eager` rewrites it; non-eager runs still write it (it's not
-   a skip guard at the binary level).
-3. **SDK-level skip** — the Python facade reads the *output dir*'s `analysis.json`, validates
-   it, and **skips invoking the binary entirely** if valid. This is where the real "don't
-   re-run the binary" logic lives (frontend skill). The binary itself always runs fresh
-   analysis when invoked.
-
-The behavioral tests for caching are in `references/testing-and-validation.md § 2`.
-
-**For packaging, be opinionated and follow `references/packaging-and-release.md`:
-build a self-contained binary for every platform, then ship it as a thin
-`codeanalyzer-<lang>` PyPI package** — one platform-tagged wheel per OS/arch, carrying the binary
-and exposing `bin_path()` — **plus raw binaries as GitHub Release assets, plus a Homebrew formula
-`Formula/codeanalyzer-<lang>.rb` pushed to the shared `codellm-devkit/homebrew-tap`** (so end users
-get `brew install codeanalyzer-<lang>`), all cut by a **tag-triggered `release.yml`**. The brew
-formula reuses the same Release-asset binaries (compiled case) or the same PyPI package (Python
-case) — never a rebuild. The frontend SDKs *depend on* that published package; they never
-bundle or build the binary. Build it by **single-host cross-compile where the toolchain allows** (TS
-via `bun build --compile --target=<plat>`; Go via `GOOS`/`GOARCH`; Rust via target triples) **or a
-native-runner build matrix where it doesn't** (JVM via GraalVM `native-image`, which can't
-cross-compile; C++/clang with per-target sysroots). A Python analyzer is the same PyPI package but
-its wheel carries code, imported in-process. **Release automation is standard practice, not optional:** stand up
-`packaging/python/` (the `build_wheels.sh` + `pyproject.toml` + `bin_path()` package) and
-`.github/workflows/release.yml`, tag releases `vX.Y.Z` with real notes modeled on
-`codeanalyzer-python`'s GitHub Releases (Keep-a-Changelog *Added/Changed/Fixed* + auto-generated
-*Detailed Changes*), publish to PyPI under `codeanalyzer-<lang>` (prefer OIDC Trusted Publishing),
-and **record the published name + version** so the frontend skill can pin it. (`backend-recipe.md`
-steps 3, 8, 9; full spec in `references/packaging-and-release.md`; rationale in `tooling-menu.md`
-§ "Packaging".)
-
-### (Optional) Neo4j graph projection — a second output surface
-Once the level-1 `analysis.json` path is solid, add the **optional Neo4j projection** every
-mature analyzer now ships (`references/neo4j-projection.md`). It is not an ingestion of the
-JSON — it's an **alternative projection of the same in-memory IR**, selected by `--emit neo4j`,
-producing either a self-contained `graph.cypher` snapshot or a live Bolt push, plus `--emit
-schema` for the static `schema.neo4j.json` contract. Build it as a modular `neo4j/` subpackage
-(`project` → `GraphRows` → `cypher`/`bolt` writers + a declarative `schema`), keep the driver an
-**optional/lazy** dependency, and hold the graph schema in lockstep with the JSON schema (same
-`SCHEMA_DECISIONS.md` node kinds → node labels; identity-only call edges → `CALLS`). The SDK's
-Neo4j backend (frontend skill) reconstructs the canonical model from this graph, so the node
-families and `--app-name` anchor must match. **The graph is always full-depth:** analysis levels
-gate the JSON path only — `--emit neo4j` runs at maximum implemented depth (once levels 3–4 exist,
-the complete SDG/CPG, unconditionally), and combining `-a`/`--graphs` with it is an explicit
-error (`neo4j-projection.md § Depth rule`). Leave the projection out only if the user explicitly
-scopes to JSON-only; otherwise it's a standard deliverable of the CLI/packaging stage.
-
-### (Optional) Level 2: framework-based analysis
-Gated on the depth choice from *Orient & choose the backend tooling*. The heavy tier — a dedicated analysis engine
-(Joern/SVF, or WALA-style; `backend-recipe.md` step 7) for points-to/dataflow edges the
-cheap resolver can't reach. If the user picked **rapid (default)**, leave it a wired, flag-gated
-extension point with a clear TODO. If they picked **deep**, implement it now and merge its edges
-into the resolver graph by `(source, target)` with provenance union. (For a language whose call
-graph is *only* available this way — e.g. Java/WALA — this stage is where that call graph lives,
-regardless of the depth choice.)
-
-### (Optional) Levels 3–4: native dataflow graphs
-A separate, later body of work — never part of the initial language bring-up, and itself **two
-shippable levels**. When the user asks for dataflow, slicing, or taint ("CFG/PDG/SDG",
-"reachability", "what does this value affect"), plan it with
-`references/dataflow-issue-template.md` (one epic issue, staged PRs), decide the substrate slots
-from `references/dataflow-substrate-menu.md` (confirmed with the user, recorded in the README's
-*Architecture & Tooling*), and build stage by stage per `references/dataflow-construction.md`
-against the contract in `references/dataflow-graphs.md`.
-
-The **L3/L4 split is the key planning decision**: **level 3** (`-a 3`, intraprocedural CFG/DFG/PDG
-per function) is AST-only, per-callable parallel, and shippable with *no points-to oracle* — ship
-it first. **Level 4** (`-a 4`, the interprocedural SDG + taint/slicing clients) is the heavier
-tier that needs the oracle from the substrate menu and the whole-program summary fixpoint — add it
-once the oracle lands. The rules that bind both: everything is **native and in-process**; graphs
-are keyed by `(signature, node_id)` on the same `signatureOf()`; each stage's gate passes before
-the next; `-a 1`/`-a 2` stay untouched and `-a 3` must not pay L4's cost; the **SDG (L4) is the
-core artifact** (clients query it), and the CPG is only its Neo4j projection — skip the CPG if the
-Neo4j surface isn't in scope.
+Add the CLI family (`references/cli-contract.md`): `-a 1|2|3|4`, `--emit json|neo4j|schema`,
+`--graphs`, `-j/--jobs`, `--eager`, `-c/--cache-dir`. **Validate all flag values:** an unimplemented
+value exits non-zero with a clear message, never a silent fallback. Cache per file by content hash/mtime, and apply the vendored/test-tree skip predicates to project-relative paths, never absolute ones.
+**For packaging, be opinionated and follow `references/packaging-and-release.md`:** a
+self-contained binary per platform, shipped as a thin `codeanalyzer-<lang>` PyPI package (one platform-tagged
+wheel per OS/arch) plus GitHub Release binaries and a `codellm-devkit/homebrew-tap` formula, all cut by a tag-triggered `release.yml`.
+The SDKs depend on the published package; they never build the binary. For an existing analyzer
+migrating to v2, this is a **major version bump** — the schema change is breaking.
 
 ### Write the analyzer README (last build step)
-The analyzer's `codeanalyzer-<lang>/README.md` already holds the **Architecture & Tooling**
-decisions recorded back in *Orient & choose the backend tooling*. As the **final build step**,
-grow that file into a complete, user-facing README modeled on the reference analyzers'
-**`main`-branch** READMEs — `codeanalyzer-python/README.md` (the model to replicate) and
-`codeanalyzer-java/README.md`. Don't invent a layout; mirror theirs, in this order:
-- **Logo + title + one-line what-it-is** — open with the shared CLDK logo, reusing the Python
-  repo's hosted URL (the analyzers share branding) rather than committing a per-language copy:
-  ```md
-  ![logo](https://github.com/codellm-devkit/codeanalyzer-python/blob/main/docs/assets/logo.png?raw=true)
-  ```
-  Then name the language and the chosen backend tooling (e.g. "Static analysis for `<lang>`
-  using `<parser>` + `<resolver>`"), echoing the reference openers.
-- **Prerequisites / installation** — the toolchain confirmed installed up front (runtime,
-  parser, resolver, plus any framework backend if *deep*), with exact per-platform install
-  commands as Python does for `venv`/build tools. Read the minimum version from the **build
-  manifest** (`go.mod`'s `go` directive, `Cargo.toml`'s `rust` field, `pyproject.toml`'s
-  `requires-python`, etc.) — not from what happens to be installed. Record both the minimum
-  and the version the analyzer was actually tested on.
-- **Building, packaging & releasing** — how to build the self-contained binary and ship it
-  as the `codeanalyzer-<lang>` PyPI package + GitHub Release assets, and how releases are cut
-  (`packaging/python/` + `packaging/homebrew/` + tag-triggered `release.yml`), per *CLI,
-  caching/incremental, packaging & release* and `references/packaging-and-release.md`. For an SDK
-  user it's just `pip install codeanalyzer-<lang>`; for an end user, `brew tap codellm-devkit/tap &&
-  brew install codeanalyzer-<lang>`.
-- **Usage + CLI options** — paste the real `--help` output (from `cli-contract.md`), then a few
-  worked **examples** like the Python README (basic symbol table, `--output`, level-2/framework
-  flag).
-- **Analysis levels** — what level 1 (symbol table + resolver call graph) emits today and what
-  level 2 (framework backend) adds — flagged stubbed-vs-implemented per the depth choice.
-- **Output schema** — point at the canonical `analysis.json` / `<Lang>Application` contract.
-- **SDK integration** — note that the CLDK SDKs bind this analyzer (Python:
-  `CLDK(language="<lang>").analysis(...)`; others later), wired by the **cldk-sdk-frontend** skill.
-- Keep the **Architecture & Tooling** section (the locked decisions) intact as its own heading.
-
-Write only what actually runs — don't document level-2 as working if it's a stubbed extension
-point. The README is the human-readable counterpart to the validated `analysis.json`: like every
-other stage, it describes the analyzer as it really is.
+Grow the `README.md` (which already holds the Architecture & Tooling decisions) into a complete,
+user-facing README modeled on `codeanalyzer-python`'s: logo + one-liner; prerequisites (read the
+minimum toolchain version from the build manifest, not what's installed);
+building/packaging/releasing; usage + real `--help`; **the analysis levels** (what L1–L4 emit today, flagged
+implemented-vs-stubbed by `max_level`); the schema contract (point at `canonical-schema.md`); and
+SDK integration (bound by **cldk-sdk-frontend**). Write only what actually runs.
 
 ### Write the agent guide (CLAUDE.md + AGENTS.md symlink) — a default artifact
-Every analyzer repo ships an **agent onboarding guide as a standing deliverable**, not an
-afterthought: a root `CLAUDE.md`, with `AGENTS.md` as a **symlink pointing at it**, so Claude Code
-and the generic-agent convention read one source of truth. Always produce these — even for a
-minimal analyzer.
-
-**The template is `codeanalyzer-typescript/CLAUDE.md` — mirror it.** It is the canonical form; do
-not invent a layout. Read it and reproduce its structure, regenerating the analyzer-specific
-sections for `<lang>` and carrying the standard sections over near-verbatim (adjusted for the new
-repo). `CLAUDE.md` is the *contributor/maintainer* counterpart to the user-facing README — it tells
-a coding agent how this repo is built, not how to use the CLI. Keep it concise and **specific to
-the analyzer as actually built** (no boilerplate), in the template's order:
-
-- **Title + one-liner** — `Agent guidance for codellm-devkit/codeanalyzer-<lang> (<short-name>)`.
-- **What this project is** — the language, the chosen backend tooling, that it emits the canonical
-  `analysis.json` (symbol table + resolver call graph) **and** (if built) the optional Neo4j
-  projection, and that it **mirrors the Java/Python/TS sibling analyzers so output-shape parity is
-  a first-class concern**. One line, pointing at the README's *Architecture & Tooling* section for
-  the locked decisions.
-- **Architecture — follow the pipeline** — name the single `analyze()`/`core` orchestrator and
-  list its ordered stages (materialize → symbol table → call graph → cache → output/neo4j), the way
-  the template walks `src/core.ts`. State the **modularity rules as invariants** a change must
-  preserve (no inlined analysis in `core`, no hardcoded `entrypoints: {}`, builder split by node
-  kind — from `references/analyzer-architecture.md`), and that `<Lang>Application` in the schema is
-  the output contract.
-- **Directory map** — a path → responsibility table for the actual package layout.
-- **Commands** — the real build/test/run/typecheck/schema-gen commands (e.g. `bun run build`,
-  `bun test`, `bun run gen:schema`; or the Go/Rust/Python equivalents), and the fixture used to
-  validate `analysis.json`.
-- **Schema + packaging contract** — output must validate against the SDK `<Lang>Application` model
-  (point at `.claude/SCHEMA_DECISIONS.md`); the Neo4j schema is versioned and enforced by a
-  conformance test — treat it as a contract; and the version-lockstep rule across the manifest,
-  `packaging/python/`, the SDK pins, and the brew formula (`references/packaging-and-release.md`).
-- **The standard working-style + rules + auxiliary sections** — carry the template's *"I implement
-  features myself — you assist"*, the numbered **Rules** (think before coding; simplicity;
-  issue → branch → PR; guard the contract), the teaching-loop / spaced-repetition section (which
-  defers to `~/.claude/CLAUDE.md`), and the *Auxiliary support tasks* (e.g. tidy up the release
-  announcement) over near-verbatim, adjusting repo name, short-name, and the upgrade one-liners
-  (`pip install -U codeanalyzer-<lang>`, the brew tap) for this analyzer.
-- **Repo rules** — carry over any unbreakable conventions the repo already states (never add
-  AI-authorship trailers / `🤖` signoffs to PRs); preserve an existing `CLAUDE.md`'s rules rather
-  than dropping them.
-
-Create the symlink as a **relative** link at the repo root so it survives clone/checkout:
-```bash
-ln -sf CLAUDE.md AGENTS.md
-```
-**Watch the global-gitignore trap:** many setups exclude `AGENTS.md` in a global
-`~/.gitignore_global`, so `git add AGENTS.md` silently no-ops and the symlink never gets
-committed. Un-ignore it in the repo's local `.gitignore` (a repo rule overrides the global one),
-then commit:
-```gitignore
-# Un-ignore the agent guide past a global gitignore that excludes AGENTS.md
-!CLAUDE.md
-!AGENTS.md
-```
-Verify with `git check-ignore AGENTS.md` (should print nothing) and confirm `git ls-files` shows
-both. If the negation isn't enough in your setup, `git add -f AGENTS.md`. Commit both files (git
-stores the symlink). If a `CLAUDE.md` already exists (as a one-line rule file), **fold its content
-into the new guide** before adding the symlink — never silently discard it.
+Every analyzer repo ships a root **`CLAUDE.md`, with `AGENTS.md` as a relative symlink** to it, so
+Claude Code and the generic-agent convention read one source of truth. **Mirror
+`codeanalyzer-typescript/CLAUDE.md`** as the template, and it must **describe the schema v2 model
+in detail** (for maintainability): the additive paradigm, the node tree + edge overlays, the
+`can://` ids, the level structure, and the two projections — so a future agent understands *what
+this analyzer emits and why* without re-deriving it. Cover: what the repo is + chosen tooling; the
+modular architecture and its invariants; how to build/test/run + the validation fixture; the schema
+contract (link `canonical-schema.md` + `.claude/SCHEMA_DECISIONS.md`); packaging/release + version
+lockstep; and repo rules (never add AI-authorship trailers). Watch the **global-gitignore trap**:
+many setups exclude `AGENTS.md` globally, so un-ignore it in the repo's local `.gitignore` (`!AGENTS.md`),
+then confirm `git ls-files AGENTS.md` lists it after committing (fall back to `git add -f AGENTS.md`).
+Fold any existing `CLAUDE.md` into the new guide rather than discarding it.
 
 ### Summarize & hand off to the frontend skill
-Report: the build plan, the schema decisions the user made (`SCHEMA_DECISIONS.md`), what runs today
-(the cheap level-1 analysis — symbol table + resolver call graph — on the fixture), what's stubbed
-(the level-2 framework backend), the **distribution artifacts** (the `codeanalyzer-<lang>` PyPI
-package under `packaging/python/`, the `packaging/homebrew/` formula generator + the
-`codellm-devkit/homebrew-tap` push, the tag-triggered `release.yml`, and the **published package
-name + version**), the analyzer `README.md` and the **`CLAUDE.md` agent guide with its `AGENTS.md`
-symlink** (mirroring `codeanalyzer-typescript/CLAUDE.md`), and the diff summary. Confirm
-the **modularity** checks from `references/analyzer-architecture.md` actually hold (delegating
-`core`, node-kind-split builder, isolated framework subpackage, present-and-wired `analysis/` +
-`frameworks/` layer) — report it as a checklist, not an aspiration.
-
-**Hand-off to cldk-sdk-frontend.** This skill ends at a working, released analyzer. To make the
-language usable from a CLDK SDK, run the **cldk-sdk-frontend** skill next; it consumes exactly what
-you produced here: a sample `analysis.json`, the approved schema contract + `SCHEMA_DECISIONS.md`,
-the CLI contract (`--help`), and the published `codeanalyzer-<lang>` package name + version to pin.
-State these explicitly in the summary so the frontend skill (or a later session) has its inputs.
-
-> **Never fake verification.** Every stage's verify step must actually run. If a required tool
-> is found missing mid-build, stop and instruct the user to install it (exact commands + what
-> it's for) and wait — don't scaffold-and-leave-unverified, and don't claim a stage passed
-> without running it. Full criteria, fixture design rules, and definitions of done:
-> `references/testing-and-validation.md`.
+Report: the two-path choice, the schema decisions (`SCHEMA_DECISIONS.md`), which `max_level` runs
+today and what each level emits (on the fixture, both projections), the distribution artifacts
+(PyPI package + version, Release binaries, brew formula, `release.yml`), the `README.md` and the
+`CLAUDE.md`/`AGENTS.md` guide, and the diff summary. Confirm the **modularity** checks from
+`analyzer-architecture.md` and the **schema gates** from `testing-and-validation.md` actually hold.
+**Hand-off to cldk-sdk-frontend:** the SDK binding is a *separate* major release (`§ c`) — it
+revises the Pydantic models to the v2 schema while keeping the same public API. Hand over one sample
+`analysis.json` per level, the schema contract + `SCHEMA_DECISIONS.md`, the CLI `--help`, and
+the published package name + version to pin.
+
+> **Never fake verification.** Every level's gate must actually run. If a required tool is found
+> missing mid-build, stop and instruct the user to install it and wait. Full criteria, fixture
+> design, and definitions of done: `references/testing-and-validation.md`.
 
 ## Guardrails
-- **Modularity is a success criterion, not a nicety.** A monolithic analyzer that emits valid
-  `analysis.json` has met the schema bar and *failed* the maintainability bar — both are
-  required. Mirror `codeanalyzer-python`'s package structure (`references/analyzer-architecture.md`):
-  a delegating `core` (never inlined analysis, never a hardcoded `entrypoints: {}`), a cohesive
-  symbol-table builder split by node kind (not a flat pile of free functions), the framework
-  backend isolated in its own subpackage, and a real pluggable layer — `analysis/` (pass +
-  registry) and `frameworks/` (finder base), scaffolded even when the built-in pass list is
-  empty. `codeanalyzer-ts` is the anti-example of every one of these; do not reproduce it.
-- **The schema contract is the success criterion.** An analyzer that runs but emits
-  non-conformant JSON has failed the real job — the SDK can't load it. Mirror the schema
-  **comprehensively** (`schema-reference.md`) and prove it by validating output against the
-  SDK `<Lang>Application` Pydantic model at every level. A thin schema that "looks right" but
-  drops fields is a silent failure.
-- **Expand the schema for the language — that's a feature, not a deviation.** Keep the
-  invariant spine (root keys, Module→Class/Callable nesting, identity-only edges,
-  `signatureOf()`), then add the target language's own node kinds and fields as first-class
-  data rather than forcing it into the Java/Python mold. The contract you design here is what the
-  frontend skill encodes as SDK models, so record every new kind/field in `SCHEMA_DECISIONS.md`.
-  See the expansion rubric in `schema-reference.md`.
-- **Don't fake the call graph.** Identity-only edges must reference signatures that actually
-  exist in the symbol table, produced by the same `signatureOf()`. Dangling edges are worse
-  than no edges.
-- **Scope discipline.** This skill builds the *analyzer* and its distribution — nothing in a CLDK
-  SDK repo. Wiring the analyzer into the Python/TS/… SDKs is **cldk-sdk-frontend**. Enriching an
-  *existing* analyzer with a new contribution point is `codeanalyzer-extension-builder`.
-- **No invented tooling.** If a recommended parser/resolver doesn't exist for the language,
-  say so and fall back per the menu's reasoning (compiler API → tree-sitter + external
-  resolver → Joern), rather than inventing a package name.
-- **Path predicates must operate on relative paths.** Any skip predicate (`IsVendored`,
-  `IsTestFile`, or a custom equivalent) applied to an absolute file path will silently match
-  directory segments from the analyzer's own source tree and discard all files under them.
-  Apply every such predicate to the path relative to the project root — never to the absolute
-  path. This is an invisible failure: the analyzer compiles cleanly, all tests pass on the
-  project, and the symbol table is empty.
-- **Every language-specific schema field needs a test that asserts its value.** Pydantic
-  validation confirms the JSON is structurally well-formed; it does not confirm that
-  language-specific fields are correctly populated. For every field added beyond the
-  Java/Python spine, write at least one test asserting a known concrete value. A field with
-  no value test is guaranteed to break silently when the builder logic changes.
+- **The schema is the success criterion.** An analyzer that runs but emits non-v2 JSON has failed
+  the real job — the SDK can't load it, and the Neo4j graph won't match. Validate output against
+  the SDK `Application` model at every level, in both projections. Mirror the schema
+  **comprehensively** (`schema-reference.md`); a thin schema that "looks right" but drops fields is
+  a silent failure.
+- **Additive, never rewriting.** Each level only *adds* nodes/edges (plus the one `callee`
+  refinement). `L1 ⊆ L2 ⊆ L3 ⊆ L4` is a CI-checkable superset gate. If a "higher" level would
+  rewrite a lower level's fact, the model is wrong — fix the model.
+- **Hold the parity line.** The shared vocabulary (node kinds, edge lists, `can://` grammar) is
+  identical across analyzers; language extras are **additive** and recorded in `SCHEMA_DECISIONS.md`.
+  This is what lets the SDK model the schema once and the Neo4j schema be one contract.
+- **Modularity is a success criterion.** Mirror `codeanalyzer-python`'s structure — delegating
+  `core`, a builder split by node kind, the framework backend and the `neo4j/` projection isolated
+  in their own subpackages, a real pluggable pass layer. `codeanalyzer-ts`'s original monolith is
+  the anti-example.
+- **Two projections, always.** JSON and Neo4j are co-primary. Neo4j is always full-depth; levels
+  gate the JSON path only.
+- **No invented tooling.** If a recommended parser/resolver/oracle doesn't exist for the language,
+  say so and fall back per the menu's reasoning, rather than inventing a package name.
+- **Scope discipline.** This skill builds the *analyzer* and its distribution. Wiring it into the
+  Python/TS/… SDKs is **cldk-sdk-frontend**; enriching an existing analyzer with a contribution
+  point is `codeanalyzer-extension-builder`.
diff --git a/skills/codeanalyzer-backend/references/analyzer-architecture.md b/skills/codeanalyzer-backend/references/analyzer-architecture.md
index 376ab35..ba8cc0f 100644
--- a/skills/codeanalyzer-backend/references/analyzer-architecture.md
+++ b/skills/codeanalyzer-backend/references/analyzer-architecture.md
@@ -28,16 +28,21 @@ codeanalyzer/
   core.py                # ORCHESTRATOR ONLY. Delegates every phase; inlines no analysis logic.
   options/               # CLI option / AnalysisOptions model
   config/                # static / environment config, distinct from CLI options
-  schema/                # the Pydantic (or native) models — the data contract
-  syntactic_analysis/    # symbol-table construction (the per-file builder)
-  semantic_analysis/     # call-graph construction
-    call_graph.py        #   the resolver-based graph + graph<->schema adaptation
+  schema/                # the node/edge models — the v2 data contract (canonical-schema.md)
+  syntactic_analysis/    # L1: the tree builder (per-file, to callable depth) + call nodes
+  semantic_analysis/     # L2: call graph; L3/L4: the dataflow passes (cfg/pdg/sdg)
+    call_graph.py        #   the resolver-based call_graph + graph<->schema adaptation
     <framework>/         #   the heavy framework backend (joern/wala/svf), ISOLATED in its own subpackage
+  neo4j/                 # the CO-PRIMARY projection: project() -> GraphRows -> cypher/bolt + schema catalog
   analysis/              # the PLUGGABLE pass layer (registry + AnalysisPass base)
   frameworks/            # entrypoint-finder base + concrete finders, built ON the pass layer
   utils/                 # logging, progress, fs helpers — no analysis logic
 ```
 
+The `neo4j/` subpackage is **not optional** — the Neo4j graph is a co-primary output
+(`neo4j-projection.md`), so its seam exists in the skeleton like any other, isolated behind
+`project()`.
+
 Not every language needs every box on day one (the framework backend and pass finders may ship
 empty), but the **skeleton and the seams must exist from the start**, because that is what makes
 the analyzer extensible without a rewrite.
diff --git a/skills/codeanalyzer-backend/references/backend-recipe.md b/skills/codeanalyzer-backend/references/backend-recipe.md
index d01d676..a4ae95f 100644
--- a/skills/codeanalyzer-backend/references/backend-recipe.md
+++ b/skills/codeanalyzer-backend/references/backend-recipe.md
@@ -27,10 +27,12 @@ note when they're the *same* tool. TS's checker does both; tree-sitter languages
 need a separate resolver (an LSP or a type checker). This single fact drives steps
 5 and 6.
 
-## 2. Mirror the canonical schema, then extend at the leaves
-Reproduce `Application { symbol_table: Map<path, Module>, call_graph: Edge[] }`, the
-Module → Class/Callable hierarchy, identity-only edges that reference signature strings with
-a provenance tag, and a Callsite that holds the rich per-call metadata. That spine is the
+## 2. Mirror the canonical schema (v2), then extend at the leaves
+Reproduce the **additive tree** of `canonical-schema.md`: `application → symbol_table{module} →
+types{}/functions{} → callables{} → body{}`, with `can://` ids, spans (byte offsets), module-level
+`source`, and the split identity-only edge lists (`call_graph` at the application;
+`cfg`/`cdg`/`ddg`/`summary` on the callable). Edges reference node ids with a `prov` tag; call sites
+are `call` nodes in `body`. That spine is the
 invariant. **Then expand the schema to capture what's idiomatic in the target language as
 first-class data** — add node kinds (interfaces/type-aliases/enums for TS; structs/interfaces
 for Go; traits/impls for Rust), typed fields (receiver types, async/unsafe flags, generics),
@@ -58,25 +60,23 @@ JS/TS reads `tsconfig.json` and ensures `node_modules`. Make it **idempotent**,
 and **degrade to partial types rather than crashing**. Full detail and timing (source-level
 vs bytecode resolvers): `project-materialization.md`.
 
-## 5. Build the structural symbol table (level 1, part 1)
-Walk the parse tree per file and populate Module → {imports, comments, classes,
-interfaces/types/enums, functions, module vars}; each class → methods/properties; each
-callable → params, return type, decorators, locals, spans, raw code, and the **unresolved
-call sites** (callee name + receiver expr + arg exprs + position, with `callee_signature`
-left null). Stamp per-file caching metadata (content hash, mtime, size). This step records
-call sites but doesn't resolve them into edges — that's the cheap next step (still level 1;
-type fields may still be filled here if your resolver is a same-tool checker). Do this
-file-by-file, modeled on how Java's `SymbolTable.extractAll` and Python's `core.py` iterate
-the project — see `symbol-table-construction.md`.
-
-## 6. Build the resolver-based call graph (level 1, part 2 — cheap, strictly additive)
-This is **cheap and part of the level-1 analysis**: the same Tier-1 resolver already loaded for
-the symbol table resolves call sites into edges. For each recorded call site, map the callee to
-a declaration, write its signature into `callee_signature` (**backfilling the site in place**),
-and emit an identity-only edge `source_sig → target_sig` with `provenance` set to your resolver
-(e.g. `"tsc"`). Handle constructors/`new`, method dispatch via receiver type, and an explicit
-unresolved-fallback path (record the site, skip the edge — never crash). Never mutate the symbol
-table beyond filling `callee_signature`.
+## 5. L1 — build the tree (symbol table)
+Walk the parse tree per file and populate the module → `{imports, types, functions}`, each type →
+`callables`/`fields`, each callable → params, return type, decorators, spans (`line:col` plus byte offsets), and
+its **`call` nodes in `body`** (callee name + receiver expr + arg exprs + span, `callee: null`).
+Store the file's text once as the module's **`source`** (all node text slices off it). Stamp
+per-file caching metadata (content hash, mtime, size). This step records call sites but doesn't
+resolve them into edges — that's L2. Do it file-by-file, modeled on Java's `SymbolTable.extractAll`
+and Python's `core.py` — see `symbol-table-construction.md`.
+
+## 6. L2 — the resolver-based call graph (cheap, strictly additive)
+The same Tier-1 resolver already loaded for the tree resolves the `call` nodes into edges. For
+each `call` node, map the callee to a declaration, backfill its **`callee`** id in place
+(`null → id`, the one sanctioned mutation), and emit a `call_graph` edge `{src, dst}` (both
+callable ids) with `prov` set to your resolver (e.g. `["tsc"]`). Handle constructors/`new`, method
+dispatch via receiver type, and an explicit unresolved-fallback (record the `call` node, skip the
+edge — never crash). Never mutate the tree beyond filling `callee`. `call_graph` edges are
+**immutable once written** — never re-anchored to a statement at L3.
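+
+A minimal Python sketch of this step over the v2 tree. `resolve(caller_id, call_node)` is a
+hypothetical wrapper around your Tier-1 resolver that returns the callee's callable id or `None`.
+The dict shapes follow `canonical-schema.md`. Only `callee` changes (null → id), and unresolved
+`call` nodes get no edge. One `call_graph` edge is written per (caller, callee) pair; how `weight`
+accumulates is left to your merge rules.
+
+```python
+from typing import Callable, Dict, List, Optional
+
+
+def build_call_graph(application: Dict, resolve: Callable[[str, Dict], Optional[str]],
+                     prov: str = "tsc") -> None:
+    """L2 over an L1 tree: backfill `callee` on call nodes and emit `call_graph` edges.
+
+    `resolve(caller_id, call_node)` is a hypothetical hook around the Tier-1 resolver;
+    it returns the callee's callable id, or None when the call cannot be resolved.
+    """
+    edges: List[Dict] = application.setdefault("call_graph", [])
+    seen = {(e["src"], e["dst"]) for e in edges}  # existing edges are never rewritten
+    for module in application["symbol_table"].values():
+        # Free functions hang off the module; methods hang off their type.
+        callables = list(module.get("functions", {}).values())
+        for type_node in module.get("types", {}).values():
+            callables.extend(type_node.get("callables", {}).values())
+        for caller in callables:
+            for node in caller.get("body", {}).values():
+                if node.get("kind") != "call" or node.get("callee") is not None:
+                    continue  # only unresolved call sites are refined
+                target = resolve(caller["id"], node)
+                if target is None:
+                    continue  # unresolved fallback: keep the call node, emit no edge
+                node["callee"] = target  # the one sanctioned mutation: null -> id
+                if (caller["id"], target) not in seen:
+                    seen.add((caller["id"], target))
+                    edges.append({"src": caller["id"], "dst": target,
+                                  "prov": [prov], "weight": 1})
+```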
 
 This base graph comes from the **Tier-1 resolver** and is deliberately lightweight — no
 points-to, dataflow, or k-CFA. *Don't call the tiers "whole-program vs not": once deps are
diff --git a/skills/codeanalyzer-backend/references/canonical-schema.md b/skills/codeanalyzer-backend/references/canonical-schema.md
index 5498fe0..7d3d8aa 100644
--- a/skills/codeanalyzer-backend/references/canonical-schema.md
+++ b/skills/codeanalyzer-backend/references/canonical-schema.md
@@ -1,118 +1,264 @@
-# The canonical CLDK analysis contract
-
-Every CLDK analyzer — Java, Python, and any new language — emits the **same shape** of
-JSON so the SDK facades can parse them interchangeably. A new analyzer that drifts from
-this contract is the single most common way a language pack fails: the analyzer "works"
-in isolation but the Python SDK can't load it. Treat this file as the spec the generated
-analyzer's output must satisfy, and as the source of truth for the Pydantic models you
-add to the SDK.
-
-This file states the *rules*. For the exhaustive, field-by-field spec derived from the SDK
-Pydantic models — the thing the generated analyzer must mirror comprehensively — see
-`schema-reference.md`. The authoritative model code is
-`codeanalyzer-python/codeanalyzer/schema/py_schema.py` (identity-only, recommended) and
-`python-sdk/cldk/models/java/models.py` (legacy, rich-edge).
-
-## The three invariants
-
-1. **One root object, two required keys.** Output is
-   `Application { symbol_table: Map<path, Module>, call_graph: Edge[] }`
-   plus optional `entrypoints`. `symbol_table` is keyed by **file path** (relative to the
-   project root, stable across runs). `call_graph` is a flat list of edges.
-
-2. **Identity-only edges** (for new analyzers). A call-graph edge carries only `source` and
-   `target` — both **signature strings** that must exactly equal a `Callable.signature`
-   already in the symbol table. The rich per-call metadata (receiver expression, argument
-   types, line/column, resolved callee) lives on a `Callsite` inside the **caller's**
-   `call_sites`. This separation is what lets the SDK build a NetworkX graph whose nodes are
-   the symbol-table callables. If `source`/`target` don't byte-match a real signature, the
-   graph has dangling nodes.
-   *Caveat:* the **Java** analyzer is a legacy exception — its `JGraphEdges` embed rich
-   `JMethodDetail` objects instead of bare strings. Do **not** copy that for a new language;
-   follow the Python identity-only model (your recipe's step 2 mandates it). See
-   `schema-reference.md` § "The one design choice".
-
-3. **`signatureOf()` is the linchpin.** Define exactly **one** canonicalizer in the
-   analyzer that turns a declaration into its signature string, and use it everywhere a
-   signature is produced — when naming a `Callable`, when writing `callee_signature` on a
-   `Callsite`, and when emitting edge `source`/`target`. Caller-side and callee-side ids
-   must be produced by the same function so they are identical. Constructors normalize to a
-   single convention (Python uses `ClassName.__init__`; pick the target language's
-   equivalent and apply it consistently). When in doubt, prefer a fully-qualified,
-   human-readable string like `module.Class.method` over an opaque hash — downstream LLM
-   consumers read these.
-
-## JSON conventions (non-negotiable for SDK compatibility)
-
-- **snake_case keys.** Java emits via Gson with
-  `LOWER_CASE_WITH_UNDERSCORES`; Python via Pydantic's snake_case defaults. A new analyzer
-  in any host language must serialize keys in snake_case so the shared SDK models parse it.
-- **`analysis.json` is the only facade-visible artifact.** Whatever the analyzer does
-  internally (caches, intermediate DBs), the contract the SDK depends on is a single
-  `analysis.json` (or compact JSON on stdout when no output dir is given).
-- **Round-trip safety.** Open-vocabulary fields (`provenance`, `tags`, `detection_source`)
-  are plain strings/string-maps so a persisted `analysis.json` loads even if the producing
-  extensions aren't installed. Don't model them as closed enums.
-
-## Core node types
-
-These are the canonical Python field names. For a new language, replicate the **same field
-names and nesting**; add language-specific node kinds rather than renaming the shared ones.
-
-### Module (a compilation unit / file)
-`file_path`, `module_name`, `imports[]`, `comments[]`, `classes{sig→Class}`,
-`functions{sig→Callable}`, `variables[]`, plus caching metadata `content_hash`,
-`last_modified`, `file_size`.
-
-### Class
-`name`, `signature` (e.g. `module.ClassName`), `comments[]`, `code`, `decorators[]`,
-`base_classes[]` (signature strings), `methods{sig→Callable}`, `attributes{name→Attr}`,
-`inner_classes{sig→Class}`, `start_line`, `end_line`.
-
-### Callable (function / method / constructor)
-`name`, `path`, `signature` (e.g. `module.Class.method`), `comments[]`, `decorators[]`,
-`parameters[]`, `return_type`, `code`, `start_line`/`end_line`/`code_start_line`,
-`accessed_symbols[]`, **`call_sites[]`** (the unresolved-then-backfilled call records),
-`inner_callables{}`, `inner_classes{}`, `local_variables[]`, `cyclomatic_complexity`,
-`is_entrypoint`, `entrypoint_framework`.
-
-### Callsite (rich per-call metadata; lives on the caller)
-`method_name`, `receiver_expr`, `receiver_type`, `argument_types[]`, `return_type`,
-**`callee_signature`** (null when the site is first recorded; backfilled in place when the
-resolver call graph is built),
-`is_constructor_call`, and `start_line`/`start_column`/`end_line`/`end_column`.
-
-### CallEdge (identity-only)
-`source` (caller signature), `target` (callee signature), `type` (`"CALL_DEP"`),
-`weight` (int, accumulated when merging backends), `provenance[]` (e.g. `["tsc"]`,
-`["jedi","joern"]`), `tags{}` (free-form, extension-namespaced).
-
-### Entrypoint (optional)
-`signature` (references a Callable), `framework`, `detection_source`
-(`decorator|base_class|url_resolver|...|extension`), plus flat optional route/method
-fields and a free-form `tags{}`.
-
-## Mapping the contract onto a new language
-
-Keep the spine identical; extend at the leaves:
-
-| Canonical concept | TypeScript adds | Go adds |
-| --- | --- | --- |
-| Class | `interface`, `type`-alias, `enum` as sibling node kinds | `struct`, `interface` |
-| Callable | arrow functions, methods, getters/setters | functions, methods (receiver type), closures |
-| `base_classes` | `extends` + `implements` chains | embedded structs / satisfied interfaces |
-| decorators | TS decorators (`@Injectable`) | struct tags (in `tags`) |
-
-When you introduce a new node kind, give it its own `signature` produced by the same
-`signatureOf()`, so edges can point at it.
-
-## How the SDK consumes this
-
-The SDK defines a parallel set of Pydantic models per language under
-`python-sdk/cldk/models/<lang>/models.py` (e.g. `TSApplication`, `TSModule`, `TSCallable`,
-`TSCallEdge`). They must mirror these field names so `Application(**json.load(...))`
-validates. The Java models (`cldk/models/java/models.py`) and the re-exported Python models
-(`cldk/models/python/__init__.py`) are the two worked examples to copy from — copy the one
-whose invocation pattern (subprocess vs in-process) matches your analyzer. The SDK side of
-this — the per-language models and facade — is built by the **cldk-sdk-frontend** skill (its
-`python-sdk-wiring.md`), not here.
+# The canonical CLDK analysis schema (v2) — the keystone
+
+This is the contract every CLDK analyzer emits and every SDK consumes. It is the **single
+source of truth** for this skill: the analyzer you build (or migrate) exists to produce this
+shape, in **two projections** — `analysis.json` and a Neo4j graph — and the SDK models mirror
+it. `schema-reference.md` is the field-by-field appendix; this file states the model.
+
+## The one idea: an additive analysis paradigm
+
+> **Codeanalyzer is an additive analysis paradigm: each analysis level is the same tree grown
+> one layer deeper, plus one edge family over the new layer.**
+
+There is exactly **one structure** — a tree of nodes with typed edges laid over it (a Code
+Property Graph). Every "section" anyone has ever named — symbol table, call graph, CFG, PDG,
+SDG, taint — is a **projection of that one structure**, not a separate thing. Analysis
+**levels** are how deeply the structure is populated; they only ever *add*, never rewrite.
+
+### The atom
+
+One **scale-free node** — a region of code — is the whole vocabulary. A `file`, a `struct`, a
+`method`, a `statement` are not different kinds of thing; they are the same node at different
+granularity. Every node has:
+
+- an **`id`** (see § Identity),
+- a **`kind`** (the node-kind ladder below),
+- a **`span`** (the one universal attribute — where in source it lives),
+- **children** (containment), and
+- **edges** (typed overlays).
+
+### Two relations, and only two
+
+1. **Containment** — the **single-parent** relation. Every node has exactly one parent (the
+   root `application` excepted). *Exactly one* is what makes it a **tree**, and what
+   distinguishes it from the overlays.
+2. **Typed edges** — the multi-valued overlays: `call_graph`, `cfg`, `cdg`, `ddg`, `param_in`,
+   `param_out`, `summary`. A node has one parent but many edges.
+
+A node + containment tree + typed edge overlays **is a CPG.** Hold this and the rest follows.
+
+## The hierarchy (named-map containment)
+
+Containment above the callable is expressed as **named maps** — the classic symbol-table
+shape, keyed for lookup. The tree grows *downward* as the level rises:
+
+```
+application                                   ← the root; carries an id
+  symbol_table: { <file>: module }            ← L1
+    module: { types{}, functions{} }          ← L1  (per-file / compilation-unit container)
+      type: { callables{}, fields{} }         ← L1
+        callable: { body{}, cfg[], cdg[], ddg[], summary[] }   ← L1 node; cfg/cdg/ddg from L3, summary at L4
+          body: { <local-id>: node }          ← L1 call nodes; L3+ the other statements, then synthetic vertices
+  call_graph[], param_in[], param_out[]       ← cross-function edges at the application scope (call_graph L2, param_* L4)
+```
+
+- **Above the callable**: name-keyed maps (`types`, `functions`, `callables`) — each node
+  carries its full `id`.
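+- **At the callable**: `body` is the container that grows below the callable. It holds the `call`
+  nodes from L1 and is completed at L3+. It is a map keyed by the node's **local id**: a source
+  position `line:col` for real nodes, an `@tag` for callable-level synthetic vertices (`@entry`,
+  `@formal_in:0`), or `line:col/tag` for vertices attached to a call site (`16:2/actual_in:0`).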
+- **Edges live at the lowest common ancestor of their endpoints**: intra-callable edges
+  (`cfg`/`cdg`/`ddg`/`summary`) hang on the callable; cross-callable edges
+  (`call_graph`/`param_in`/`param_out`) hang on the application.
+
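+The lowest-common-ancestor rule reduces to one question: do both endpoints sit inside the same
+callable? The sketch below answers it in Python. It assumes full ordinal ids of the form
+`<callable-id>@<local-id>` (§ Identity) and a known set of callable ids. `owning_callable` and
+`edge_home` are hypothetical helpers, not SDK API. Peeling `@` segments from the right keeps
+file paths that themselves contain `@` (scoped npm packages, for instance) intact.
+
+```python
+from typing import Optional, Set
+
+
+def owning_callable(node_id: str, callable_ids: Set[str]) -> Optional[str]:
+    """Callable that contains an ordinal id `<callable-id>@<local-id>`, or None."""
+    head = node_id
+    while "@" in head:
+        head = head.rsplit("@", 1)[0]  # peel one '@' segment at a time, from the right
+        if head in callable_ids:
+            return head
+    return None  # a durable id (file/type/callable) is not inside any callable
+
+
+def edge_home(src: str, dst: str, callable_ids: Set[str]) -> str:
+    """Where an edge lives: the shared callable's id, or "application" if it crosses callables."""
+    owner = owning_callable(src, callable_ids)
+    if owner is not None and owner == owning_callable(dst, callable_ids):
+        return owner  # cfg / cdg / ddg / summary
+    return "application"  # call_graph / param_in / param_out
+```
+
+In the worked example, an edge between two statements of `Hash` resolves to `Hash`'s own id. A
+`param_in` edge from `Hash`'s call site to `New64`'s `formal_in` resolves to `"application"`.
+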
+### Node-kind ladder
+
+```
+application → file/module → type (class|struct|interface|enum|…) → callable (function|method|constructor)
+            → statement (statement|call|return|branch|loop|…) → [expression, opt]
+```
+
+plus the **synthetic** vertices. `entry`/`exit` are already the start points of the L3 `cfg`/`cdg`
+edges (see the worked example). `formal_in`, `formal_out`, `actual_in`, and `actual_out` are
+introduced at L4. A node is therefore *either an AST region or a synthetic analysis vertex*; both
+fit the tree (synthetic vertices are children of the callable or of a call-site statement).
+
+## Identity
+
+Two tiers, and the boundary is the **callable leaf line** — the same line where L1 stops.
+
+- **Durable ids (≥ callable)** — files, modules, types, callables get stable
+  [`cldk://`-style](../../cldk-sdk-frontend/references/schema-contract.md) URIs that survive
+  re-analysis and are what external tools (SCIP export, cross-repo joins) address. The grammar
+  is a **containment path** with an application segment so multiple apps in one language don't
+  collide. Free functions omit the `<type>` segment (e.g. `can://go/myapp/src/fnv.go/New64()`):
+
+  ```
+  can://<lang>/<app>/<file>/<type>/<callable-signature>
+  can://go/myapp/src/util.go/Hasher/Hash(string)uint64
+  ```
+
+- **Ordinal ids (< callable)** — statements and synthetic vertices are addressed *within* their
+  callable by a **source position** (real nodes) or an **`@tag`** (synthetic):
+
+  ```
+  <callable-id>@<line>:<col>          e.g. …/Hash(string)uint64@15:2      (a statement)
+  <callable-id>@<tag>                  e.g. …/Hash(string)uint64@entry     (synthetic)
+                                            …@formal_in:0, …@16:2/actual_in:0
+  ```
+
+  Positions are addressable (the SDK can expose `flows_to_statement("util.go:42")` as a
+  line-level query) and unique within a single analysis when they carry `line:col` — a bare
+  line is **not** unique (`if err != nil { return err }`), so always keep the column. These are
+  content-stable within one run; they are **not** promised across edits (analysis is recomputed
+  wholesale, so cross-edit durability is a non-goal below the callable line).
+
+The delimiters `/`, `@`, `:` are chosen to not collide with the durable `#`/symbol grammar; the
+`can://` scheme and the app segment are extensions to be kept in lockstep with the SDK's
+`schema-contract.md` and the upstream `cldk://` RFC.
+
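+A sketch of the ordinal-id grammar in Python. The helper names are hypothetical. The mapping
+between full ids and `body` keys (`15:2`, `@entry`, `16:2/actual_in:0`) is read off the worked
+example and should be treated as an assumption until the SDK's `schema-contract.md` pins it
+down.
+
+```python
+from typing import Optional, Tuple
+
+
+def statement_id(callable_id: str, line: int, col: int) -> str:
+    """Ordinal id of a real node: always line AND column (a bare line is not unique)."""
+    return f"{callable_id}@{line}:{col}"
+
+
+def synthetic_id(callable_id: str, tag: str, at: Optional[Tuple[int, int]] = None) -> str:
+    """Ordinal id of a synthetic vertex; `at` is the (line, col) of its call site, if any."""
+    if at is None:
+        return f"{callable_id}@{tag}"                # e.g. ...@entry, ...@formal_in:0
+    return f"{callable_id}@{at[0]}:{at[1]}/{tag}"    # e.g. ...@16:2/actual_in:0
+
+
+def body_key(node_id: str, callable_id: str) -> str:
+    """Key of an ordinal id inside its callable's `body` map (convention assumed from the example)."""
+    prefix = callable_id + "@"
+    if not node_id.startswith(prefix):
+        raise ValueError(f"{node_id!r} is not inside {callable_id!r}")
+    local = node_id[len(prefix):]
+    # Position-anchored keys stay bare ("15:2", "16:2/actual_in:0"); callable-level tags keep '@'.
+    return local if local[:1].isdigit() else "@" + local
+```
+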
+## The levels (what each one grows)
+
+The levels progressively populate the one tree, each additive over the last:
+`L1 ⊆ L2 ⊆ L3 ⊆ L4`, a superset chain **modulo null-refinement** (see § Monotonicity). The tree
+deepens at **L1** and **L3**; L2 only backfills `callee` and adds edges; L4 adds synthetic
+vertices + edges.
+
+| Level | Grows (nodes) | Adds (edges) | Cost / substrate | Flag |
+| --- | --- | --- | --- | --- |
+| **1** | the tree to **callable** depth, + `call` nodes in `body` (call sites, `callee` unresolved) | — | cheap, parser + resolver | `-a 1` |
+| **2** | `callee` on each `call` node backfilled (`null → id`) | `call_graph` (callable → callable) | cheap | `-a 2` |
+| **3** | the **rest of `body`** (non-call statements) under each callable | `cfg`, `cdg`, `ddg` (**syntactic**) — all intra-callable | heavy, **AST-only, per-callable parallel** | `-a 3` |
+| **4** | **synthetic** param vertices (formal/actual in-out) | `param_in`, `param_out`, `summary`, + `ddg` (**semantic**, alias-aware) | heaviest: **needs the points-to oracle** + summary fixpoint | `-a 4` |
+
+`body` therefore begins at **L1** holding just the `call` nodes (so `get_call_sites` is an L1
+accessor, preserving the old SDK surface) and completes at **L3** with the remaining statements.
+
+`-a 3` implies `-a 2`; `-a 4` implies `-a 3`. Framework enrichment (Joern/WALA) and points-to
+precision are an **orthogonal axis** — provenance-merged evidence into an existing edge family,
+**not** a level. `max_level` in the payload declares which level was populated; consumers
+**read it** rather than sniffing for keys.
+
+### Edge families and their placement
+
+| Edge list | Level | Endpoints | Lives on | Notes |
+| --- | --- | --- | --- | --- |
+| `call_graph` | 2 | callable → callable | application | `prov[]`, `weight`; immutable once written (never re-anchored to a statement) |
+| `cfg` | 3 | statement → statement | callable | `kind`: `fallthrough`\|`true`\|`false`\|`switch_case`\|`loop_back`\|`exception`\|`return`\|… |
+| `cdg` | 3 | statement → statement | callable | control dependence (from post-dominance) |
+| `ddg` | 3→4 | statement → statement | callable | `var` (k-limited access path), `prov`: `["ssa"]` = **syntactic** (L3), `["points-to"]` = **semantic** (L4) |
+| `summary` | 4 | actual_in → actual_out (same call) | callable | transitive intra-caller shortcut |
+| `param_in` | 4 | actual_in → formal_in | application | argument into callee |
+| `param_out` | 4 | formal_out → actual_out | application | result back to caller |
+
+Each list is keyed **by its type** (the list name *is* the type; no `type` field). Every edge
+record is `{ src, dst, …attrs }` referencing node ids. **No dangling endpoints** — every `src`
+and `dst` must resolve to a node in the tree (the same invariant at every level).
+
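+A sketch of the no-dangling-endpoints check in Python, covering the callable-local lists and
+`call_graph`. `dangling_endpoints` is a hypothetical helper. It assumes that intra-callable edges
+use `body` keys as endpoints (as in the worked example) and that every `call_graph` endpoint is
+a callable in the same payload. `param_in`/`param_out` need the full-id → `body`-key mapping and
+are left out here.
+
+```python
+from typing import Dict, Iterator, List
+
+
+def iter_callables(module: Dict) -> Iterator[Dict]:
+    """Free functions of a module, then the callables of each of its types."""
+    yield from module.get("functions", {}).values()
+    for type_node in module.get("types", {}).values():
+        yield from type_node.get("callables", {}).values()
+
+
+def dangling_endpoints(payload: Dict) -> List[str]:
+    """List every edge endpoint that does not resolve to a node in the tree."""
+    app = payload["application"]
+    problems: List[str] = []
+    callable_ids = set()
+    for module in app["symbol_table"].values():
+        for c in iter_callables(module):
+            callable_ids.add(c["id"])
+            body = c.get("body", {})
+            for family in ("cfg", "cdg", "ddg", "summary"):
+                for edge in c.get(family, []):
+                    for end in (edge["src"], edge["dst"]):
+                        if end not in body:  # intra-callable endpoints are body keys
+                            problems.append(f"{c['id']} {family}: no body node {end!r}")
+    for edge in app.get("call_graph", []):
+        for end in (edge["src"], edge["dst"]):
+            if end not in callable_ids:
+                problems.append(f"call_graph: no callable {end!r}")
+    return problems
+```
+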
+## Source and text: one blob per module, everything slices off it
+
+The tree carries structure; source **text** is stored **once per file, on the module node**, as
+`source`, and every node's text is a **slice** of it:
+
+- `get_method_body(sig)` → `module.source[callable.span.bytes]`
+- a statement's text, a call's receiver expression → the same slice by its node span.
+
+To make slicing O(1), **spans carry byte/char offsets** alongside `line:col` (`line:col` to
+address and display, offsets to slice). This is the minimum-size, self-contained choice — one
+copy of each source file, zero per-node duplication, and it subsumes any per-callable `code`
+field.
+
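+Slicing in Python has one pitfall: `str` indexes code points, not bytes, so byte offsets must be
+applied to the encoded source. The sketch below assumes `bytes` holds half-open UTF-8 byte
+offsets. The prose above says byte/char, so pin this down per analyzer. `ModuleText` is a
+hypothetical helper that encodes each module once.
+
+```python
+from typing import Dict
+
+
+class ModuleText:
+    """Encode a module's `source` once; every node's text is then a byte slice of it."""
+
+    def __init__(self, module: Dict) -> None:
+        self._raw = module["source"].encode("utf-8")
+
+    def text(self, node: Dict) -> str:
+        start, end = node["span"]["bytes"]  # assumed: half-open UTF-8 byte offsets
+        return self._raw[start:end].decode("utf-8")
+```
+
+Under this assumption, `get_method_body(sig)` becomes `ModuleText(module).text(callable_node)`.
+Slicing the `str` directly is only correct for pure-ASCII files.
+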
+## Monotonicity (the invariant that makes "additive" true)
+
+Levels **add** facts; they never contradict or delete. Exactly two sanctioned changes:
+
+1. **Additive** — new nodes deeper in the tree, new entries in an edge list.
+2. **Refinement** — an unresolved fact becoming resolved: `callee` on a call node `null → id`.
+   Null-to-value only, never value-to-different-value.
+
+So `analysis.json(-a 1) ⊆ … ⊆ analysis.json(-a 4)`, a **CI-checkable superset gate**. The one
+subtlety is the DDG: L3 emits the **syntactic** (name-equality, no-alias) def-use — a strict
+subset — and L4 **adds** the alias-derived edges via points-to. This holds *because* the
+precision posture is weak-update / over-approximate (no strong updates through aliases); a
+strong update would remove an edge and break the chain. The `prov` tag (`ssa` vs `points-to`)
+makes the syntactic/semantic split visible in the data.
+
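+The superset gate can be sketched in Python as a recursive comparison. It makes several
+assumptions:
+
+- Lists of objects are edge lists. They may only gain records, and existing records are compared
+  by equality because edges are immutable.
+- Lists of scalars (span coordinates, `prov`) must be unchanged.
+- `callee` is the only slot allowed to go from null to a value.
+- Envelope fields such as `max_level` are excluded because they legitimately differ per level.
+
+The helper names are hypothetical, and the list membership test is quadratic, which is fine for a
+CI check on modest payloads.
+
+```python
+from typing import Any, Dict
+
+REFINABLE = {"callee"}  # the one slot allowed to go from null to a value
+
+
+def is_additive(lo: Any, hi: Any, key: str = "") -> bool:
+    """True if `hi` only adds facts to `lo` (plus the sanctioned null -> value refinement)."""
+    if lo is None and key in REFINABLE:
+        return True
+    if isinstance(lo, dict):
+        return isinstance(hi, dict) and all(
+            k in hi and is_additive(v, hi[k], k) for k, v in lo.items())
+    if isinstance(lo, list):
+        if not isinstance(hi, list):
+            return False
+        if all(isinstance(x, dict) for x in lo):
+            return all(x in hi for x in lo)  # edge list: may grow, records immutable
+        return lo == hi  # positional values (span coordinates, prov) must not change
+    return lo == hi
+
+
+def level_gate(lower: Dict, higher: Dict) -> bool:
+    """CI gate: analysis.json(-a n) must be a subset of analysis.json(-a n+1)."""
+    return is_additive(lower["application"], higher["application"])
+```
+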
+## Conventions
+
+- **snake_case keys**, everywhere, in every host language (Gson `LOWER_CASE_WITH_UNDERSCORES`,
+  Pydantic defaults) so one set of SDK models parses every analyzer.
+- **A fact is present or absent — there is no `null`** (except the sanctioned `callee: null`
+  refinement slot). Absence *is* the "no fact" encoding; do not emit empty-vs-null noise.
+- **`analysis.json` is one facade-visible artifact** (or compact JSON on stdout); the Neo4j
+  graph is the co-primary projection (below). Caches/DBs are internal.
+- Open-vocabulary fields (`prov`, `tags`) are plain strings so a persisted payload loads even
+  without the producing extension installed.
+
+## Two projections of the one structure
+
+The same tree + overlays is emitted two ways; they must agree.
+
+- **`analysis.json`** — this document: named-map tree, `body` maps, split edge lists,
+  `source` per module. The facade contract.
+- **Neo4j** — a near-identity projection (`neo4j-projection.md`): every node → a node row,
+  **containment → typed `HAS_*`/`DECLARES` edges** (the tree rendered as edges, since a graph DB
+  has no nesting), every overlay edge → a typed relationship. Node families and the `--app-name`
+  anchor must match this schema. The Neo4j graph is **always full-depth** — analysis levels gate
+  the JSON path only.
+
+Building **both** is a first-class deliverable for every analyzer (`§ a` of the skill), not an
+afterthought.
+
+## Worked example (L1 → L4, additive)
+
+```jsonc
+{
+  "schema_version": "2.0.0", "language": "go", "max_level": 4, "k_limit": 3,
+  "application": {
+    "id": "can://go/myapp", "kind": "application",
+    "symbol_table": {
+      "src/util.go": {                                                            // L1
+        "id": "can://go/myapp/src/util.go", "kind": "module", "package": "util",
+        "source": "package util\n\nimport \"hash/fnv\"\n\nfunc (Hasher) Hash(s string) uint64 {\n\th := fnv.New64()\n\th.Write([]byte(s))\n\treturn h.Sum64()\n}\n",
+        "types": {
+          "Hasher": {
+            "id": "can://go/myapp/src/util.go/Hasher", "kind": "struct",
+            "span": { "start":[10,1], "end":[40,1], "bytes":[0,400] },
+            "callables": {
+              "Hash(string)uint64": {
+                "id": "can://go/myapp/src/util.go/Hasher/Hash(string)uint64", "kind": "method",
+                "span": { "start":[14,1], "end":[22,1], "bytes":[42,180] },
+                "body": {                                              // call nodes from L1; the rest L3+
+                  "@entry": { "kind":"entry" },
+                  "15:2":   { "kind":"statement", "span":{ "start":[15,2],"end":[15,18],"bytes":[84,100] } },
+                  "16:2":   { "kind":"call", "span":{...}, "callee":"can://go/myapp/src/fnv.go/New64()" },  // L1; callee backfilled at L2
+                  "17:2":   { "kind":"return", "span":{...} },
+                  "@exit":  { "kind":"exit" },
+                  "@formal_in:0":     { "kind":"formal_in",  "of":"s" },           // L4
+                  "@formal_out":      { "kind":"formal_out", "of":"$ret" },        // L4
+                  "16:2/actual_in:0": { "kind":"actual_in",  "of":"arg0", "parent":"16:2" },  // L4
+                  "16:2/actual_out":  { "kind":"actual_out", "of":"$ret", "parent":"16:2" }
+                },
+                "cfg": [ {"src":"@entry","dst":"15:2","kind":"fallthrough"},       // L3
+                         {"src":"15:2","dst":"16:2","kind":"fallthrough"} ],
+                "cdg": [ {"src":"@entry","dst":"15:2"} ],                          // L3
+                "ddg": [ {"src":"15:2","dst":"17:2","var":"h","prov":["ssa"]},     // L3 syntactic
+                         {"src":"16:2","dst":"17:2","var":"h","prov":["points-to"]} ], // L4 semantic
+                "summary": [ {"src":"16:2/actual_in:0","dst":"16:2/actual_out"} ]  // L4
+              }
+            }
+          }
+        },
+        "functions": {}
+      }
+    },
+    "call_graph": [ {"src":"can://go/myapp/src/util.go/Hasher/Hash(string)uint64",  // L2
+                     "dst":"can://go/myapp/src/fnv.go/New64()","prov":["go/types"],"weight":1} ],
+    "param_in":  [ {"src":"…/Hash(string)uint64@16:2/actual_in:0","dst":"…/New64()@formal_in:0"} ], // L4
+    "param_out": [ {"src":"…/New64()@formal_out","dst":"…/Hash(string)uint64@16:2/actual_out"} ]     // L4
+  }
+}
+```
+
+Every level only *added* — a key, `body` nodes, or edge entries. Nothing was rewritten except
+the `callee: null → id` backfill. That is the additive paradigm made literal.
+
+## Cross-language parity clause
+
+The **vocabulary is shared; language extras are additive.** Node `kind`s, edge list names, edge
+`kind`/`prov` values, and the shapes above are identical across analyzers. A language **adds**
+kinds (Go `defer_resume` CFG edges, Rust `unsafe` flags, TS `interface`/`enum` types) — recorded
+in its `.claude/SCHEMA_DECISIONS.md` — but must **never rename or repurpose** a shared name. This
+is what lets the SDK model the schema **once** (one `Node`, one `Edge`, one `Application`), and
+what lets the Neo4j schema be a single versioned contract. Hold the parity line, or the whole
+one-model premise collapses.
diff --git a/skills/codeanalyzer-backend/references/dataflow-graphs.md b/skills/codeanalyzer-backend/references/dataflow-graphs.md
index e039059..b73b181 100644
--- a/skills/codeanalyzer-backend/references/dataflow-graphs.md
+++ b/skills/codeanalyzer-backend/references/dataflow-graphs.md
@@ -78,11 +78,19 @@ Cross-function edges (`CALL`, `PARAM_IN/OUT`, `SUMMARY`) reference both endpoint
 referenced signature exists in the symbol table, every referenced node_id exists in that
 function's emitted graph.
 
-## Emission — `analysis.json` sections and flags
-
-Graphs are emitted as an **optional top-level section**, present from level 3, preserving the
-facade invariant that `analysis.json` is the single facade-visible output. The `functions` map
-(CFG + PDG) is the **level-3** payload; `sdg_edges` is added at **level 4**:
+## Emission — where the graphs live in the tree
+
+> **Schema v2 supersedes the standalone `program_graphs` section below.** In the canonical schema
+> (`canonical-schema.md`), dataflow is **not** a separate top-level object — it grows *inside the
+> tree*: each callable gains a `body{}` map of statement/vertex nodes plus the intra-callable edge
+> lists `cfg`/`cdg`/`ddg`/`summary`, and the application gains the cross-callable `param_in`/
+> `param_out` lists. Node endpoints are `can://…@line:col` ids, not `(signature, node)` pairs.
+> Read `canonical-schema.md` for the authoritative shape; the ladder, gates, and construction
+> stages in this file are shape-agnostic and still govern. The block below is retained only as the
+> conceptual node/edge inventory (kinds, `cfg`/`pdg`/`sdg` families) — map it onto the v2 tree.
+
+Historically, graphs were a top-level `program_graphs` object. The families (CFG, PDG = CDG+DDG,
+SDG) and their level assignment are unchanged; only their placement differs. The historical shape:
 
 ```jsonc
 {
diff --git a/skills/codeanalyzer-backend/references/neo4j-projection.md b/skills/codeanalyzer-backend/references/neo4j-projection.md
index af3889f..9e47ab6 100644
--- a/skills/codeanalyzer-backend/references/neo4j-projection.md
+++ b/skills/codeanalyzer-backend/references/neo4j-projection.md
@@ -1,17 +1,23 @@
-# Neo4j projection (optional second output surface)
-
-Every mature CLDK analyzer now emits **two** projections of the same analysis: the canonical
-`analysis.json` (the facade contract, always built) and an **optional Neo4j graph**. The graph
-is not an ingestion of `analysis.json` — it is an **alternative projection of the same in-memory
-IR** (the symbol table + call graph objects), selected by `--emit neo4j`. `analysis.json`
-remains the SDK's default contract; the graph is a queryable, incrementally-updatable second
-surface. Java, Python, and TypeScript analyzers all ship this; a new analyzer should mirror it
-**once level-1 JSON is solid** — treat it as part of the *CLI, caching/incremental, packaging*
-stage, not a prerequisite.
-
-Neo4j is **optional at every layer**: the driver is a lazy/optional dependency (Python/TS import
-it on demand; Java loads it reflectively so GraalVM `native-image` can prune it), and nothing in
-the JSON path depends on it.
+# Neo4j projection (the co-primary output surface)
+
+Every CLDK analyzer emits **two projections of the one structure** (`canonical-schema.md`): the
+`analysis.json` tree and a **Neo4j graph**. They are **co-primary** — building both is a
+first-class deliverable, not an afterthought. The graph is not an ingestion of `analysis.json` —
+it is a projection of the **same node tree + edge overlays**, selected by `--emit neo4j`.
+`analysis.json` is the SDK's default contract; the graph is the queryable, incrementally-updatable
+surface. Java, Python, and TypeScript analyzers all ship it.
+
+**Containment renders as edges.** A graph DB has no nesting, so the schema's containment tree
+becomes typed `HAS_*`/`DECLARES` relationships (the `HAS_MODULE`/`DECLARES`/`HAS_CALLABLE`/
+`HAS_CFG_NODE` families below), while the overlay edges (`call_graph`, `cfg`, `cdg`, `ddg`,
+`param_in`/`param_out`, `summary`) become their own typed relationships. Node labels are the v2
+node **kinds**; the `can://` id is the merge key. This is a **near-identity** projection of the
+JSON tree — the same nodes and edges, rendered as a property graph.
+
+Neo4j stays **optional at run time** (you don't need a running DB to emit `analysis.json`): the
+driver is a lazy/optional dependency (Python/TS import it on demand; Java loads it reflectively so
+GraalVM `native-image` can prune it). "Co-primary" means *the analyzer must be able to produce it*,
+not that every run does.
 
 ## CLI surface (add to `cli-contract.md`)
 
diff --git a/skills/codeanalyzer-backend/references/schema-design-loop.md b/skills/codeanalyzer-backend/references/schema-design-loop.md
index 87a6af1..8170e74 100644
--- a/skills/codeanalyzer-backend/references/schema-design-loop.md
+++ b/skills/codeanalyzer-backend/references/schema-design-loop.md
@@ -1,10 +1,14 @@
 # Schema design as a comparison-and-differentiation loop
 
-Designing the analyzer's schema is **not** "copy Java, bolt on a few fields." It is an
-iterative, reflective process: you anchor on the **mature reference analyzers** (currently
-**Java** and **Python**; more languages will join as they mature), interrogate how the target
-language genuinely differs, and — crucially — **bring every divergence to the user as a
-decision** rather than choosing silently. You do this **node by node**, not all at once.
+The **shared spine is already designed** — it's the v2 keystone (`canonical-schema.md`): the node
+tree, the `can://` ids, the additive levels, the edge families. This loop is **not** re-designing
+that; it is confirming the **language-specific expansion** — which `type`/`callable`/`body` kinds,
+which `cfg`-edge kinds, and which typed fields this language adds to the spine (the parity clause:
+add at the leaves, never rename the shared vocabulary). You anchor on the **keystone plus the
+mature reference analyzers** (**Java** and **Python**), interrogate how the target language
+genuinely differs, and — crucially — **bring every divergence to the user as a decision** rather
+than choosing silently. You do this **node by node**, not all at once, recording each answer in
+`.claude/SCHEMA_DECISIONS.md`.
 
 This loop only *designs the schema* (the analyzer-side types + the SDK `<L>` Pydantic models).
 Actually walking files to fill the table is a separate stage — see
diff --git a/skills/codeanalyzer-backend/references/schema-migration.md b/skills/codeanalyzer-backend/references/schema-migration.md
new file mode 100644
index 0000000..e838b2c
--- /dev/null
+++ b/skills/codeanalyzer-backend/references/schema-migration.md
@@ -0,0 +1,87 @@
+# Migrating an existing analyzer to schema v2 (path B)
+
+For a `codeanalyzer-<lang>` that already exists on the **old** schema (flat
+`symbol_table: {path → CompilationUnit}` + a `call_graph` of rich or identity edges, per-callable
+`code`, `is_*` boolean flags). Moving it to the v2 keystone (`canonical-schema.md`) is a
+**major release**: the parsing/resolution guts stay; the *emission layer* is rewritten to produce
+the additive tree + typed edges, in both projections. This is a breaking output change — bump the
+major version and coordinate the SDK release (`§ c`).
+
+**Golden rule:** keep everything that *computes* facts (the parser, the resolver, WALA/Jelly/
+go-ssa, the call-graph builder); replace only what *serializes* them. The analyzer already knows
+the facts — v2 is a different shape for the same facts, plus deeper ones at L3/L4.
+
+## Do it level by level, lowest first
+
+Migrate in the same additive order you'd build a new analyzer, so each step is independently
+validatable against the v2 SDK models:
+
+1. **L1 emission** — the tree + `source` + ids. The biggest structural change; do it first and
+   get the symbol-table gate green before touching edges.
+2. **L2 emission** — the `call_graph` list at application scope.
+3. **Neo4j projection** — re-point (or add) the graph emitter at the v2 node/edge families.
+4. **L3/L4** — if the analyzer already computes dataflow (e.g. Java via WALA's slicer, which
+   already emits `program_graphs`), remap it into `body` + the split edge lists; otherwise it's new
+   construction per `dataflow-construction.md`.
+
+## Field-by-field: old → v2
+
+### Root envelope
+| Old | v2 |
+| --- | --- |
+| `{ symbol_table, call_graph }` (two top-level keys) | `{ schema_version, language, max_level, application: { id, symbol_table, call_graph, param_in, param_out } }` |
+| — (no version) | `schema_version: "2.0.0"`, `max_level` (authoritative) |
+| — (no app identity) | `application.id = can://<lang>/<app>` — **new**, disambiguates apps |
+
+### Container / symbol nodes
+| Old | v2 |
+| --- | --- |
+| `symbol_table[path]` = `CompilationUnit`/`Module` | `symbol_table[path]` = `module` node with `id`, `kind:"module"`, **`source`** (whole file, once) |
+| `Type` with `is_interface`/`is_enum`/`is_record`/… booleans | one `type` node with a single **`kind`** (`class`\|`interface`\|`enum`\|`struct`\|…) |
+| `CallSite.is_public/is_private/is_protected` booleans | one `access` field (or on the node) |
+| flat-string `annotations[]` | structured `decorators[]` (`{name,args,span}`) |
+| `thrown_exceptions[]` (Java) | generalized `error_channel[]` |
+| per-callable `code` string | **dropped** — `get_method_body` slices `module.source[callable.span.bytes]` |
+| `start_line`/`end_line`/`start_column`/`end_column` (flat ints) | `span: { start:[l,c], end:[l,c], bytes:[from,to] }` — **add byte offsets** for O(1) slicing |
+
+### Edges (the biggest semantic change)
+| Old | v2 |
+| --- | --- |
+| Java **rich edges** (`JGraphEdges` embedding `JMethodDetail`) | **identity-only**: `call_graph: [{ src, dst, prov, weight }]` — ids only; join detail via id |
+| identity edges `{ source, target, type:"CALL_DEP", provenance }` | rename keys → `{ src, dst, prov }`; `call_graph` list at application scope |
+| call graph mixed granularity | `call_graph` is **callable→callable** and immutable; call-site-level linking is L4 `param_*` |
+| `program_graphs.functions[sig].cfg.nodes` + `sdg_edges` (Java today) | move nodes into the callable's **`body{}`**; split edges into `cfg`/`cdg`/`ddg`/`summary` (intra) + `param_in`/`param_out` (cross); endpoints become `can://…@line:col` ids |
+| `data_dependence: "no-heap"\|"full"` (Java) | this **is** the syntactic/semantic DDG split — emit as `ddg` edges tagged `prov:["ssa"]` (no-heap, L3) vs `prov:["points-to"]` (full, L4) |
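The identity-only rows amount to a mechanical re-keying. Below is a minimal sketch. It assumes the old edges are plain dicts in the identity shape, and that the emitter already has a map from each old `signature` to its new `can://` id (`sig_to_id` is a hypothetical name for that map). Duplicate pairs are merged by summing `weight` and unioning `prov`. An old signature with no v2 id raises `KeyError`, so it never becomes a dangling edge:

```python
def convert_call_edges(old_edges: list[dict], sig_to_id: dict[str, str]) -> list[dict]:
    """Re-key old identity edges ({source, target, type, provenance, weight?})
    into v2 call_graph edges ({src, dst, prov, weight}).

    `sig_to_id` is a hypothetical lookup from an old signature to its v2 id.
    A missing endpoint raises KeyError, so a dangling edge fails loudly.
    """
    merged: dict[tuple[str, str], dict] = {}
    for e in old_edges:
        src = sig_to_id[e["source"]]
        dst = sig_to_id[e["target"]]
        edge = merged.setdefault((src, dst), {"src": src, "dst": dst, "prov": [], "weight": 0})
        # Union provenance, keeping first-seen order for stable output.
        for p in e.get("provenance", []):
            if p not in edge["prov"]:
                edge["prov"].append(p)
        # Accumulate weight on merge; old edges without a weight count as 1.
        edge["weight"] += e.get("weight", 1)
    # The old `type: "CALL_DEP"` key is dropped: in v2 the list name is the type.
    return list(merged.values())
```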
+
+### Identity
+| Old | v2 |
+| --- | --- |
+| `signature` string as the id | keep `signature` as the callable's human-readable field, but the **`id`** is the full `can://<lang>/<app>/<file>/<type>/<signature>` path |
+| `(signature, node)` pair for graph endpoints (Java) | single string id `…<signature>@<line>:<col>` (or `@tag` for synthetic) |
+| bare `signatureOf()` | unchanged — still the one canonicalizer; it now produces the *last path segment* of the id |
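Here is a minimal sketch of the two id shapes in the table. The helper names are hypothetical, and three details are assumptions rather than spec:

- segments are joined with `/` exactly as the `can://<lang>/<app>/<file>/<type>/<signature>` template reads;
- module-level functions omit the `<type>` segment;
- the `<file>` segment is the same relative path used as the `symbol_table` key.

`signature` is whatever your one `signatureOf()` returns:

```python
from typing import Optional


def callable_id(lang: str, app: str, rel_file: str, signature: str,
                type_name: Optional[str] = None) -> str:
    """Durable id: can://<lang>/<app>/<file>/[<type>/]<signature>."""
    # Same rule as symbol_table keys: stable, project-relative paths only.
    if rel_file.startswith("/") or rel_file.startswith(".."):
        raise ValueError(f"file segment must be project-relative: {rel_file!r}")
    parts = [f"can://{lang}", app, rel_file]
    if type_name:
        parts.append(type_name)
    parts.append(signature)
    return "/".join(parts)


def sub_callable_id(owner_id: str, line: Optional[int] = None,
                    col: Optional[int] = None, tag: Optional[str] = None) -> str:
    """Id below callable depth: <owner>@<line>:<col>, or <owner>@<tag> if synthetic."""
    if tag is not None:
        return f"{owner_id}@{tag}"
    if line is None or col is None:
        raise ValueError("need line and col, or a tag")
    return f"{owner_id}@{line}:{col}"
```

Because the signature is the last segment, `cid.rsplit("/", 1)[1]` recovers it only if signatures never contain `/`. Check that per language before relying on string splitting, or read `signature` off the node instead.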
+
+## Practical mechanics
+
+- **Wrap, don't rewrite, the model layer.** If the analyzer builds in-memory model objects then
+  serializes, add a **v2 emitter** that walks those same objects and produces the new shape — the
+  cleanest diff, and it lets you keep the old emitter behind a flag during transition if useful.
+- **Byte offsets:** the parser already has token positions; thread the byte/char offset through to
+  `span.bytes`. This is the one genuinely new datum L1 needs.
+- **`source`:** you're already reading each file — retain its text on the module node instead of
+  slicing per-callable `code`.
+- **Neo4j:** if the analyzer already has a graph projection (Java/Python/TS do), it's largely a
+  **relabel** to the v2 node/edge families and id scheme; if not, add the `neo4j/` subpackage per
+  `neo4j-projection.md`.
+- **Validate against the SDK v2 models at each level** — the same gates as a new analyzer
+  (`testing-and-validation.md`), plus a **superset check** if you keep the old emitter: v2 output
+  must contain every fact the old output did (modulo the deliberate drops above).
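For the superset check, the call graph is a natural place to start, since every edge must be re-keyed. Below is a minimal sketch over the two parsed outputs. It assumes two things about the inputs:

- the old `call_graph` carries signature `source`/`target` keys, and may be `None`, as Java emits for a symbol-table-only run;
- the v2 output follows `canonical-schema.md`.

The sketch maps v2 ids back to signatures by walking the tree rather than by splitting id strings. It returns the old edges that v2 lost. Other facts (imports, fields, call sites) can be checked the same way.

```python
def v2_signatures(application: dict) -> dict[str, str]:
    """Map every v2 callable id to its human-readable signature."""
    out = {}
    for module in application["symbol_table"].values():
        for c in module.get("functions", {}).values():
            out[c["id"]] = c["signature"]
        for t in module.get("types", {}).values():
            for c in t.get("callables", {}).values():
                out[c["id"]] = c["signature"]
    return out


def lost_call_edges(old_root: dict, v2_root: dict) -> set[tuple[str, str]]:
    """Old (source, target) signature pairs with no matching v2 call_graph edge."""
    app = v2_root["application"]
    sig_of = v2_signatures(app)
    new_pairs = {(sig_of.get(e["src"]), sig_of.get(e["dst"]))
                 for e in app.get("call_graph", [])}
    # `or []` covers an old call_graph that is None (symbol-table-only run).
    old_pairs = {(e["source"], e["target"]) for e in old_root.get("call_graph") or []}
    return old_pairs - new_pairs
```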
+
+## Release & coordination
+
+- **Major version bump** on the analyzer; note the breaking output change in the release notes
+  (Keep-a-Changelog *Changed/Breaking*).
+- **Coordinate with the SDK release (`§ c`):** the frontend skill revises the Pydantic models to v2
+  in lockstep, keeping the public API stable. Pin the analyzer version in the SDK only once both
+  are cut. Until then, the SDK's old models won't parse v2 output — don't publish the analyzer's
+  new major as the SDK's pinned version prematurely.
+- Update the repo's **`CLAUDE.md`** to describe the v2 model (it's now what the analyzer emits).
diff --git a/skills/codeanalyzer-backend/references/schema-reference.md b/skills/codeanalyzer-backend/references/schema-reference.md
index 83a59cf..9bd595e 100644
--- a/skills/codeanalyzer-backend/references/schema-reference.md
+++ b/skills/codeanalyzer-backend/references/schema-reference.md
@@ -1,204 +1,118 @@
-# Comprehensive schema reference (derived from the SDK Pydantic models)
+# Schema reference (v2) — per-kind fields and edges
 
-This is the **field-by-field** spec the generated analyzer's `analysis.json` must satisfy. It
-is derived from the CLDK Python SDK's own Pydantic models — the code that will actually parse
-your analyzer's output — so it is the authoritative source, not a paraphrase:
+The field-by-field appendix to `canonical-schema.md`. That file states the model (the additive
+tree + overlays); this one enumerates every node kind's fields and every edge list's shape, so
+the analyzer emits them comprehensively and the SDK models them exactly. Every node shares the
+**common node fields**; each kind adds its own. Absent = no fact (no `null`, except the one
+sanctioned `callee` slot).
 
-- **Identity-only / recommended** model: `codeanalyzer-python/codeanalyzer/schema/py_schema.py`,
-  re-exported by `python-sdk/cldk/models/python/__init__.py`.
-- **Legacy / rich-edge** model: `python-sdk/cldk/models/java/models.py`.
+## Common node fields (every node)
 
-> **Mirror it comprehensively.** Reproduce **every** field below for the shared nodes — not a
-> convenient subset. Fields you can't populate yet should still exist with sensible defaults
-> (empty list, `-1` line numbers, `None`) so the SDK model validates and later passes can fill
-> them. Then add the target language's own node kinds.
-
-## The one design choice: edge model
-
-The two reference analyzers diverge on call-graph edges. **New analyzers must use the
-identity-only (Python) model** — your recipe's step 2 mandates it, and it's what keeps edges
-cheap and the graph's nodes equal to the symbol-table callables.
-
-- **Identity-only (use this):** `call_graph: List[CallEdge]`, where an edge's `source`/`target`
-  are bare **signature strings** that exactly match a `Callable.signature` in the symbol table.
-  Rich per-call data lives on `Callsite.callee_signature` inside the caller.
-- **Rich-edge (Java legacy — do NOT copy for new languages):** `JGraphEdges.source`/`target`
-  are `JMethodDetail` objects embedding `klass` + a full `JCallable`. This is heavier and
-  duplicates symbol-table data. Documented here only so you recognize and avoid it.
-
-## Root object
-
-**Recommended (identity-only):**
-| field | type | notes |
-| --- | --- | --- |
-| `symbol_table` | `Dict[str, Module]` | keyed by file path (stable, relative to project root) |
-| `call_graph` | `List[CallEdge]` | identity-only edges; empty `[]` for a symbol-table-only run (`-a 1`) |
-| `entrypoints` | `Dict[str, List[Entrypoint]]` | optional; default `{}` |
-
-*Java additionally carries `version: str` and `system_dependency_graph: List[JGraphEdges]`, and
-its `call_graph` is `None` (absent) for a symbol-table-only run. New languages: prefer `call_graph: []` over
-`None`, and only add a `version`/SDG field if you actually produce them.*
-
-## Module (compilation unit / file)
-| field | type | default |
-| --- | --- | --- |
-| `file_path` | `str` | — |
-| `module_name` | `str` | — (Java uses `package_name`) |
-| `imports` | `List[Import]` | `[]` |
-| `comments` | `List[Comment]` | `[]` |
-| `classes` | `Dict[str, Class]` | `{}` (Java: `type_declarations`) |
-| `functions` | `Dict[str, Callable]` | `{}` (top-level/module functions) |
-| `variables` | `List[VariableDeclaration]` | `[]` |
-| `content_hash` | `Optional[str]` | `None` — caching metadata (step 8) |
-| `last_modified` | `Optional[float]` | `None` |
-| `file_size` | `Optional[int]` | `None` |
-
-## Class / Type
-| field | type | default |
-| --- | --- | --- |
-| `name` | `str` | — |
-| `signature` | `str` | e.g. `module.ClassName` (from `signatureOf()`) |
-| `comments` | `List[Comment]` | `[]` |
-| `code` | `str \| None` | `None` |
-| `decorators` | `List[Decorator]` | `[]` (Java: `annotations: List[str]`) |
-| `base_classes` | `List[str]` | `[]` (Java splits `extends_list` + `implements_list`) |
-| `methods` | `Dict[str, Callable]` | `{}` (Java: `callable_declarations`) |
-| `attributes` | `Dict[str, ClassAttribute]` | `{}` (Java: `field_declarations: List[JField]`) |
-| `inner_classes` | `Dict[str, Class]` | `{}` |
-| `start_line` / `end_line` | `int` | `-1` |
-
-*Java type-kind flags worth carrying as language node-kind info: `is_interface`,
-`is_enum_declaration`, `is_record_declaration`, `is_annotation_declaration`, `is_inner_class`,
-`is_nested_type`, `is_entrypoint_class`, plus `enum_constants`, `record_components`,
-`initialization_blocks`.*
-
-## Callable (function / method / constructor)
-| field | type | default |
-| --- | --- | --- |
-| `name` | `str` | — |
-| `path` | `str` | file path of the declaration |
-| `signature` | `str` | e.g. `module.Class.method` — **the edge id** |
-| `comments` | `List[Comment]` | `[]` |
-| `decorators` | `List[Decorator]` | `[]` (Java: `annotations`, `modifiers`) |
-| `parameters` | `List[CallableParameter]` | `[]` |
-| `return_type` | `Optional[str]` | `None` |
-| `code` | `str \| None` | `None` |
-| `start_line` / `end_line` / `code_start_line` | `int` | `-1` |
-| `accessed_symbols` | `List[Symbol]` | `[]` (Java: `accessed_fields`, `referenced_types`) |
-| `call_sites` | `List[Callsite]` | `[]` — **recorded during symbol-table build, callees backfilled when the resolver call graph runs** |
-| `inner_callables` | `Dict[str, Callable]` | `{}` |
-| `inner_classes` | `Dict[str, Class]` | `{}` |
-| `local_variables` | `List[VariableDeclaration]` | `[]` (Java: `variable_declarations`) |
-| `cyclomatic_complexity` | `int` | `0` |
-| `is_entrypoint` | `bool` | `False` |
-| `entrypoint_framework` | `Optional[str]` | `None` |
-
-*Java extras: `is_constructor`, `is_implicit`, `thrown_exceptions`, `declaration`,
-`crud_operations`, `crud_queries`. Carry constructor-ness for any language (you need it for
-the `new`/`__init__` normalization).*
-
-## Callsite (rich per-call metadata, on the caller)
-| field | type | default |
+| Field | Type | Notes |
 | --- | --- | --- |
-| `method_name` | `str` | — |
-| `receiver_expr` | `Optional[str]` | `None`/`""` |
-| `receiver_type` | `Optional[str]` | `None` |
-| `argument_types` | `List[str]` | `[]` |
-| `return_type` | `Optional[str]` | `None` |
-| `callee_signature` | `Optional[str]` | **`None` when recorded; filled in place when the resolver call graph is built** |
-| `is_constructor_call` | `bool` | `False` |
-| `start_line`/`start_column`/`end_line`/`end_column` | `int` | `-1` |
-
-*Java adds `argument_expr`, `is_static_call`/`is_private`/`is_public`/`is_protected`/
-`is_unspecified`, `crud_operation`, `crud_query`, and a `comment`.*
-
-## CallEdge (identity-only — the model to use)
-| field | type | default |
+| `id` | string | Durable `can://…` path (≥ callable) or `…@line:col` / `…@tag` (< callable). See `canonical-schema.md` § Identity. |
+| `kind` | string | The node-kind (below). Closed vocabulary + additive language kinds. |
+| `span` | `{ start:[line,col], end:[line,col], bytes:[from,to] }` | `line:col` addresses/displays; `bytes` slices `module.source`. Absent on some synthetic nodes (e.g. `entry`/`exit`). |
+| `parent` | string | **Only** when the container ≠ the enclosing node (synthetic actuals → their call site; materialized blocks/exprs). Implicit otherwise. |
+
+## Root and container kinds
+
+### `application`
+| Field | Type | Level | Notes |
+| --- | --- | --- | --- |
+| `id` | string | 1 | `can://<lang>/<app>` — the app segment disambiguates apps in one language. |
+| `symbol_table` | `{ file → module }` | 1 | Named map, keyed by relative file path (no absolute, no `..`). |
+| `call_graph` | `edge[]` | 2 | Cross-function; see § Edges. |
+| `param_in` / `param_out` | `edge[]` | 4 | Cross-function; see § Edges. |
+
+Top-level siblings of `application` carry the manifest: `schema_version`, `language`,
+`max_level` (authoritative level marker), `k_limit` (present at L4), `analyzer{name,version}`.
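For orientation, here is a minimal L2 document written as a Python literal. Every concrete value is illustrative: the app `demo`, the file, the analyzer version, the `@line:col` local-id spelling, and the `jedi` provenance. The module's own `span` is omitted for brevity, and line/column bases are assumed. Only the key layout follows the tables in this file. The byte offsets are real offsets into `SOURCE`:

```python
import json

# Illustrative one-file app "demo"; names and versions are made up.
SOURCE = "def main():\n    helper()\n\ndef helper():\n    pass\n"
APP_ID = "can://python/demo"
MOD_ID = f"{APP_ID}/main.py"

doc = {
    "schema_version": "2.0.0",
    "language": "python",
    "max_level": 2,  # authoritative level marker; no L3/L4 lists below
    "analyzer": {"name": "codeanalyzer-python", "version": "2.0.0"},
    "application": {
        "id": APP_ID,
        "symbol_table": {
            "main.py": {  # relative path key; module span omitted for brevity
                "id": MOD_ID,
                "kind": "module",
                "source": SOURCE,  # whole file, stored once
                "imports": [],
                "types": {},
                "functions": {
                    "main": {
                        "id": f"{MOD_ID}/main",
                        "kind": "function",
                        "signature": "main",
                        # lines 1-based, columns 0-based (assumed)
                        "span": {"start": [1, 0], "end": [2, 12], "bytes": [0, 24]},
                        "body": {
                            "@2:4": {
                                "kind": "call",
                                "span": {"start": [2, 4], "end": [2, 12], "bytes": [16, 24]},
                                "callee": f"{MOD_ID}/helper",  # backfilled at L2
                            }
                        },
                    },
                    "helper": {
                        "id": f"{MOD_ID}/helper",
                        "kind": "function",
                        "signature": "helper",
                        "span": {"start": [4, 0], "end": [5, 8], "bytes": [26, 48]},
                        "body": {},
                    },
                },
            }
        },
        "call_graph": [
            {"src": f"{MOD_ID}/main", "dst": f"{MOD_ID}/helper", "prov": ["jedi"], "weight": 1}
        ],
        # param_in / param_out omitted: absent means "no fact" below L4.
    },
}

print(json.dumps(doc, indent=2))
```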
+
+### `module` (per-file compilation unit)
+| Field | Type | Level | Notes |
+| --- | --- | --- | --- |
+| `id`, `kind`, `span` | — | 1 | `kind:"module"`. |
+| `package` / `namespace` | string | 1 | Language-native grouping the file belongs to. |
+| `source` | string | 1 | **The whole file's text, once.** All node text slices from this. |
+| `imports` | `import[]` | 1 | `{ name, path, alias?, span }`. |
+| `types` | `{ name → type }` | 1 | Named map. |
+| `functions` | `{ sig → callable }` | 1 | Module-level callables (Go/Python/C); empty for class-only languages. |
+| `content_hash` | string | 1 | For incremental caching; not identity. |
+
+### `type` (`class` \| `struct` \| `interface` \| `enum` \| `trait` \| `type_alias` \| …)
+| Field | Type | Level | Notes |
+| --- | --- | --- | --- |
+| `id`, `kind`, `span` | — | 1 | `kind` is the specific type kind, **not** a pile of `is_*` booleans. |
+| `base_types` | `id[]` | 1 | `extends`/embeds — durable ids of supertypes. |
+| `interfaces` | `id[]` | 1 | `implements`/satisfies. |
+| `modifiers` | `string[]` | 1 | `public`/`abstract`/`sealed`/… (language set). |
+| `decorators` | `decorator[]` | 1 | Structured `{ name, args[], span }` — **not** flat strings. |
+| `callables` | `{ sig → callable }` | 1 | Methods/constructors. |
+| `fields` | `{ name → field }` | 1 | `{ id, kind:"field", type, modifiers[], decorators[], span }`. |
+| `nesting` | `{ parent?, is_local? }` | 1 | Nesting as one sub-object (enclosing `parent`, optional `is_local`), replacing the old `is_inner_class`/`is_nested_type` booleans on the node. |
+
+### `callable` (`function` \| `method` \| `constructor` \| `initializer` \| `lambda`)
+| Field | Type | Level | Notes |
+| --- | --- | --- | --- |
+| `id`, `kind`, `span` | — | 1 | `span.bytes` → `get_method_body` = `module.source[bytes]`. |
+| `signature` | string | 1 | Human-readable; the *last* path segment of the `id`, from the one `signatureOf()`. |
+| `parameters` | `param[]` | 1 | `{ name, type, span, is_variadic? }`, ordered. |
+| `return_type` | string | 1 | |
+| `error_channel` | `string[]` | 1 | Generalized: Go `(T, error)`, Rust `Result<T,E>`, Java `throws` — one field, not `thrown_exceptions`. |
+| `modifiers`, `decorators` | — | 1 | As on `type`. |
+| `metrics` | `{ cyclomatic }` | 1 | Extensible metrics map. |
+| `refs` | `{ types:[id], fields:[id] }` | 1 | Cheap cross-refs the symbol pass already knows. |
+| `body` | `{ local-id → node }` | 1 / 3 | The statement/vertex map (below). At L1 it holds only the `call` nodes; the full map arrives at L3. |
+| `cfg` / `cdg` / `ddg` / `summary` | `edge[]` | 3–4 | Intra-callable edges (below). |
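With `span.bytes` in place, `get_method_body` becomes a slice. Below is a minimal sketch. It assumes `span.bytes` holds half-open `[from, to)` UTF-8 byte offsets into `module.source`. Under that assumption, slicing the Python `str` directly breaks as soon as the file contains non-ASCII text, because `str` indexes code points. So the sketch slices the encoded bytes instead. It mirrors the accessor's name but is not the SDK's code:

```python
def get_method_body(module: dict, callable_node: dict) -> str:
    """Return a callable's text by slicing its module's `source`.

    Assumes span.bytes = [from, to) in UTF-8 bytes. Encoding the whole file
    on every call costs O(file size); a real accessor would cache the bytes.
    """
    start, end = callable_node["span"]["bytes"]
    return module["source"].encode("utf-8")[start:end].decode("utf-8")
```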
+
+## Body node kinds (keyed by local id inside `body`; `call` from L1, most others from L3)
+
+| kind | Level | Extra fields | Notes |
+| --- | --- | --- | --- |
+| `call` | **1** | `callee` (id, **`null` until L2/resolver backfill**), `arguments:[local-id]` | Call sites — recorded at L1 so `get_call_sites` is an L1 accessor; `callee` is the one refinement slot. |
+| `entry` / `exit` | 3 | — | Synthetic CFG endpoints; one each per callable; no span. |
+| `statement` | 3 | — | Ordinary statement; text = `module.source[span.bytes]`. |
+| `return` | 3 | — | Edge to `exit`. |
+| `branch` / `loop` / `switch` | 3 | — | Control constructs; sources of `cfg` branch edges + `cdg`. |
+| `formal_in` | 4 | `of` (param name) | Synthetic; child of the callable; one per formal. |
+| `formal_out` | 4 | `of` (`$ret` or a by-ref param) | Synthetic; callable exit. |
+| `actual_in` | 4 | `of` (`argN`), `parent` (call-site local-id) | Synthetic; child of a call node. |
+| `actual_out` | 4 | `of` (`$ret`), `parent` | Synthetic; child of a call node. |
+| `expression` | opt | `expr_kind` | Only with `--materialize-expressions`; carries a `parent`. |
+| `block` | opt | — | Only with `--materialize-basic-blocks`; a container between callable and statement. |
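Because `call` nodes exist from L1, a call-site accessor only has to filter `body` by `kind`. Below is a minimal sketch over plain dicts. The first function mirrors the `get_call_sites` name but is not the SDK's implementation, and `unresolved_call_sites` is hypothetical. `callee is None` is the L1 state; after L2 the slot holds the resolved callable id:

```python
def get_call_sites(callable_node: dict) -> list[tuple[str, dict]]:
    """(local-id, node) pairs for every `call` node in a callable's body."""
    body = callable_node.get("body", {})
    return [(lid, node) for lid, node in body.items() if node.get("kind") == "call"]


def unresolved_call_sites(callable_node: dict) -> list[str]:
    """Local ids of call sites whose `callee` has not been backfilled yet."""
    return [lid for lid, node in get_call_sites(callable_node) if node.get("callee") is None]
```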
+
+## Edges
+
+Every edge is `{ src, dst, …attrs }` where `src`/`dst` are node **ids** (local within a
+callable's edge lists; full `can://` ids in the application-scope lists). The **list name is the
+type** — there is no `type` field. No dangling endpoints.
+
+| List | Scope (lives on) | Level | Endpoint kinds | Attrs |
+| --- | --- | --- | --- | --- |
+| `call_graph` | application | 2 | callable → callable | `prov:string[]`, `weight:int` (accumulated on merge) |
+| `cfg` | callable | 3 | statement → statement | `kind` (`fallthrough`\|`true`\|`false`\|`switch_case`\|`loop_back`\|`exception`\|`return`\|`break`\|`continue`\|lang-adds) |
+| `cdg` | callable | 3 | statement → statement | — (control dependence) |
+| `ddg` | callable | 3→4 | statement → statement | `var` (k-limited access path), `prov` (`["ssa"]`=syntactic/L3, `["points-to"]`=semantic/L4) |
+| `summary` | callable | 4 | actual_in → actual_out | — (transitive intra-caller) |
+| `param_in` | application | 4 | actual_in → formal_in | — |
+| `param_out` | application | 4 | formal_out → actual_out | — |
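"No dangling endpoints" is mechanically checkable. Below is a minimal sketch over the parsed JSON. It assumes, as the table says, that `call_graph` endpoints are callable ids, and that intra-callable lists use the keys of that callable's own `body`. It deliberately skips `param_in`/`param_out`: their endpoints are full ids of body nodes, and this table does not spell out how a local `body` key expands to a full id.

```python
INTRA_LISTS = ("cfg", "cdg", "ddg", "summary")


def iter_callables(application: dict):
    """Module-level functions, then callables declared on types."""
    for module in application["symbol_table"].values():
        yield from module.get("functions", {}).values()
        for t in module.get("types", {}).values():
            yield from t.get("callables", {}).values()


def dangling_edges(root: dict) -> list[tuple[str, str, str]]:
    """Return (list name, src, dst) for every edge whose endpoint does not exist."""
    app = root["application"]
    callables = list(iter_callables(app))
    callable_ids = {c["id"] for c in callables}
    problems = []
    # Application scope: call_graph endpoints must be callable ids.
    for e in app.get("call_graph", []):
        if e["src"] not in callable_ids or e["dst"] not in callable_ids:
            problems.append(("call_graph", e["src"], e["dst"]))
    # Callable scope: endpoints must be keys of that callable's own body.
    for c in callables:
        local_ids = set(c.get("body", {}))
        for name in INTRA_LISTS:
            for e in c.get(name, []):
                if e["src"] not in local_ids or e["dst"] not in local_ids:
                    problems.append((name, e["src"], e["dst"]))
    return problems
```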
+
+## Language expansion — the rubric
+
+Keep the invariant spine (`application → module → type/callable → body`, identity-only edges,
+one `signatureOf()`), then **add** at the leaves. Record every addition in the analyzer's
+`.claude/SCHEMA_DECISIONS.md`.
+
+| Add a new… | How | Example |
 | --- | --- | --- |
-| `source` | `str` | caller `Callable.signature` |
-| `target` | `str` | callee `Callable.signature` |
-| `type` | `Literal["CALL_DEP"]` | `"CALL_DEP"` |
-| `weight` | `int` | `1` (accumulate when merging backends) |
-| `provenance` | `List[str]` | `[]` — e.g. `["tsc"]`, `["jedi","joern"]` |
-| `tags` | `Dict[str, str]` | `{}` — free-form, extension-namespaced |
-
-## Supporting leaf models
-- **Import**: `module`, `name`, `alias?`, line/column span. (Java: `path`, `is_static`,
-  `is_wildcard`.)
-- **Comment**: `content`, line/column span, `is_docstring` (Java: `is_javadoc`).
-- **CallableParameter**: `name`, `type?`, `default_value?`, line/column span. (Java adds
-  `annotations`, `modifiers`.)
-- **Decorator**: `name`, `qualified_name?`, `positional_arguments[]`, `keyword_arguments{}`,
-  span. (The Java equivalent is flat `annotations: List[str]`.)
-- **Symbol**: `name`, `scope`, `kind`, `type?`, `qualified_name?`, `is_builtin`, `lineno`,
-  `col_offset`.
-- **VariableDeclaration**: `name`, `type?`, `initializer?`, `value?`, `scope`, span.
-- **ClassAttribute**: `name`, `type?`, `comments[]`, span.
-- **Entrypoint** (optional): `signature`, `framework`, `detection_source`, route/method
-  fields, `tags{}`.
-
-## Expanding the schema for the target language (encouraged)
-
-Mirroring the shared fields is the floor, not the ceiling. A good language pack **captures
-what is idiomatic and analytically important in the target language as first-class schema** —
-it does not cram the language into the Java/Python mold and discard the rest. You are
-explicitly free to add node kinds and fields. The only thing you may not change is the spine.
-
-**The invariant spine (never drift):** the root keys `symbol_table` (a `Dict[str, Module]`)
-and `call_graph` (identity-only `List[CallEdge]`); the Module → Class/Callable nesting; one
-`signatureOf()` producing every id; and edges whose `source`/`target` byte-match real
-`Callable.signature`s. The shared SDK facade methods depend on exactly this and nothing more.
-
-**Everything else is yours to extend**, because the new language gets its **own**
-`cldk/models/<lang>/` Pydantic models. Add a field to the analyzer output *and* the
-corresponding `<L>` model in the same change, and validation still passes — you own both
-sides. You are not limited to the fields in this reference.
-
-### Decision rubric — where does a new concept go?
-1. **New top-level node kind** (sibling of Class/Callable in `Module`, or a new collection) —
-   when the concept is a *declaration* you'll want to look up by signature or point edges at
-   (TS `interface`/`type`-alias/`enum`; Go `struct`/`interface`; Rust `trait`/`impl`). Give it
-   its own `signature` from `signatureOf()` so edges and `base_classes` can reference it.
-2. **New typed field on an existing node** — when the concept is an *attribute* of a callable/
-   class/callsite that consumers will query directly and want validated (Go method
-   `receiver_type`; Rust `is_async`/`is_unsafe`; TS `type_parameters` for generics; visibility/
-   mutability). Add it to both the output and the `<L>` model with a sensible default.
-3. **Open-vocabulary `tags` / `provenance`** — when the metadata is low-stakes, sparse, or
-   framework/extension-specific and not worth a typed field (Go struct tags, build constraints;
-   TS JSX flags; experimental attributes). These are `Dict[str,str]`/`List[str]`, so they
-   round-trip without schema churn and without every consumer needing to know about them.
-
-Prefer a typed field (1 or 2) when a consumer will branch on the value; prefer `tags` (3) when
-it's descriptive metadata. When unsure, start with `tags` and promote to a field later.
-
-### Worked expansions
-- **TypeScript**: `interface`, `type`-alias, and `enum` as Class-siblings; `type_parameters`
-  for generics; union/intersection types captured in `type` strings; `extends`/`implements`
-  chains → `base_classes`; TS decorators → `decorators`; ambient/`declare` and JSX flags →
-  `tags`.
-- **Go**: `struct` and `interface` node kinds; method `receiver_type` on the callable;
-  embedded structs and satisfied interfaces → `base_classes`; goroutine launches and channel
-  ops are good `Callsite`/`tags` candidates; struct tags and build constraints → `tags`.
-- **Rust**: `trait`, `impl` block, and `enum` (with variants) node kinds; `is_async`/
-  `is_unsafe`/`is_const` and lifetime/generic params as fields; trait bounds → `base_classes`;
-  macro invocations as `Callsite`s tagged with provenance `"macro"`.
-
-Whatever you add, keep snake_case keys and make new fields optional-with-default so a partially
-populated `analysis.json` (e.g. symbol-table-only, or a degraded resolve) still validates.
-
-## The validation contract (success criterion)
-The generated analyzer's output is correct iff the SDK model loads it without error:
-
-```python
-import json
-from cldk.models.<lang> import <Lang>Application   # the models you add (subprocess backend)
-app = <Lang>Application(**json.load(open("analysis.json")))   # must not raise
-assert app.symbol_table                                       # non-empty
-sigs = { ... all Callable.signature in app.symbol_table ... }
-assert all(e.source in sigs and e.target in sigs for e in app.call_graph)  # no dangling edges
-```
-
-Because the SDK `<Lang>Application` model is itself a faithful mirror of this reference, "passes
-Pydantic validation + no dangling edges" is the comprehensive, mechanical check that the schema
-was mirrored fully and correctly. Build the SDK models first (from this reference), then make
-the analyzer's output validate against them.
+| **type kind** | new `kind` value on a `type` node | Go `struct`; TS `interface`, `enum`, `type_alias`; Rust `trait` |
+| **callable kind** | new `kind` value on a `callable` | closures, getters/setters, `init` blocks |
+| **body node kind** | new `kind` value in `body` | Go `select`, Python comprehension scope |
+| **CFG edge kind** | new `kind` value on a `cfg` edge | Go `defer_resume`, JS `await_resume`, Python `yield_resume` |
+| **typed field** | new field on a node | receiver type, `is_async`/`is_unsafe`, struct tags |
+| **open-vocab attr** | a string in `tags{}` | anything not worth a first-class field |
+
+Never: rename a shared field, repurpose a shared `kind`, add a rich-edge variant, or introduce a
+node that isn't reachable from `application` by containment. Those break the single-model
+premise the whole SDK depends on. The **parity clause** in `canonical-schema.md` is the rule;
+this table is how you satisfy it.
diff --git a/skills/codeanalyzer-backend/references/symbol-table-construction.md b/skills/codeanalyzer-backend/references/symbol-table-construction.md
index aeadfa2..bf5087b 100644
--- a/skills/codeanalyzer-backend/references/symbol-table-construction.md
+++ b/skills/codeanalyzer-backend/references/symbol-table-construction.md
@@ -1,10 +1,17 @@
-# Symbol Table Construction (file by file)
+# L1 — build the tree (symbol table, file by file)
 
-Once the schema is designed (`schema-design-loop.md`), this stage **populates** it: walk the
-project file by file and build `symbol_table: Dict[file_path, Module]`. Like the schema, you
-build this by **studying how the mature reference analyzers do it** and replicating the pattern
-for the new language — they have already solved file discovery, per-file building, caching, and
-the whole-project / target-files / single-source modes.
+The first level of the additive schema (`canonical-schema.md`): grow the **containment tree to
+callable depth** — `application → symbol_table{module} → types{}/functions{} → callables{}` —
+file by file. This is the floor everything else hangs off. You build it by **studying how the
+mature reference analyzers do it** and replicating the pattern for the new language — they have
+already solved file discovery, per-file building, caching, and the whole-project / target-files /
+single-source modes.
+
+**v2 shape (vs the old flat symbol table):** every node carries an `id` (the `can://` path), a
+`kind`, and a `span` **with byte offsets**; the **module stores the whole file's `source` once**
+(all node text slices off it — no per-callable `code`); and call sites are recorded as `call`
+nodes in each callable's `body` with `callee: null` (so `get_call_sites` works at L1; L2 backfills
+`callee`). The tree is otherwise the named-map hierarchy of `canonical-schema.md`.
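To make the L1 shape concrete, here is a minimal per-file builder that uses Python's own `ast` module as the structural tool for a Python target. It is a sketch, not the reference analyzer's code. It covers only top-level `def`s: no classes, no imports, no call arguments. Several details are assumptions:

- the id layout `can://<lang>/<app>/<file>/<name>`;
- local ids spelled `@line:col`;
- span units of 1-based lines and 0-based UTF-8 byte columns, which is what `ast` reports.

Byte offsets come from a table of line-start offsets, because `ast` gives only line/column pairs.

```python
import ast


def line_starts(data: bytes) -> list[int]:
    """Byte offset at which each line begins (index 0 is line 1)."""
    starts = [0]
    for i, byte in enumerate(data):
        if byte == 0x0A:  # b"\n"
            starts.append(i + 1)
    return starts


def build_module(rel_path: str, source: str, lang: str = "python", app: str = "demo") -> dict:
    """Build a v2-style L1 `module` node for one file (top-level functions only)."""
    starts = line_starts(source.encode("utf-8"))

    def span(node: ast.AST) -> dict:
        # ast reports 1-based lines and 0-based UTF-8 byte columns.
        return {
            "start": [node.lineno, node.col_offset],
            "end": [node.end_lineno, node.end_col_offset],
            "bytes": [starts[node.lineno - 1] + node.col_offset,
                      starts[node.end_lineno - 1] + node.end_col_offset],
        }

    module_id = f"can://{lang}/{app}/{rel_path}"
    functions = {}
    for fn in ast.parse(source).body:
        if not isinstance(fn, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        # Call sites become `call` nodes with callee=None; L2 backfills callee.
        # ast.walk also visits calls inside nested defs; a real builder would
        # attribute those to the inner callable.
        body = {
            f"@{c.lineno}:{c.col_offset}": {"kind": "call", "span": span(c), "callee": None}
            for c in ast.walk(fn) if isinstance(c, ast.Call)
        }
        functions[fn.name] = {
            "id": f"{module_id}/{fn.name}",
            "kind": "function",
            "signature": fn.name,
            "span": span(fn),
            "body": body,
        }
    return {"id": module_id, "kind": "module", "source": source, "types": {}, "functions": functions}
```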
 
 ## Anchor: how Java and Python construct the symbol table
 
@@ -39,12 +46,13 @@ map keyed by path**, with three entry modes (all / target-files / single-source)
    `symbol_table` key and must be identical across runs (so caching and the SDK's file lookups
    work).
 3. **Per file, cache-check then build.** If a prior `analysis_cache.json` has this file and its
-   `content_hash`/`last_modified`/`file_size` are unchanged, reuse the cached `Module`.
+   `content_hash`/`last_modified`/`file_size` are unchanged, reuse the cached `module`.
    Otherwise call your **per-file builder** (the analog of `build_pymodule_from_file` /
-   `processCompilationUnit`): parse with the structural tool, walk the tree, and fill the
-   `Module` with classes / functions / language-native kinds / callables — and on each callable
-   the **unresolved call sites** (callee name + receiver expr + arg exprs + position,
-   `callee_signature` left null). Stamp the cache metadata on the `Module`.
+   `processCompilationUnit`): parse with the structural tool, retain the file's text as the
+   module's **`source`**, walk the tree, and fill the `module` with `types` / `functions` /
+   language-native kinds / `callables` — each node with its `id`, `kind`, and `span` (with byte
+   offsets). On each callable, record the **call sites as `call` nodes in `body`** (callee name +
+   receiver expr + arg exprs + span, `callee: null`). Stamp the cache metadata on the `module`.
 4. **Assemble** `symbol_table[file_key] = module` for every file.
 5. **Support the three CLI modes** (`cli-contract.md`): whole-project (extractAll-style),
    `-t/--target-files` incremental (extract-style), and optionally single-source.
@@ -65,18 +73,20 @@ split per node kind into sibling modules under `syntactic_analysis/`. **Do not**
 threading state through arguments, with `buildClass`/`buildInterface`/`buildEnum` scattered across
 the file. See `analyzer-architecture.md` rule 2.
 
-Keep this stage to the symbol table — record call sites but **don't resolve them into edges**
-yet. That resolution is the *cheap next step* (still level 1; `backend-recipe.md` step 6), where
-the same resolver maps each site to its callee. Type fields may be populated here if your
-resolver is a same-tool checker; only the edge resolution is deferred to the next stage.
+Keep this stage to L1 — record the `call` nodes but **don't resolve them into edges** yet. That
+resolution is L2 (`backend-recipe.md`), where the same resolver maps each `call` node to its
+callee (backfilling `callee`) and emits the `call_graph`. Type fields may be populated here if
+your resolver is a same-tool checker; only the edge resolution is deferred.
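The L2 step described here can be sketched as one pass over the same tree. Everything about the resolver is an assumption. `resolve` is a hypothetical stand-in for whatever your language's resolver exposes: it takes the caller's id and a `call` node, and returns the callee's `can://` id or `None`. Resolved sites get `callee` backfilled in place. Each distinct (caller, callee) pair becomes one `call_graph` edge with `weight` 1, which accumulates later if several backends are merged. Two more choices belong to this sketch, not the spec:

- it only links callees that exist in the tree, so no edge can dangle;
- every other target keeps `callee: null`.

```python
from typing import Callable, Iterator, Optional


def iter_callables(application: dict) -> Iterator[dict]:
    """Module-level functions, then callables declared on types."""
    for module in application["symbol_table"].values():
        yield from module.get("functions", {}).values()
        for t in module.get("types", {}).values():
            yield from t.get("callables", {}).values()


def build_call_graph(application: dict,
                     resolve: Callable[[str, dict], Optional[str]],
                     prov: str) -> None:
    """Backfill `callee` on call nodes and set application["call_graph"] in place."""
    known = {c["id"] for c in iter_callables(application)}
    edges: dict = {}
    for caller in iter_callables(application):
        for node in caller.get("body", {}).values():
            if node.get("kind") != "call":
                continue
            callee = resolve(caller["id"], node)
            # Unresolved or out-of-tree targets keep callee=None.
            if callee is None or callee not in known:
                continue
            node["callee"] = callee
            edges.setdefault((caller["id"], callee),
                             {"src": caller["id"], "dst": callee, "prov": [prov], "weight": 1})
    application["call_graph"] = list(edges.values())
```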
 
-## Verify (the level-1 gate)
+## Verify (the L1 gate)
 Run the analyzer on a tiny fixture project and confirm:
-- the output **validates** against the SDK `<L>Application` Pydantic model
-  (`<L>Application(**json.load(...))` does not raise);
-- `symbol_table` is non-empty and keyed by stable relative paths;
-- spot-check one known file: its `Module` has the expected classes/functions, and callables
-  carry unresolved call sites with `callee_signature == null`;
+- the output **validates** against the SDK `Application` model (`Application(**json.load(...))`
+  does not raise);
+- `symbol_table` is non-empty and keyed by stable relative paths (no absolute, no `..`);
+- spot-check one known file: its `module` has the expected `types`/`functions`, a `source` blob,
+  and callables carrying `call` nodes with `callee: null`; a callable's text = `module.source`
+  sliced by `span.bytes` (`get_method_body`);
 - re-running reuses cache for unchanged files (no rebuild).
 
-Only when this passes do you move to call-graph construction.
+Only when this passes do you move to L2 (call-graph construction). Full criteria:
+`testing-and-validation.md`.
diff --git a/skills/codeanalyzer-backend/references/testing-and-validation.md b/skills/codeanalyzer-backend/references/testing-and-validation.md
index 83784ab..a0fa6c4 100644
--- a/skills/codeanalyzer-backend/references/testing-and-validation.md
+++ b/skills/codeanalyzer-backend/references/testing-and-validation.md
@@ -53,13 +53,13 @@ analyzer repo — see the frontend skill's `sdk-testing.md`.
 
 Run the analyzer on the fixture and confirm all of the following:
 
-1. **Output validates** against the SDK `<Lang>Application` Pydantic model —
-   `<Lang>Application(**json.load(open("analysis.json")))` must not raise.
+1. **Output validates** against the SDK `Application` Pydantic model —
+   `Application(**json.load(open("analysis.json")))` must not raise.
 2. **`symbol_table` is non-empty** and keyed by **stable relative paths** — no key starts
    with `/` (absolute) or `..` (CWD-relative). Both are common bugs; assert them
    explicitly.
 3. A known file's `Module` contains the expected types, functions, and call sites with
-   `callee_signature == null`. (Call sites are recorded but not resolved at this stage.)
+   `callee == null`. (Call sites are recorded but not resolved at this stage.)
 4. **Re-running reuses the cache** — mtime of `analysis.json` (or `analysis_cache.json`)
    is unchanged on a second non-eager run.
 
@@ -70,11 +70,11 @@ Do not proceed to Call Graph Construction until this passes.
 1. Every edge endpoint matches a real signature in the symbol table — no dangling nodes.
    Check: `for e in app.call_graph: assert e.source in all_sigs and e.target in all_sigs`.
 2. Every edge has a non-empty `provenance` list naming the resolver.
-3. `callee_signature` is backfilled on successfully resolved call sites (non-null, non-empty
+3. `callee` is backfilled on successfully resolved call sites (non-null, non-empty
    string).
 4. A named expected edge is present — assert the exact `(source, target)` pair.
 5. At least one cross-package/cross-module edge is present.
-6. Output still validates against `<Lang>Application`.
+6. Output still validates against `Application`.
 
 ### Caching tests (add after implementing caching/incremental — `backend-recipe.md` step 8)
 
@@ -85,7 +85,7 @@ Four behaviors to assert on the binary:
 | Test | What to assert |
 |------|----------------|
 | `CacheFileWritten` | After `Analyze()` with `CacheDir` set, `analysis_cache.json` exists in that dir. |
-| `CacheContentsRoundTrip` | `analysis_cache.json` deserializes to a valid `<Lang>Application` with the same symbol table key count as the in-memory result. |
+| `CacheContentsRoundTrip` | `analysis_cache.json` deserializes to a valid `Application` with the same symbol table key count as the in-memory result. |
 | `SecondRunReuses` | Second run with same non-eager opts returns the same symbol table key count; `analysis.json` (or cache file) mtime is unchanged. |
 | `EagerForcesRebuild` | After seeding the cache, a run with `Eager=true` rewrites `analysis_cache.json` (mtime advances). Use `time.Sleep` / `time.sleep` before the eager run to ensure the filesystem timestamp differs. |
 
@@ -95,6 +95,24 @@ Four behaviors to assert on the binary:
 clear message, never silently fall back to JSON. Assert the non-zero exit and the message.
 See `cli-contract.md § Flag validation requirements`.
 
+### Monotonicity gate (the additive-paradigm invariant)
+
+The schema is additive (`canonical-schema.md` § Monotonicity), so the level outputs must nest:
+run the analyzer at `-a 1`, `-a 2`, `-a 3`, `-a 4` on the fixture and assert
+**`json(-a 1) ⊆ json(-a 2) ⊆ json(-a 3) ⊆ json(-a 4)`** — every node and edge present at a lower
+level is present, unchanged, at every higher level. The **only** sanctioned differences are
+additions (new `body` nodes, new edge-list entries) and the single `callee: null → id`
+refinement. A diff that *changes* an existing fact (a rewritten span, a re-anchored `call_graph`
+edge, a removed syntactic `ddg` edge) fails the gate: some level rewrote a fact instead of adding one.
+Also assert that the two projections (JSON and Neo4j) agree: the Neo4j node/edge counts at full
+depth match the JSON at `max_level`, modulo the containment `HAS_*` edges that Neo4j makes explicit.
+
+### Two-tier identity gate
+
+`can://` ids (callable granularity and coarser) are identical across two runs on unchanged source;
+every `…@line:col` id carries a column (assert that no id ends in a bare line number); and every
+edge endpoint resolves to a real node (no dangling endpoints) at every level and in both projections.
+
 ---
 
 ## 3. Definition of done (analyzer surface)
@@ -103,7 +121,7 @@ Both this surface and the SDK surface (frontend skill) must pass before the lang
 considered complete.
 
 - [ ] `go test ./...` (or equivalent) passes — all symbol table, call graph, and caching tests.
-- [ ] Output on the fixture validates against `<Lang>Application` without error.
+- [ ] Output on the fixture validates against `Application` without error.
 - [ ] `symbol_table` keys are relative paths; no key is absolute or `..`-prefixed.
 - [ ] Every language-specific field has at least one test asserting a concrete value.
 - [ ] Named expected call-graph edge is asserted (not just "non-empty").