Problem
codeanalyzer-java today emits the level-1 symbol table (-a 1) and the WALA RTA call graph (-a 2). Despite its name, SystemDependencyGraph.java builds only the call graph: commit dcdeb2c (Mar 2025) removed the WALA slicer SDG construction and renamed the output key from system_dependency_graph to call_graph. The Python SDK still models JApplication.system_dependency_graph but it is always null, and JavaAnalysis.get_system_dependency_graph() warns "System dependency graph is not yet implemented. Returning the call graph instead."
This issue restores and completes the SDG: update WALA and expose the full system dependency graph — control + data dependence — as a first-class analysis level.
Goals
- Update WALA from 1.6.7 to 1.6.10 (latest release on Maven Central).
- Add analysis level 3 (-a 3): everything -a 2 emits, plus:
  - system_dependency_graph: method-level dependence edges in the existing JGraphEdges shape (source/target callable, type = the WALA dependence label, source_kind/destination_kind = statement kinds, weight), so the SDK model that already exists validates without changes;
  - program_graphs: statement-level graphs per the CLDK level-3 dataflow contract, with per-callable cfg (nodes + edges) and pdg (CDG/DDG edges) keyed by (signature, node_id), plus cross-function sdg_edges (CALL, PARAM_IN, PARAM_OUT), carrying a schema_version.
- --graphs cfg,pdg,sdg selector (default: all at -a 3), with strict flag validation (unknown values = non-zero exit, no silent fallback).
- --sdg-data-deps <no-heap|full> knob: default no-heap (DataDependenceOptions.NO_HEAP_NO_EXCEPTIONS + ControlDependenceOptions.NO_EXCEPTIONAL_EDGES, the fast pre-removal settings); full opts into heap-carried data dependence (DataDependenceOptions.FULL + ControlDependenceOptions.FULL).
- -a 1 / -a 2 output and timings unchanged. Nothing at level 3 runs unless requested.
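The strict validation of the two new flags could be sketched as below. This is only an illustration under stated assumptions: the class Level3Flags, its enums, and its method names are hypothetical and not taken from codeanalyzer-java, and the WALA option names are carried as plain strings (as quoted above) so the sketch runs without WALA on the classpath.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper: class, enum, and method names are illustrative only.
final class Level3Flags {

    /** Graph families selectable with --graphs. */
    enum GraphKind { CFG, PDG, SDG }

    /** Accepted --sdg-data-deps values, paired with the WALA option names quoted in the issue. */
    enum DataDeps {
        NO_HEAP("no-heap", "NO_HEAP_NO_EXCEPTIONS", "NO_EXCEPTIONAL_EDGES"),
        FULL("full", "FULL", "FULL");

        final String flagValue;
        final String dataDependenceOption;
        final String controlDependenceOption;

        DataDeps(String flagValue, String dataOpt, String controlOpt) {
            this.flagValue = flagValue;
            this.dataDependenceOption = dataOpt;
            this.controlDependenceOption = controlOpt;
        }
    }

    /** Parses a comma-separated --graphs value; null (flag absent) selects every graph. */
    static Set<GraphKind> parseGraphs(String raw) {
        if (raw == null) {
            return EnumSet.allOf(GraphKind.class);
        }
        Set<GraphKind> selected = EnumSet.noneOf(GraphKind.class);
        // Limit -1 keeps trailing empty tokens, so "cfg," is rejected instead of accepted.
        for (String token : raw.split(",", -1)) {
            String value = token.trim();
            switch (value) {
                case "cfg": selected.add(GraphKind.CFG); break;
                case "pdg": selected.add(GraphKind.PDG); break;
                case "sdg": selected.add(GraphKind.SDG); break;
                default:
                    // No silent fallback: any unknown token is an error.
                    throw new IllegalArgumentException(
                        "Unknown --graphs value '" + value + "' (expected cfg, pdg, sdg)");
            }
        }
        return selected;
    }

    /** Parses --sdg-data-deps; null (flag absent) selects the no-heap default. */
    static DataDeps parseDataDeps(String raw) {
        if (raw == null) {
            return DataDeps.NO_HEAP;
        }
        for (DataDeps candidate : DataDeps.values()) {
            if (candidate.flagValue.equals(raw)) {
                return candidate;
            }
        }
        throw new IllegalArgumentException(
            "Unknown --sdg-data-deps value '" + raw + "' (expected no-heap or full)");
    }

    public static void main(String[] args) {
        System.out.println(parseGraphs("cfg,pdg"));
        System.out.println(parseDataDeps(null).dataDependenceOption);
        try {
            parseGraphs("cfg,ast");
        } catch (IllegalArgumentException e) {
            // A real CLI would print this message and exit non-zero.
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing on the first unknown token keeps the "no silent fallback" rule in one place; the CLI entry point would translate the exception into the non-zero exit.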
Substrate decisions (locked)
- Engine: WALA is the native, in-process substrate: IR/SSACFG for the CFG, com.ibm.wala.ipa.slicer.SDG for dependence edges. No external engines.
- Call-graph builder feeding the SDG: RTA (Util.makeRTABuilder), unchanged. It is fast and proven on the fixtures, and adequate for no-heap data dependence. Precision upgrades (0-1-CFA when --sdg-data-deps=full) are a possible follow-up.
- Node identity: (signature, node_id), with node_id derived from SSA instruction order (ENTRY = 0, EXIT = last) and source lines mapped via the ECJ/CAst source positions where available. This is the Java adaptation of the contract's "AST node in source-span order", recorded in .claude/SCHEMA_DECISIONS.md.
- Precision posture: sound-leaning, over-approximate; reflection remains ReflectionOptions.NONE (documented unsoundness, unchanged from level 2).
- Scope pruning: application classes only (GraphSlicer.prune on the application class loader), matching the call graph.
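One way to read the node-identity decision is sketched below. It is an assumption-laden illustration, not the implementation: the NodeIds class and its Node type are hypothetical, instructions are represented as plain strings instead of WALA SSA instructions, null entries stand for empty slots in the instruction array and are skipped, and the sketch assumes instructions are numbered 1..n so that EXIT receives n + 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of (signature, node_id) numbering; not the project's actual code.
final class NodeIds {

    /** One statement-level node, identified by (signature, nodeId). */
    static final class Node {
        final String signature;
        final int nodeId;
        final String kind;
        final int line; // -1 is the explicit "no source span" sentinel

        Node(String signature, int nodeId, String kind, int line) {
            this.signature = signature;
            this.nodeId = nodeId;
            this.kind = kind;
            this.line = line;
        }
    }

    /**
     * Numbers nodes in instruction order: ENTRY is 0, each non-null instruction
     * takes the next id, and EXIT takes the last id.
     */
    static List<Node> number(String signature, String[] instructions, int[] lines) {
        if (instructions.length != lines.length) {
            throw new IllegalArgumentException("instructions and lines must have equal length");
        }
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node(signature, 0, "ENTRY", -1));
        int next = 1;
        for (int i = 0; i < instructions.length; i++) {
            if (instructions[i] == null) {
                continue; // empty slot: consumes no node id
            }
            nodes.add(new Node(signature, next++, instructions[i], lines[i]));
        }
        nodes.add(new Node(signature, next, "EXIT", -1));
        return nodes;
    }

    public static void main(String[] args) {
        String[] instructions = {"v1 = 1", null, "return v1"};
        int[] lines = {3, -1, 4};
        for (Node n : number("com.example.A.m()", instructions, lines)) {
            System.out.println(n.signature + "#" + n.nodeId + " " + n.kind + " line=" + n.line);
        }
    }
}
```

Because ids depend only on instruction order within one callable, they stay stable across runs on the same input, which is what the keyed pdg and sdg_edges entries rely on.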
Out of scope (follow-ups, not this issue)
- SUMMARY edges (HRB transitive summaries) and context-sensitive slicing/taint clients.
- CPG projection of the level-3 graphs into the Neo4j emitter (CFGNode, CFG_NEXT, CDG, DDG, … labels + schema.neo4j.json bump).
- Deterministic --jobs parallel fan-out.
- SDK-side work, tracked separately on python-sdk: default the Java backend to -a 3 (dial down on request), shared ProgramGraphs Pydantic models, and adapting SCIP indexing to the new schema.
Verification / definition of done
- WALA 1.6.10 builds clean; existing test suite green.
- On the call-graph-test fixture: at least one known CONTROL_DEP-labeled and one data-dependence edge asserted concretely (exact endpoints), not "non-empty".
- No dangling endpoints: every system_dependency_graph endpoint resolves to a symbol-table callable; every sdg_edges endpoint (signature, node_id) exists in that function's emitted graph.
- CFG gate per callable: single ENTRY/EXIT; every node carries a source span or an explicit -1 sentinel.
- -a 2 output on the fixture is byte-identical before/after (call graph untouched).
- Unknown --graphs/--sdg-data-deps values exit non-zero with a clear message.
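The "no dangling endpoints" check for sdg_edges might look like the sketch below. Everything here is hypothetical scaffolding for illustration: the DanglingEndpointCheck class, its Edge type, and the in-memory map of emitted node ids are assumptions, and a real test would build them from the emitted JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical verification helper; names and data layout are illustrative only.
final class DanglingEndpointCheck {

    /** A cross-function edge between two (signature, nodeId) endpoints. */
    static final class Edge {
        final String type;
        final String srcSignature;
        final int srcNodeId;
        final String dstSignature;
        final int dstNodeId;

        Edge(String type, String srcSignature, int srcNodeId, String dstSignature, int dstNodeId) {
            this.type = type;
            this.srcSignature = srcSignature;
            this.srcNodeId = srcNodeId;
            this.dstSignature = dstSignature;
            this.dstNodeId = dstNodeId;
        }
    }

    /** True when the callable was emitted and contains the given node id. */
    static boolean exists(Map<String, Set<Integer>> emitted, String signature, int nodeId) {
        Set<Integer> ids = emitted.get(signature);
        return ids != null && ids.contains(nodeId);
    }

    /** Returns one message per endpoint that does not resolve; an empty list means the gate passes. */
    static List<String> findDangling(Map<String, Set<Integer>> emitted, List<Edge> edges) {
        List<String> problems = new ArrayList<>();
        for (Edge e : edges) {
            if (!exists(emitted, e.srcSignature, e.srcNodeId)) {
                problems.add(e.type + " source (" + e.srcSignature + ", " + e.srcNodeId + ")");
            }
            if (!exists(emitted, e.dstSignature, e.dstNodeId)) {
                problems.add(e.type + " target (" + e.dstSignature + ", " + e.dstNodeId + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> emitted = Map.of(
            "A.caller()", Set.of(0, 1, 2),
            "A.callee(int)", Set.of(0, 1));
        List<Edge> edges = List.of(
            new Edge("CALL", "A.caller()", 1, "A.callee(int)", 0),
            new Edge("PARAM_OUT", "A.callee(int)", 7, "A.caller()", 1));
        // The second edge points at node 7, which A.callee(int) never emitted.
        findDangling(emitted, edges).forEach(System.out::println);
    }
}
```

Reporting each failing endpoint, instead of stopping at the first, gives the test output the exact (signature, node_id) pairs to investigate.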