Full system dependency graph at analysis level 3 (WALA 1.6.10) #172
Open
rahlk wants to merge 4 commits into
Restores the WALA slicer SDG removed in dcdeb2c and completes it as a new analysis level. At -a 3, analysis.json gains:
- system_dependency_graph: method-level CONTROL_DEP/DATA_DEP edges in the JGraphEdges shape the Python SDK already models (source/target callable, statement kinds, weight).
- program_graphs: statement-level graphs keyed by (signature, node_id) with ENTRY = 0, SSA instructions in iindex order, EXIT = last: a CFG per callable (source lines; fallthrough/true/false/switch_case/loop_back/exception/return edge kinds) and a PDG (CDG/DDG edges), plus cross-function sdg_edges (CALL/PARAM_IN/PARAM_OUT).

Output is deterministically sorted.

New flags, strictly validated (unknown values exit non-zero, no fallback): --graphs cfg,pdg,sdg scopes the program_graphs sections; --sdg-data-deps no-heap|full picks the slicer depth (default NO_HEAP_NO_EXCEPTIONS + NO_EXCEPTIONAL_EDGES; full opts into heap-carried dependence). The RTA builder and levels 1/2 output are unchanged.

The Neo4j projection follows: --emit neo4j now defaults to -a 3 (an explicit -a dials down) and method-level SDG edges project as J_CONTROL_DEP / J_DATA_DEP / J_HEAP_DATA_DEP relationships between :JCallable nodes with the same resolved-gating as J_CALLS. The dependence kind rides in the relationship type because the writers MERGE one relationship per (type, source, target). Neo4j schema contract bumps additively to 1.1.0.

Issue #171
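Why the dependence kind has to ride in the relationship type can be illustrated with a small Python simulation of MERGE keying. This is a sketch, not the projector's code: it assumes the SDG kind strings are `CONTROL_DEP`, `DATA_DEP` and `HEAP_DATA_DEP`, and `merge_edges` and the single `J_DEP` type are hypothetical names used only for comparison.

```python
from typing import Dict, Iterable, Tuple

# (dependence kind, source callable signature, target callable signature)
DepEdge = Tuple[str, str, str]

# Kind -> relationship type. The kind strings are assumed; the relationship
# names are the ones listed in this PR.
REL_TYPE = {
    "CONTROL_DEP": "J_CONTROL_DEP",
    "DATA_DEP": "J_DATA_DEP",
    "HEAP_DATA_DEP": "J_HEAP_DATA_DEP",
}


def merge_edges(
    edges: Iterable[DepEdge], kind_in_type: bool
) -> Dict[Tuple[str, str, str], Dict[str, str]]:
    """Simulate MERGE semantics: at most one relationship per (type, source, target)."""
    merged: Dict[Tuple[str, str, str], Dict[str, str]] = {}
    for kind, src, dst in edges:
        if kind not in REL_TYPE:
            continue  # unknown kinds are skipped rather than guessed
        # "J_DEP" is a hypothetical single type used only for comparison.
        rel_type = REL_TYPE[kind] if kind_in_type else "J_DEP"
        # A second edge with the same key lands on the same relationship and
        # overwrites its properties, as MERGE followed by SET would.
        merged[(rel_type, src, dst)] = {"kind": kind}
    return merged


edges = [
    ("CONTROL_DEP", "Foo.caller()", "Foo.callee()"),
    ("DATA_DEP", "Foo.caller()", "Foo.callee()"),
]
print(len(merge_edges(edges, kind_in_type=False)))  # 1: the control dependence is lost
print(len(merge_edges(edges, kind_in_type=True)))   # 2: both dependences survive
```

With one shared type, the second edge between the same pair overwrites the first; putting the kind into the type gives each dependence its own MERGE key.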
README gains the full-SDG section (-a 3, --graphs, --sdg-data-deps, known unsoundness) and the Neo4j default; CLAUDE.md (with an AGENTS.md symlink) is the contributor/agent guide; .claude/SCHEMA_DECISIONS.md records the level-3 and Neo4j-projection design decisions. The repo .gitignore un-ignores these three files, overriding a global gitignore rule.

Issue #171
…aph provider

Taint and slicing are frontend-side reachability queries over the emitted universal graph (program_graphs + system_dependency_graph); the analyzer never runs client analyses. Graph substrate additions (per-argument PARAM nodes, SUMMARY edges) remain analyzer-side.

Issue #171
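As a rough illustration of what a frontend-side query over the emitted graph could look like, here is a Python sketch of forward slicing and taint as plain graph reachability. It assumes the edges have already been extracted from `program_graphs` (CDG/DDG) and `sdg_edges` into `(source, target)` pairs of `(signature, node_id)` keys; the extraction step and the names `forward_reach` and `tainted_sinks` are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# A statement node, keyed like program_graphs: (callable signature, node_id).
Node = Tuple[str, int]


def forward_reach(edges: Iterable[Tuple[Node, Node]], seeds: Iterable[Node]) -> Set[Node]:
    """Breadth-first forward reachability: every node some seed can reach."""
    succ: Dict[Node, List[Node]] = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    seen: Set[Node] = set(seeds)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def tainted_sinks(
    edges: List[Tuple[Node, Node]], sources: Iterable[Node], sinks: Iterable[Node]
) -> Set[Node]:
    """Taint as reachability: the sinks reachable from any source."""
    return forward_reach(edges, sources) & set(sinks)


edges = [
    (("A.m()", 1), ("A.m()", 2)),  # DDG edge inside A.m()
    (("A.m()", 2), ("B.n()", 0)),  # CALL/PARAM_IN into B.n()'s ENTRY
    (("B.n()", 0), ("B.n()", 3)),  # CDG edge inside B.n()
]
print(sorted(forward_reach(edges, [("A.m()", 1)])))
print(tainted_sinks(edges, [("A.m()", 1)], [("B.n()", 3), ("C.k()", 0)]))
```

Plain reachability does not match calls to returns, so a path may enter a callee from one call site and leave toward a different caller. A context-sensitive slice needs summary-style edges, which the commit above keeps on the analyzer side.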
Closes #171.
What
- `-a 2` output on the `call-graph-test` fixture is identical before/after (call graph and symbol table compared JSON-normalized against `main`).
- Level 3 (`-a 3`): restores and completes the WALA slicer SDG that was removed in `dcdeb2c`, exposing the full system dependency graph (control + data dependence) as two new `analysis.json` sections:
  - `system_dependency_graph`: method-level dependence edges (`CONTROL_DEP`/`DATA_DEP` + statement kinds + weight). Validates against the Python SDK's existing `JApplication.system_dependency_graph: List[JGraphEdges]` model with zero SDK changes (verified with `JApplication.model_validate` on the fixture output).
  - `program_graphs`: statement-level graphs per the CLDK level-3 dataflow contract, keyed by `(signature, node_id)` (`ENTRY` = 0, SSA instructions in iindex order, `EXIT` = last): per-callable CFG (nodes with source lines; `fallthrough`/`true`/`false`/`switch_case`/`loop_back`/`exception`/`return` edges) and PDG (`CDG`/`DDG` edges), plus cross-function `sdg_edges` (`CALL`/`PARAM_IN`/`PARAM_OUT`). Output is deterministically sorted.
- `--graphs cfg,pdg,sdg`: scopes the emitted `program_graphs` sections (default all; requires `-a 3`).
- `--sdg-data-deps no-heap|full`: slicer data-dependence depth. Default `no-heap` (`NO_HEAP_NO_EXCEPTIONS` + `NO_EXCEPTIONAL_EDGES`, the fast pre-removal settings); `full` opts into heap-carried dependence.
- `--emit neo4j` defaults to the full SDG analysis (`-a 3`; an explicit `-a` dials down), and method-level SDG edges project as `J_CONTROL_DEP`/`J_DATA_DEP`/`J_HEAP_DATA_DEP` relationships between `:JCallable` nodes (props `weight`/`source_kind`/`destination_kind`, same resolved-gating as `J_CALLS`). The dependence kind rides in the relationship type because the writers `MERGE` one relationship per `(type, source, target)`: a pair with both a control and a data dependence must keep both edges. WALA's `Dependency` enum is closed (exactly those three kinds), so the vocabulary is total. `schema.neo4j.json` bumps additively to 1.1.0.
- `.claude/SCHEMA_DECISIONS.md` records the design decisions; agent guide (`CLAUDE.md` + `AGENTS.md` symlink) and README documentation added.

What it looks like
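The `program_graphs` node numbering (ENTRY = 0, SSA instructions in iindex order, EXIT = last, keyed by `(signature, node_id)`) can be pictured with a short Python sketch. It assumes instruction ids are dense (1..n in ascending iindex order), which the description does not state explicitly; `number_nodes`, `sort_edges` and the `iindex=` labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (callable signature, node_id)


def number_nodes(signature: str, iindices: Iterable[int]) -> Dict[str, Key]:
    """Assign node keys for one callable: ENTRY = 0, SSA instructions by
    ascending iindex, EXIT = last. Ids are assumed dense (1..n for instructions)."""
    ids: Dict[str, Key] = {"ENTRY": (signature, 0)}
    for node_id, iindex in enumerate(sorted(set(iindices)), start=1):
        ids[f"iindex={iindex}"] = (signature, node_id)
    ids["EXIT"] = (signature, len(ids))  # one past the last instruction
    return ids


def sort_edges(edges: List[Tuple[Key, Key, str]]) -> List[Tuple[Key, Key, str]]:
    """Deterministic output: sort by (source key, target key, edge kind)."""
    return sorted(edges)


print(number_nodes("Foo.bar()", [7, 2, 4]))
```

Here iindices 2, 4 and 7 become nodes 1, 2 and 3, so EXIT is node 4; sorting on the full tuple makes reruns emit identical JSON.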
On the `call-graph-test` fixture (`helloString()` → `log()` → `loglog()`, `helloString()` → `getName()`): `PARAM_OUT` correctly appears only for the non-void callee; every call gets an exceptional CFG edge; the PDG shows the def-use chain `getName()` result → concat → return → `EXIT`.

Verification
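The two structural gates listed below (single ENTRY/EXIT per CFG, no dangling `sdg_edges` endpoints) can be phrased as simple set checks. This Python sketch is only an illustration: the actual tests are Java, their exact logic is not shown here, and the function names are hypothetical. It assumes CFG nodes are numbered with ENTRY = 0 and EXIT = the highest id.

```python
from typing import Iterable, List, Set, Tuple

Node = Tuple[str, int]  # (callable signature, node_id)


def single_entry_exit(node_ids: Set[int], cfg_edges: Iterable[Tuple[int, int]]) -> bool:
    """True if node 0 is the only node without predecessors and the highest id
    is the only node without successors (one ENTRY, one EXIT)."""
    edges = list(cfg_edges)  # iterated twice below
    has_pred = {dst for _, dst in edges}
    has_succ = {src for src, _ in edges}
    return node_ids - has_pred == {0} and node_ids - has_succ == {max(node_ids)}


def dangling_endpoints(known: Set[Node], sdg_edges: Iterable[Tuple[Node, Node]]) -> List[Node]:
    """Every sdg_edges endpoint that is not a node of some emitted graph."""
    return sorted({end for edge in sdg_edges for end in edge if end not in known})


known = {("A.m()", 0), ("A.m()", 1), ("B.n()", 0)}
print(single_entry_exit({0, 1, 2}, [(0, 1), (1, 2)]))
print(dangling_endpoints(known, [(("A.m()", 1), ("C.k()", 0))]))
```

An unreachable node shows up as a second predecessor-free node and fails the first check; an edge into a callable whose graph was never emitted fails the second.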
- `-a 2` parity gate: byte-equivalent `call_graph`/`symbol_table` vs `main` on the fixture (WALA bump + refactor are invisible below level 3).
- `-a 3` output validates against `cldk.models.java.JApplication`.
- Tests (… `callGraphShouldHaveKnownEdges`):
  - `fullSystemDependencyGraphShouldBeEmittedAtAnalysisLevelThree`: concrete edge assertions (control dep `log()` → `loglog()`, data dep `helloString()` → `getName()`, `CALL` → callee `#0`, `PARAM_OUT` from `getName()`), single-`ENTRY`/`EXIT` CFG gate, `ENTRY`-anchored CDG + DDG presence, and a no-dangling gate over every `sdg_edges` endpoint.
  - `analysisLevelTwoShouldNotEmitSdgSections`: level 2 stays call-graph-only.
  - `invalidGraphSelectorShouldFailFast`: an unknown `--graphs` value exits non-zero with a clear error.
- `GraphProjectorSystemDepTest` (both dependence kinds survive between the same pair; unknown kinds skipped; unresolved endpoints gated out) and `Neo4jSchemaConformanceTest` (projector emits only cataloged types; checked-in `schema.neo4j.json` regenerated and current).
- End-to-end: `--emit neo4j` with no `-a` on the fixture produces a `graph.cypher` with 3 `J_CALLS` + 3 `J_CONTROL_DEP` + 4 `J_DATA_DEP` rows.

Out of scope (tracked in #171)
`SUMMARY` edges, slicing/taint clients, the Neo4j CPG projection of the level-3 graphs, and `--jobs` parallelism. SDK-side adoption (default `-a 3`, shared `ProgramGraphs` models, SCIP adaptation) is codellm-devkit/python-sdk#228.