
Full system dependency graph: update WALA to 1.6.10 and expose control+data dependence at analysis level 3 #171

Description

@rahlk

Problem

codeanalyzer-java today emits the level-1 symbol table (-a 1) and the WALA RTA call graph (-a 2). Despite its name, SystemDependencyGraph.java builds only the call graph: commit dcdeb2c (Mar 2025) removed the WALA slicer SDG construction and renamed the output key from system_dependency_graph to call_graph. The Python SDK still models JApplication.system_dependency_graph but it is always null, and JavaAnalysis.get_system_dependency_graph() warns "System dependency graph is not yet implemented. Returning the call graph instead."

This issue restores and completes the SDG: it updates WALA and exposes the full system dependency graph (control + data dependence) as a first-class analysis level.

Goals

  1. Update WALA from 1.6.7 to 1.6.10 (latest release on Maven Central).
  2. Add analysis level 3 (-a 3): everything -a 2 emits, plus:
    • system_dependency_graph — method-level dependence edges in the existing JGraphEdges shape (source/target callable, type = the WALA dependence label, source_kind/destination_kind = statement kinds, weight), so the SDK model that already exists validates without changes;
    • program_graphs — statement-level graphs per the CLDK level-3 dataflow contract: per-callable cfg (nodes + edges) and pdg (CDG/DDG edges) keyed by (signature, node_id), plus cross-function sdg_edges (CALL, PARAM_IN, PARAM_OUT), all versioned via a schema_version field.
  3. --graphs cfg,pdg,sdg selector (default: all at -a 3), with strict flag validation (unknown values = non-zero exit, no silent fallback).
  4. --sdg-data-deps <no-heap|full> knob: default no-heap (DataDependenceOptions.NO_HEAP_NO_EXCEPTIONS + ControlDependenceOptions.NO_EXCEPTIONAL_EDGES, the fast settings the SDG used before its removal in dcdeb2c); full opts into heap-carried data dependence (DataDependenceOptions.FULL + ControlDependenceOptions.FULL).
  5. -a 1 / -a 2 output and timings unchanged. Nothing at level 3 runs unless requested.
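A minimal sketch of the strict flag handling in goals 3 and 4, in Python for brevity; the real flags would be parsed by the Java CLI. The function names and the exit code 2 are assumptions, and the WALA Slicer option constants appear only as names copied from goal 4, not as live enums.

```python
import sys

# Accepted --graphs values; omitting the flag means "all" (the -a 3 default).
VALID_GRAPHS = ("cfg", "pdg", "sdg")

# --sdg-data-deps value -> (Slicer.DataDependenceOptions, Slicer.ControlDependenceOptions)
# constant names as listed in goal 4. Plain strings here, not WALA enums.
SDG_DATA_DEPS = {
    "no-heap": ("NO_HEAP_NO_EXCEPTIONS", "NO_EXCEPTIONAL_EDGES"),
    "full": ("FULL", "FULL"),
}
USAGE_EXIT = 2  # hypothetical usage-error code; the issue only requires "non-zero"


class UsageError(ValueError):
    """An unknown flag value; never silently replaced by a default."""


def parse_graphs(value):
    """Parse a comma-separated --graphs value into a set of graph names."""
    if value is None:
        return set(VALID_GRAPHS)
    requested = [part.strip() for part in value.split(",")]
    # Empty parts (e.g. "cfg,,pdg" or "") are rejected too: no silent fallback.
    unknown = [part for part in requested if part not in VALID_GRAPHS]
    if unknown:
        raise UsageError(
            f"--graphs: unknown value(s) {unknown}; expected a comma-separated subset of "
            f"{','.join(VALID_GRAPHS)}"
        )
    return set(requested)


def parse_sdg_data_deps(value="no-heap"):
    """Map --sdg-data-deps to the pair of WALA option constant names."""
    if value not in SDG_DATA_DEPS:
        raise UsageError(
            f"--sdg-data-deps: unknown value {value!r}; expected one of {'|'.join(SDG_DATA_DEPS)}"
        )
    return SDG_DATA_DEPS[value]


def validate_level3_flags(graphs=None, sdg_data_deps="no-heap"):
    """Return 0 if both flags are valid; otherwise print a clear message and return USAGE_EXIT."""
    try:
        parse_graphs(graphs)
        parse_sdg_data_deps(sdg_data_deps)
    except UsageError as err:
        print(f"error: {err}", file=sys.stderr)
        return USAGE_EXIT
    return 0
```

All unknown --graphs values are collected into one message instead of failing on the first; that is a choice of this sketch, not something the issue specifies.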
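The method-level system_dependency_graph in goal 2 could be derived by lifting statement-level SDG edges to their enclosing callables. This sketch assumes that weight counts how many statement-level edges collapse onto one method-level edge, and its input keys are hypothetical; only the output keys follow the JGraphEdges shape named in goal 2.

```python
from collections import Counter


def lift_to_method_edges(stmt_edges):
    """Collapse statement-level dependence edges into method-level edges.

    Each input edge is a dict with hypothetical keys:
      src_callable, dst_callable: signatures of the enclosing callables
      label:                      WALA dependence label, e.g. "CONTROL_DEP"
      src_kind, dst_kind:         statement kinds, e.g. "NORMAL", "PARAM_CALLER"
    Edges that agree on all five fields merge; weight counts how many merged
    (an assumed meaning of weight).
    """
    counts = Counter(
        (e["src_callable"], e["dst_callable"], e["label"], e["src_kind"], e["dst_kind"])
        for e in stmt_edges
    )
    # Sort so that repeated runs on the same input emit edges in a stable order.
    return [
        {
            "source": src,
            "target": dst,
            "type": label,
            "source_kind": src_kind,
            "destination_kind": dst_kind,
            "weight": n,
        }
        for (src, dst, label, src_kind, dst_kind), n in sorted(counts.items())
    ]
```

Sorting the result keeps repeated runs on the same input byte-stable, so fixture outputs can be diffed directly.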

Substrate decisions (locked)

  • Engine: WALA is the native, in-process substrate — IR/SSACFG for the CFG, com.ibm.wala.ipa.slicer.SDG for dependence edges. No external engines.
  • Call-graph builder feeding the SDG: RTA (Util.makeRTABuilder), unchanged — fast and proven on the fixtures; adequate for no-heap data dependence. Precision upgrades (0-1-CFA when --sdg-data-deps=full) are a possible follow-up.
  • Node identity: (signature, node_id) with node_id derived from SSA instruction order (ENTRY = 0, EXIT = last), source lines mapped via the ECJ/CAst source positions where available. This is the Java adaptation of the contract's "AST node in source-span order" — recorded in .claude/SCHEMA_DECISIONS.md.
  • Precision posture: sound-leaning, over-approximate; reflection remains ReflectionOptions.NONE (documented unsoundness, unchanged from level 2).
  • Scope pruning: application classes only (GraphSlicer.prune on the application class loader), matching the call graph.
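A minimal sketch of the node-identity rule, assuming one callable's SSA instruction array as input. Like WALA's IR.getInstructions(), the array may contain null slots, which get no node here; the index-to-span map stands in for the ECJ/CAst source positions, and every output key name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

NodeKey = Tuple[str, int]  # (signature, node_id)
Span = Tuple[int, int]     # (start_line, end_line)
NO_SPAN: Span = (-1, -1)   # explicit sentinel when no source position is known


def number_cfg_nodes(
    signature: str,
    instructions: List[Optional[object]],
    spans: Dict[int, Span],
) -> Dict[NodeKey, dict]:
    """Assign dense node_ids to one callable's CFG nodes.

    ENTRY = 0, real instructions = 1..n in SSA instruction order, EXIT = n + 1.
    instructions: SSA instruction array in index order; None slots are skipped.
    spans:        instruction index -> source span, where the front end has one.
    """
    nodes: Dict[NodeKey, dict] = {(signature, 0): {"kind": "ENTRY", "span": NO_SPAN}}
    node_id = 0
    for iindex, insn in enumerate(instructions):
        if insn is None:  # empty slot in the instruction array: no statement node
            continue
        node_id += 1
        nodes[(signature, node_id)] = {
            "kind": "STMT",
            "iindex": iindex,  # original instruction index, kept for span lookup
            "span": spans.get(iindex, NO_SPAN),
        }
    nodes[(signature, node_id + 1)] = {"kind": "EXIT", "span": NO_SPAN}
    return nodes
```

For example, number_cfg_nodes("Foo.bar()", ["a", None, "b"], {0: (3, 3)}) yields node_ids 0 through 3: ENTRY, the two real instructions (the second carrying the (-1, -1) sentinel span), and EXIT. Keeping ids dense makes EXIT's id predictable, which simplifies the single-ENTRY/single-EXIT gate.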

Out of scope (follow-ups, not this issue)

  • SUMMARY edges (HRB transitive summaries) and context-sensitive slicing/taint clients.
  • CPG projection of the level-3 graphs into the Neo4j emitter (CFGNode, CFG_NEXT, CDG, DDG, … labels + schema.neo4j.json bump).
  • Deterministic --jobs parallel fan-out.
  • SDK-side work — tracked separately on python-sdk: default the Java backend to -a 3 (dial down on request), shared ProgramGraphs Pydantic models, and adapting SCIP indexing to the new schema.

Verification / definition of done

  • WALA 1.6.10 builds clean; existing test suite green.
  • On the call-graph-test fixture: at least one known CONTROL_DEP-labeled edge and one data-dependence edge asserted concretely by exact endpoints, not merely checked as non-empty.
  • No dangling endpoints: every system_dependency_graph endpoint resolves to a symbol-table callable; every sdg_edges endpoint (signature, node_id) exists in that function's emitted graph.
  • CFG gate per callable: single ENTRY/EXIT; every node carries a source span or an explicit -1 sentinel.
  • -a 2 output on the fixture is byte-identical before/after (call graph untouched).
  • Unknown --graphs/--sdg-data-deps values exit non-zero with a clear message.
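The no-dangling-endpoint and CFG-gate checks could be written as a small validator over the emitted JSON. This sketch assumes hypothetical key names (functions, cfg.nodes, node_id, kind, span) and treats a function's CFG node set as its emitted graph, on the assumption that the pdg reuses the cfg's (signature, node_id) keys. It returns a list of violations, empty on success.

```python
def check_level3(callables, method_edges, program_graphs):
    """Return human-readable violations; an empty list means every gate passed.

    callables:      set of callable signatures from the symbol table
    method_edges:   system_dependency_graph edges, each with "source"/"target" signatures
    program_graphs: {"functions": {sig: {"cfg": {"nodes": [...]}}}, "sdg_edges": [...]},
                    where sdg_edges endpoints are [signature, node_id] pairs
    """
    problems = []

    # Gate 1: method-level endpoints resolve to symbol-table callables.
    for edge in method_edges:
        for end in ("source", "target"):
            if edge[end] not in callables:
                problems.append(f"system_dependency_graph: unresolved {end} {edge[end]!r}")

    functions = program_graphs["functions"]
    node_ids = {
        sig: {node["node_id"] for node in graph["cfg"]["nodes"]}
        for sig, graph in functions.items()
    }

    # Gate 2: every sdg_edges endpoint exists in that function's emitted graph.
    for edge in program_graphs["sdg_edges"]:
        for end in ("source", "target"):
            sig, node_id = edge[end]
            if node_id not in node_ids.get(sig, set()):
                problems.append(f"sdg_edges: dangling {end} ({sig!r}, {node_id})")

    # Gate 3: exactly one ENTRY and one EXIT per callable, and every node
    # carries either a valid line span or the explicit (-1, -1) sentinel.
    for sig, graph in functions.items():
        nodes = graph["cfg"]["nodes"]
        kinds = [node["kind"] for node in nodes]
        if kinds.count("ENTRY") != 1 or kinds.count("EXIT") != 1:
            problems.append(f"{sig}: expected exactly one ENTRY and one EXIT")
        for node in nodes:
            span = node.get("span")
            if span is None or not (list(span) == [-1, -1] or 1 <= span[0] <= span[1]):
                problems.append(f"{sig}#{node['node_id']}: missing or malformed span {span!r}")
    return problems
```

Returning every violation instead of stopping at the first keeps a failing fixture run informative; a test can then simply assert that the list is empty.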
