Skip to content

Add CSEPass to the default edge compilation pipeline (#19849)#19849

Open
billmguo wants to merge 1 commit into
pytorch:mainfrom
billmguo:export-D106584552
Open

Add CSEPass to the default edge compilation pipeline (#19849)#19849
billmguo wants to merge 1 commit into
pytorch:mainfrom
billmguo:export-D106584552

Conversation

@billmguo
Copy link
Copy Markdown
Contributor

@billmguo billmguo commented May 28, 2026

Summary:

CSEPass (Common Subexpression Elimination) was previously only used by specific backends (MLX, Cadence) and export scripts. This change adds it to base_post_op_replace_passes in the default to_edge / to_edge_transform_and_lower pipeline so all backends benefit from CSE automatically.

The pass is placed before dead_code_elimination_pass so any dead nodes CSE creates get cleaned up, and before DebugHandleGeneratorPass so debug handles are assigned to the final deduplicated graph.

CSE can merge structurally-identical nodes across different source partitions, which caused ConvertToLinearPass in XNNPACK to see extra output nodes in a partition. This is fixed by filtering outputs to only those reachable from the mm/addmm node within the partition.

The test_dim_order_revert_pass test is updated to expect 2 _to_copy ops (instead of 4) after to_edge, since CSE deduplicates the two identical branches.

Reviewed By: mohankumarkumar

Differential Revision: D106584552

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 28, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19849

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures, 1 Unrelated Failure

As of commit f1614d2 with merge base 4469d84 (image):

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented May 28, 2026

CLA Not Signed

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 28, 2026
@meta-codesync
Copy link
Copy Markdown
Contributor

meta-codesync Bot commented May 28, 2026

@billmguo has exported this pull request. If you are a Meta employee, you can view the originating Diff in D106584552.

@github-actions
Copy link
Copy Markdown

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@billmguo billmguo force-pushed the export-D106584552 branch from d32fd3c to 721be34 Compare May 28, 2026 17:04
@billmguo billmguo requested a review from digantdesai as a code owner May 28, 2026 17:04
@billmguo billmguo force-pushed the export-D106584552 branch from 721be34 to be45ec5 Compare May 28, 2026 18:42
@github-actions github-actions Bot added ciflow/trunk module: arm Issues related to arm backend labels May 28, 2026
@meta-codesync meta-codesync Bot changed the title Add CSEPass to the default edge compilation pipeline Add CSEPass to the default edge compilation pipeline (#19849) May 29, 2026
@billmguo billmguo force-pushed the export-D106584552 branch from be45ec5 to cc4c35e Compare May 29, 2026 18:40
billmguo added a commit to billmguo/executorch that referenced this pull request May 29, 2026
Summary:

`CSEPass` (Common Subexpression Elimination) was previously only used by specific backends (MLX, Cadence) and export scripts. This change adds it to `base_post_op_replace_passes` in the default `to_edge` / `to_edge_transform_and_lower` pipeline so all backends benefit from CSE automatically.

The pass is placed before `dead_code_elimination_pass` so any dead nodes CSE creates get cleaned up, and before `DebugHandleGeneratorPass` so debug handles are assigned to the final deduplicated graph.

CSE can merge structurally-identical nodes across different source partitions, which caused `ConvertToLinearPass` in XNNPACK to see extra output nodes in a partition. This is fixed by filtering outputs to only those reachable from the mm/addmm node within the partition.

The `test_dim_order_revert_pass` test is updated to expect 2 `_to_copy` ops (instead of 4) after `to_edge`, since CSE deduplicates the two identical branches.

Reviewed By: mohankumarkumar

Differential Revision: D106584552
@billmguo billmguo force-pushed the export-D106584552 branch from cc4c35e to 3dfd283 Compare June 1, 2026 16:44
billmguo added a commit to billmguo/executorch that referenced this pull request Jun 1, 2026
Summary:

`CSEPass` (Common Subexpression Elimination) was previously only used by specific backends (MLX, Cadence) and export scripts. This change adds it to `base_post_op_replace_passes` in the default `to_edge` / `to_edge_transform_and_lower` pipeline so all backends benefit from CSE automatically.

The pass is placed before `dead_code_elimination_pass` so any dead nodes CSE creates get cleaned up, and before `DebugHandleGeneratorPass` so debug handles are assigned to the final deduplicated graph.

CSE can merge structurally-identical nodes across different source partitions, which caused `ConvertToLinearPass` in XNNPACK to see extra output nodes in a partition. This is fixed by filtering outputs to only those reachable from the mm/addmm node within the partition.

The `test_dim_order_revert_pass` test is updated to expect 2 `_to_copy` ops (instead of 4) after `to_edge`, since CSE deduplicates the two identical branches.

Reviewed By: mohankumarkumar

Differential Revision: D106584552
Summary:

`CSEPass` (Common Subexpression Elimination) was previously only used by specific backends (MLX, Cadence) and export scripts. This change adds it to `base_post_op_replace_passes` in the default `to_edge` / `to_edge_transform_and_lower` pipeline so all backends benefit from CSE automatically.

The pass is placed before `dead_code_elimination_pass` so any dead nodes CSE creates get cleaned up, and before `DebugHandleGeneratorPass` so debug handles are assigned to the final deduplicated graph.

CSE can merge structurally-identical nodes across different source partitions, which caused `ConvertToLinearPass` in XNNPACK to see extra output nodes in a partition. This is fixed by filtering outputs to only those reachable from the mm/addmm node within the partition.

The `test_dim_order_revert_pass` test is updated to expect 2 `_to_copy` ops (instead of 4) after `to_edge`, since CSE deduplicates the two identical branches.

Reviewed By: mohankumarkumar

Differential Revision: D106584552
@billmguo billmguo force-pushed the export-D106584552 branch from 3dfd283 to f1614d2 Compare June 1, 2026 16:45
@digantdesai
Copy link
Copy Markdown
Contributor

@metascroy can you take a look?

@claude review this PR

@billmguo Can you please sign the CLA?

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @digantdesai's task in 1m 37s —— View job


Review of PR #19849: Add CSEPass to the default edge compilation pipeline

  • Gather context and read changed files
  • Review exir/passes/__init__.py changes
  • Review backends/xnnpack/_passes/convert_to_linear.py changes
  • Review exir/passes/BUCK changes
  • Review test changes
  • Post review feedback

Overall: This is a well-structured change. Adding CSE to the default pipeline is a good idea — deduplicating identical computations benefits all backends without requiring them to opt in individually. The placement before DCE and DebugHandleGeneratorPass is correct. The XNNPACK fix is well-motivated. A few observations below:


backends/xnnpack/_passes/convert_to_linear.py — Reachability fix

The forward-reachability walk from node.users (lines 132–138) correctly filters output nodes to only those downstream of the mm/addmm node within the partition. However:

  1. BFS vs DFS semantics: Using queue.pop() gives DFS behavior (stack). This is fine for correctness since you just need reachability, but naming the variable queue is slightly misleading. Minor nit — no functional issue.

  2. Potential infinite loop safety: The if cur in reachable guard correctly prevents revisiting nodes. Good.

  3. Edge case — zero reachable outputs: If CSE merges nodes such that no output in src_partition.output_nodes is reachable from the mm/addmm, the subsequent assert will fire with an empty list. This seems unlikely given how source partitions work, but worth noting that the assert message would show [] which is still informative. Fine as-is.

backends/arm/test/passes/test_fuse_duplicate_users_pass.py

The ops_before_pass values now equal ops_after_pass for both FuseaAvgPool (1 → 1) and FuseAvgPoolChain (2 → 2). This means the FuseDuplicateUsersPass is effectively a no-op in these tests now that CSE runs first during to_edge. The tests still validate that the pass doesn't break anything when there's nothing to fuse, but they no longer test the pass's fusing behavior.

Suggestion: Consider adding a test case where inputs differ (so CSE can't deduplicate) but outputs are consumed by the same user — that would still exercise the fuse pass's core logic independently of CSE.

exir/passes/__init__.py

Placement looks correct:

CSEPass()           # deduplicate identical subexpressions
dead_code_elimination_pass  # clean up dead nodes CSE may leave behind
DebugHandleGeneratorPass()  # assign handles to final graph

CSE already calls graph.eliminate_dead_code() internally when it modifies the graph (line 122 of cse_pass.py), so the subsequent dead_code_elimination_pass is redundant when CSE fires — but harmless and ensures cleanup if other passes between pre/post introduce dead code. Fine.

exir/tests/test_passes.py

The test update for test_dim_order_revert_pass is well-commented. The observation that CSE doesn't deduplicate dim_order ops (because they're non-aten and treated as unsafe by _is_safe_target) is correct and important — it means the _skip_dim_order=False path remains unaffected by CSE.

CSEPass implementation (exir/passes/cse_pass.py)

The implementation is solid:

  • Value numbering approach is clean and O(n) per node
  • Schema introspection for safety is principled — only trusting aten:: schemas
  • The _output_nodes protection prevents deduplicating buffer mutation outputs (important for KV cache patterns)
  • The TypeError catch in _make_hashable is a good fallback for exotic arg types

One minor observation: _is_safe caches by id(target) (line 129), which is correct since op targets are interned singletons in PyTorch. Just calling it out as an assumption that holds today.

Summary

Approve with minor suggestion: The change is correct and well-thought-out. The only actionable suggestion is to consider whether the ARM FuseDuplicateUsersPass tests still provide meaningful coverage now that CSE pre-empts their fusing — but that's arguably a follow-up concern, not a blocker.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported meta-exported module: arm Issues related to arm backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants