Reduce CI cost by metascroy · Pull Request #19919 · pytorch/executorch

metascroy · 2026-06-01T17:02:25Z

Currently every push to main runs all macOS jobs unconditionally. This PR path-filters macOS jobs on push as well as PR, but samples 25% of push commits (deterministic by SHA hash) to still run the full suite for HUD/bisection signal. A new viable-strict-gate workflow fails on the 75% non-sampled commits and is added to viable/strict's requires list, so viable/strict only advances on commits where every job ran. Estimated ~75% macOS runner savings.

CI behavior changes:

macOS jobs in pull.yml / trunk.yml now skip on pushes that don't touch their paths and aren't in the sample
A new viable-strict-gate workflow runs on every push to main/release/* and fails when the commit isn't a sampled full-run.
update-viablestrict now requires the gate workflow → viable/strict advances ~every 4 commits instead of every commit.
Maintainers can force a full run on any main/release commit by running the new "Promote commit to viable/strict" workflow from the Actions tab
Sampling rule lives in one place: _ci-run-decision.yml; change the rate or rule there.

pytorch-bot · 2026-06-01T17:02:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19919

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-01T17:03:43Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

metascroy · 2026-06-01T17:04:27Z

@claude review this code

claude · 2026-06-01T17:05:21Z

Claude finished @metascroy's task in 2m 14s —— View job

Code Review: Reduce CI Cost

Read changed workflow files
Analyze CI run decision logic
Analyze viable-strict-gate workflow
Analyze promote-to-viable-strict workflow
Review pull.yml and trunk.yml changes
Review update-viablestrict.yml changes
Post review findings

Overall Assessment

The design is sound — a deterministic 25% SHA-based sampling that's centralized in _ci-run-decision.yml, gated via viable-strict-gate.yml, with a clean manual escape hatch. The architecture is well-thought-out. A few issues to address:

Issues

1. test-llama-torchao-lowbit in trunk.yml is ungated (Medium)

trunk.yml:398-415 — This macOS job uses macos_job.yml but has no needs: run-decision or if: gate. It runs on every push to main unconditionally, bypassing the cost-saving mechanism. This seems like an oversight since all other macOS jobs in trunk.yml are gated.

Fix this →

2. Missing permissions block — CodeQL findings (Low)

As flagged by CodeQL, the following workflows lack explicit permissions:

_ci-run-decision.yml (line 69) — the decide job only needs read access
viable-strict-gate.yml (lines 32, 49) — both jobs only need read

Adding permissions: {} at the workflow level (or per-job) is a security best practice:

permissions: {}  # No token permissions needed

3. Misleading comment in pull.yml (Nit)

pull.yml:24 says "Emits is-full-run='true' for PR/dispatch/tag/sampled-push commits" but _ci-run-decision.yml actually returns 'false' for pull_request events — the PR comment incorrectly suggests PRs get is-full-run = 'true'. The mechanism works correctly (PR jobs use github.event_name == 'pull_request' || to run unconditionally on PRs), but the comment is misleading.

4. pull.yml test-coreml-bc-macos — behavior change for PRs (Verify)

pull.yml:1529 — The old condition was:

github.event_name != 'pull_request' ||
contains(...) || ...

Meaning: always run on push, path-filtered on PRs.

The new condition is:

contains(...) || ... ||
needs.run-decision.outputs.is-full-run == 'true'

Meaning: path-filtered on both push and PR. For PRs, is-full-run is 'false', so this job now only runs on PRs that touch relevant paths. Previously it ran on every non-push event unconditionally. This is a functional change to PR behavior — verify that's intentional.

Design Observations (Non-blocking)

Sampling approach is solid. Using the last hex character of SHA for deterministic 25% sampling ({0,4,8,c}) is clean and reproducible. The case statement is POSIX-compatible and fast.

Single source of truth. Having _ci-run-decision.yml as the sole logic owner and reusing it from both trunk.yml/pull.yml (for gating) and viable-strict-gate.yml (for advancement blocking) avoids drift.

Promote workflow is well-secured. SHA validation (40-char hex, must exist, must be reachable from main/release) and the idempotent tag-exists check are good defensive programming.

Concurrency control on the promote workflow (cancel-in-progress: false) is the right choice to avoid racing tag pushes.

Summary

The PR achieves its goal of ~75% macOS runner cost reduction with a clean, maintainable design. The main actionable issue is the ungated test-llama-torchao-lowbit job. The permissions and comment issues are low-priority polish.

digantdesai · 2026-06-01T19:49:38Z

+          # Keep in sync with the sample in viable-strict-gate.yml.
+          if [ "$IS_FULL" = "false" ] && [ "$EVENT_NAME" = "push" ]; then
+            case "$SHA" in
+              *0|*4|*8|*c) IS_FULL=true ;;


you can hit 100 consecutive sha ending it 0,4,8,c in a row :p

very unlikely, but I take the point :)

I'll change sampling to be based on git history depth so it regularly runs every 4 commits

digantdesai · 2026-06-01T19:51:10Z


  test-coreml-bc-macos:
-    needs: changed-files
+    needs: [changed-files, run-decision]


also do it for windows?

I'll focus on macOS + core infra (viable strict update changes and main PR selection job) in this PR. windows and cuda enablement can be follow-ups.

digantdesai · 2026-06-01T19:54:58Z

+      contains(needs.changed-files.outputs.changed-files, '.github/workflows/trunk.yml') ||
+      needs.run-decision.outputs.is-full-run == 'true'


why not add the sha % 4 logic here and not kick off a new job? curious

B/c then we can use it in other jobs (like mlx.yml or cuda.yml) and keep the selection logic in one place

digantdesai

LGTM. stamping to unblock.

metascroy · 2026-06-01T21:12:38Z

@claude review this code

claude · 2026-06-01T21:13:27Z

Claude finished @metascroy's task in 2m 10s —— View job

Code Review: Reduce CI Cost

Read changed workflow files
Analyze CI run decision logic
Analyze viable-strict-gate workflow
Analyze promote-to-viable-strict workflow
Review pull.yml and trunk.yml changes
Review update-viablestrict.yml changes
Post review findings

Overall Assessment

Well-designed approach to reducing macOS CI costs. The depth-based sampling (addressing digantdesai's review feedback) is an improvement over the original SHA-based approach since it guarantees hard bounds on gaps between full runs. The architecture is clean with _ci-run-decision.yml as a single source of truth.

Issues

1. git rev-list --count includes all merge parents (Medium)

_ci-run-decision.yml:74 — git rev-list --count "$SHA" counts ALL ancestors reachable from $SHA, not just the first-parent (linear) history. In a repo that merges PRs via merge commits, the count jumps by (1 + PR_commit_count) per merge, not by 1. This means:

The "every 4th commit" guarantee doesn't map to "every 4th merge to main" — some merges will have counts that jump by 5+ (skipping a mod-4 boundary), while multi-commit PR merges could hit multiple mod-4 boundaries
The sampling rate won't be exactly 25% of main-branch pushes; it could be higher or lower depending on PR sizes

If you want deterministic "every 4th push to main," use:

DEPTH=$(git rev-list --first-parent --count "$SHA")

This counts only the linear history along the default branch, giving exactly one count per push event.

2. Stale comment in pull.yml (Low)

pull.yml:18-19 — The comment says "non-PR events emit '*' so the per-job if: short-circuits via event_name != 'pull_request'" — but this pattern no longer exists for the gated jobs. The path-filtered macOS jobs now use contains(...) || is-full-run == 'true' instead of the old event_name != 'pull_request' || contains(...) pattern. The comment should be updated to reflect that non-PR events rely on is-full-run for the sampled gate.

3. viable-strict-gate promotion race: push-triggered failure persists (Verify)

When a commit is pushed to main, viable-strict-gate triggers and (75% of the time) fails. If you later promote that commit via promote-to-viable-strict.yml (pushing a ciflow/trunk/<sha> tag), a new run of viable-strict-gate triggers from the tag push and succeeds. But the original push-triggered run still shows as failed.

Whether update-viablestrict can advance depends on how pytorch/test-infra/.github/actions/update-viablestrict resolves conflicting conclusions for the same SHA. If it checks "latest run" or "any passing run" for the viable-strict-gate workflow, promotion works. If it requires ALL runs of that workflow for the SHA to pass, promoted commits would still be blocked. Worth verifying this is handled correctly by the upstream action.

Design Observations (Non-blocking)

Depth-based sampling is a good response to the review feedback. Using depth % 4 == 0 with --count gives deterministic results and (with --first-parent) would guarantee exactly 1-in-4 main-branch commits get full CI.
permissions: {} on viable-strict-gate.yml — good, addresses the CodeQL findings from the automated review.
Promote workflow validation is thorough: 40-char hex check, existence check, reachability check, idempotent tag-exists handling.
test-llama-torchao-lowbit in trunk.yml — the previous review flagged this as ungated, but looking at the current code (lines 391-395), it now correctly has needs: run-decision and the standard gate. Resolved.

Summary

The main actionable issue is using --first-parent in git rev-list --count to get a predictable linear count that maps 1:1 to pushes on main. Without it, the depth count is influenced by the size of merged PRs, making the sampling rate unpredictable. The other items are informational/low-priority.

metascroy · 2026-06-01T21:27:40Z

@claude review this code

claude · 2026-06-01T21:28:29Z

Claude finished @metascroy's task in 3m 7s —— View job

Code Review: Reduce CI Cost

Read changed workflow files
Analyze CI run decision logic
Analyze viable-strict-gate workflow
Analyze promote-to-viable-strict workflow
Review pull.yml and trunk.yml changes
Review mlx.yml changes
Review update-viablestrict.yml changes
Post review findings

Overall Assessment

The design is solid — depth-based deterministic 25% sampling centralized in _ci-run-decision.yml, with --first-parent for predictable linear counting (addressing digantdesai's earlier feedback). The viable-strict gate, promotion escape hatch, and per-job gating are all well-structured. Previous review issues (ungated test-llama-torchao-lowbit, missing permissions: {}) have been addressed.

Issues

1. Misleading comment about changed-files behavior (Medium)

trunk.yml:22-25 and pull.yml:18-21 both say:

Emits PR diff file list; non-PR events emit '*' (every contains() check returns true)

This is incorrect. GitHub Actions contains('*', 'backends/apple/coreml') returns false — * is a literal single-character string, not a glob/wildcard in this context. The contains() function does substring matching, so * does not contain backends/apple/coreml.

The old code worked because it had event_name != 'pull_request' || to short-circuit on push. The new code works because it has needs.run-decision.outputs.is-full-run == 'true' as the final OR clause. So the mechanism is correct, but the comment incorrectly explains why.

This matters because a future reader trusting the comment might remove the is-full-run clause thinking contains() already handles push events.

Suggested fix:

# Emits PR diff file list; non-PR events emit '*'. Note: contains('*', 'path')
# is false (literal substring match, not glob), so path-filtered jobs on push
# rely on run-decision's is-full-run output to gate execution.

Fix this →

2. test-llama-runner-macos path filter is redundant on push (Low)

trunk.yml:480-493 — On push events, changed-files outputs '*'. Since contains('*', 'backends/apple/coreml') is false (see issue #1 above), ALL the contains() clauses evaluate to false on push. The job then only runs when is-full-run == 'true'.

This means these path-filtered jobs never run on non-sampled push commits, even if the push touches their relevant paths. This is different from the PR behavior where path filtering is meaningful. The same applies to test-coreml-delegate, test-static-llama-ane, test-models-macos-coreml, test-models-macos-mps, and test-huggingface-transformers-macos.

If the intent is "on push: only run on sampled commits (25%)" regardless of paths touched, then this is correct behavior — just know that path filtering has no effect on push events with the current _get-changed-files.yml implementation. If you want path filtering to also work on push (run on sampled OR path-matching pushes), you'd need _get-changed-files.yml to compute the diff against the previous commit for push events too.

3. viable-strict-gate tag trigger semantics (Verify)

viable-strict-gate.yml:25-26 triggers on tags: ciflow/trunk/*. When promote-to-viable-strict.yml pushes a ciflow/trunk/<sha> tag, this triggers a new viable-strict-gate run. The new run's github.sha will be the commit the tag points to, and the _ci-run-decision.yml will see REF=refs/tags/ciflow/trunk/... which matches the refs/tags/ciflow/* case and returns is-full-run=true. This path is sound.

However, verify that the actions/checkout@v4 in _ci-run-decision.yml (line 43) correctly checks out the tagged commit (not HEAD of the default branch) when triggered by a tag push. With fetch-depth: 0 and no explicit ref:, checkout defaults to github.sha which should be the tagged commit. This should work but is worth a smoke test.

4. Promote workflow — git branch -r parsing (Nit)

promote-to-viable-strict.yml:75:

for branch in main $(git branch -r | grep -E 'origin/release/' | sed 's|origin/||'); do

git branch -r output has leading whitespace (e.g., origin/release/0.6). The sed removes origin/ but not the leading spaces, so the variable would be release/0.6. Then origin/$branch becomes origin/ release/0.6 which won't resolve. Consider adding | tr -d ' ' or using sed 's|^[[:space:]]*origin/||'.

Fix this →

Design Observations (Non-blocking)

--first-parent in git rev-list --count — good fix from previous review. This guarantees exactly 1-in-4 main pushes are sampled regardless of PR merge sizes.
permissions: {} on viable-strict-gate.yml and permissions: contents: read on _ci-run-decision.yml — properly locked down.
Promotion workflow defense-in-depth (deleting failed gate runs) is a nice touch for HUD cleanliness.
mlx.yml gating pattern (github.event_name == 'pull_request' || is-full-run == 'true') is clean and consistent across all jobs.
Concurrency control (cancel-in-progress: false) on promote workflow prevents race conditions.

Summary

The PR is well-structured and the core mechanism is correct. The main actionable items are:

Fix the misleading comment about contains('*', ...) behavior (risk of future misunderstanding)
Fix whitespace handling in the promote workflow's branch parsing (functional bug)

The path-filter-on-push semantic (issue #2) is a design choice that's fine as-is if the intent is "push events are either full-run or skip" — just worth documenting explicitly.

metascroy · 2026-06-01T21:50:35Z

@claude review this code

claude · 2026-06-01T21:51:20Z

Claude finished @metascroy's task in 1m 31s —— View job

Code Review: Reduce CI Cost

Read changed workflow files
Analyze CI run decision logic
Analyze viable-strict-gate workflow
Analyze promote-to-viable-strict workflow
Review pull.yml and trunk.yml changes
Review mlx.yml changes
Review update-viablestrict.yml changes
Post review findings

Overall Assessment

This is a well-architected CI cost reduction. The depth-based deterministic 25% sampling centralized in _ci-run-decision.yml, the viable-strict gate preventing partial-signal advancement, and the manual promotion escape hatch all work together cleanly. Previous review feedback (ungated test-llama-torchao-lowbit, --first-parent, permissions: {}, branch parsing) has been addressed. The code is in good shape.

Issues

1. _get-changed-files.yml Compare API pagination may produce incomplete results (Medium)

_get-changed-files.yml:76 — The GitHub Compare API (/compare/BASE...HEAD) has a hard limit of 300 files in its response regardless of pagination (GitHub docs). If a push commit (e.g., a large merge) touches >300 files, the diff will silently be truncated. Jobs whose relevant paths are in the un-returned tail won't run even though they should.

This is unlikely for typical single-commit pushes to main, but could happen for merge commits or large automated updates. If it matters, you could add a check: if the response indicates total_commits or file count is at the limit, fall back to '*'.

2. _ci-run-decision.yml checkout ref on tag pushes (Low — verify)

_ci-run-decision.yml:43-45 — When viable-strict-gate.yml triggers from a ciflow/trunk/* tag push, actions/checkout@v4 with no explicit ref: defaults to github.sha, which should be the tagged commit. But on tag push events, github.sha is the commit the tag points to — so git rev-list --first-parent --count "$SHA" should resolve correctly. Worth a smoke test, but likely fine.

3. Race between _get-changed-files and force-push (Unlikely, informational)

If a force-push to main replaces commits (unusual for ExecuTorch), github.event.before could reference a SHA that no longer exists in the linear history. The Compare API may 404, and the || echo "" fallback would produce empty CHANGED_FILES, which then becomes '*' (line 82-84). This is actually safe — it falls through to "full" behavior. Just noting the implicit safety net works correctly.

Design Observations (Non-blocking)

include-push-diff: true opt-in is a nice API design. Callers like lint.yml that rely on '*' don't need changes.
The comment at pull.yml:18-23 and trunk.yml:22-27 is now accurate: it correctly explains that contains('*', 'path') is false and jobs rely on is-full-run.
promote-to-viable-strict.yml branch parsing now uses git for-each-ref instead of git branch -r, avoiding the whitespace issue from the previous review.
Job gating pattern is consistent across all three workflows (mlx.yml, trunk.yml, pull.yml):
- Unconditionally-gated jobs: github.event_name == 'pull_request' || is-full-run == 'true'
- Path-filtered jobs: contains(...) || ... || is-full-run == 'true'
viable-strict-gate and update-viablestrict — adding "viable-strict-gate" to the requires list is the correct integration point with PyTorch's HUD infrastructure.

Summary

The PR is clean and ready. The only potentially actionable item is the 300-file Compare API limit (#1), which is a known GitHub limitation and unlikely to matter in practice for this repo's merge patterns. All previously identified issues have been addressed. The design achieves its ~75% macOS runner cost savings goal with good ergonomics for maintainers.

viable-strict-gate.yml and mlx.yml had permissions: {} but call _ci-run-decision.yml, which needs contents: read for actions/checkout. GitHub intersects caller permissions with callee needs ({} ∩ {contents: read} = {}), so both workflows were rejected at registration since #19919 landed. The gate hasn't run (so update-viablestrict has had no signal), and mlx.yml hasn't triggered (MLX / * checks show [does not exist] on HUD). Loosen both to permissions: contents: read. Audited all other callers of _ci-run-decision.yml / _get-changed-files.yml; none affected.

up

5f1d953

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 1, 2026

up

2d0350d

github-advanced-security AI found potential problems Jun 1, 2026

View reviewed changes

Comment thread .github/workflows/_ci-run-decision.yml Fixed

Comment thread .github/workflows/viable-strict-gate.yml Fixed

Comment thread .github/workflows/viable-strict-gate.yml Fixed

digantdesai reviewed Jun 1, 2026

View reviewed changes

digantdesai approved these changes Jun 1, 2026

View reviewed changes

up

1e4a688

up

c5f6fb3

up

964dc8a

metascroy merged commit 410f930 into main Jun 1, 2026
169 checks passed

metascroy deleted the reduce-ci-costs branch June 1, 2026 21:54

metascroy mentioned this pull request Jun 1, 2026

Fix: permissions: {} blocks reusable workflow calls #19923

Merged

		contains(needs.changed-files.outputs.changed-files, '.github/workflows/trunk.yml') \|\|
		needs.run-decision.outputs.is-full-run == 'true'

Conversation

metascroy commented Jun 1, 2026

Uh oh!

pytorch-bot Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19919

Uh oh!

github-actions Bot commented Jun 1, 2026

This PR needs a release notes: label

Uh oh!

Uh oh!

Uh oh!

Uh oh!

metascroy commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Reduce CI Cost

Overall Assessment

Issues

Design Observations (Non-blocking)

Summary

Uh oh!

digantdesai Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

metascroy Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

digantdesai Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

metascroy Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

digantdesai Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

metascroy Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

digantdesai left a comment

Choose a reason for hiding this comment

Uh oh!

metascroy commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Reduce CI Cost

Overall Assessment

Issues

Design Observations (Non-blocking)

Summary

Uh oh!

metascroy commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Reduce CI Cost

Overall Assessment

Issues

Design Observations (Non-blocking)

Summary

Uh oh!

metascroy commented Jun 1, 2026

Uh oh!

claude Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: Reduce CI Cost

Overall Assessment

Issues

Design Observations (Non-blocking)

Summary

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

pytorch-bot Bot commented Jun 1, 2026 •

edited

Loading

This PR needs a `release notes:` label

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading

claude Bot commented Jun 1, 2026 •

edited

Loading