COMP: Reduce Linux CI disk pressure and harden ccache management #6478
hjmjohnson wants to merge 4 commits
| Filename | Overview |
|---|---|
| Testing/ContinuousIntegration/AzurePipelinesLinux.yml | Moves CCACHE_MAXSIZE to pipeline variables, adds Free preinstalled software step, hardens eviction logic with 3d/4d windows and |
| Testing/ContinuousIntegration/AzurePipelinesLinuxPython.yml | Replaces the old CCACHE_MAXSIZE=6.5G override workaround with a clean pipeline-variable declaration (5G), aligns eviction pattern with Linux.yml, adds Free preinstalled software step; same set -e + unguarded ccache -c issue in the new cleanup step. |
| Testing/ContinuousIntegration/AzurePipelinesMacOSPython.yml | Adds macOS-specific Free preinstalled software step (dotnet + CoreSimulator runtimes), moves CCACHE_MAXSIZE to pipeline variables, adds eviction logic and cleanup step; same set -e + unguarded ccache -c fragility as Linux files. |
| Testing/ContinuousIntegration/AzurePipelinesBatch.yml | Moves CCACHE_MAXSIZE to pipeline variables; adds a new cleanup step using $AGENT_JOBSTATUS for eviction, ccache -c, and rm -rf with |
| Testing/ContinuousIntegration/AzurePipelinesWindowsPython.yml | Moves CCACHE_MAXSIZE to pipeline variables and adds cleanup step with $AGENT_JOBSTATUS-based eviction, ccache -c, and |
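
The `set -e` plus unguarded `ccache -c` concern noted in the table can be illustrated with a small sketch. The helper name `cleanup_step` and its argument order are hypothetical and are not taken from the pipeline files; the point is only that a failing maintenance command is reported without preventing the build-tree removal that follows.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): run a cache-maintenance command,
# then always remove the build tree.
# $1 = directory to delete; remaining args = maintenance command, e.g. ccache -c.
cleanup_step() {
  build_dir=$1
  shift
  # A command tested by "if" does not trigger set -e, so a failure here is
  # reported but cannot abort the disk cleanup below.
  if ! "$@"; then
    echo "warning: '$*' failed; continuing cleanup" >&2
  fi
  rm -rf "$build_dir"
}
```

A call such as `cleanup_step "$BUILD_DIR" ccache -c` (with `BUILD_DIR` as a placeholder) would then free the tree even when compaction fails.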
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["Free preinstalled software\nrm android/ghc/dotnet/swift/boost\ndocker image prune -af"] --> B["checkout + install dependencies"]
B --> C["Cache@2 restore ccache + ExternalData"]
C --> D["ccache zero-stats, evict-older-than 7d, show-config"]
D --> E["Build and test\nctest -S dashboard.cmake"]
E --> F{ctest_rc == 0?}
F -- success --> G["ccache evict-older-than 3d or true"]
F -- failure --> H["ccache evict-older-than 4d or true"]
G --> I["exit ctest_rc"]
H --> I
I --> J["ccache show-stats - condition always"]
J --> K["Diagnostics + JUnit + Publish results"]
K --> L["Free build tree - condition always\nccache -c\nrm -rf build tree + ITK-dashboard\ndf -h /"]
L --> M["Cache@2 post-job save"]
Comments Outside Diff (1)
- Testing/ContinuousIntegration/AzurePipelinesLinux.yml, lines 152-154: `ccache --show-stats` runs before compaction, so reported size is pre-compact

  The `ccache --show-stats` step runs before the "Free build tree" step that runs `ccache -c`. Disk usage metrics visible in the logs will reflect the state after eviction but before compaction, which can look larger than the final cache that `Cache@2` actually archives. The same ordering exists in `AzurePipelinesLinuxPython.yml` and `AzurePipelinesMacOSPython.yml`. A second `ccache --show-stats` at the end of the cleanup step would let operators confirm the final archived cache size.
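
A minimal sketch of the suggested ordering, assuming a hypothetical helper `compact_and_report` (not part of the PR): compaction runs first and the stats call follows, so the logged size reflects what would be archived. The tool name is a parameter only so the sketch can be exercised without ccache installed.

```shell
#!/bin/sh
# Hypothetical helper: compact the cache, then print stats afterwards.
# $1 = cache tool to invoke (defaults to ccache).
compact_and_report() {
  tool=${1:-ccache}
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "note: $tool not found; skipping" >&2
    return 0
  fi
  # Compaction may fail without affecting the job result.
  "$tool" -c || echo "warning: $tool -c failed" >&2
  # Stats are printed last so they describe the post-compaction cache.
  "$tool" --show-stats || true
}
```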
Replace the misplaced 'Free disk before post-job ccache upload' step (which ran before diagnostic steps, used ninja clean, and patched CCACHE_MAXSIZE with an env-var override) with the same pattern already used by AzurePipelinesLinux.yml:

- Reduce CCACHE_MAXSIZE from 8G to 5G so the post-job Cache@2 tar needs less headroom to write its archive.
- Add conditional ccache eviction inside the build step: 1d on success (keeps only the warm entries), 4d on failure (retains fix-retry objects across runs).
- Move build-tree removal to after the diagnostic/JUnit/publish steps and replace ninja clean with rm -rf, freeing static libs, .so modules, and generated .cxx sources that ninja clean leaves behind.
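
The conditional eviction described in this commit can be sketched roughly as follows. The function names `eviction_window` and `build_then_evict` are hypothetical; the 1d and 4d values are the ones named in this commit message, and `ccache --evict-older-than` is only invoked when ccache is present.

```shell
#!/bin/sh
# Map the build/test exit code to an eviction window:
# 1d on success, 4d on failure (values from this commit message).
eviction_window() {
  if [ "$1" -eq 0 ]; then echo 1d; else echo 4d; fi
}

# Hypothetical wrapper: run the build command, evict, return the build's status.
build_then_evict() {
  rc=0
  "$@" || rc=$?
  if command -v ccache >/dev/null 2>&1; then
    # Cache maintenance must never mask the build result.
    ccache --evict-older-than "$(eviction_window "$rc")" || true
  fi
  return "$rc"
}
```

In a pipeline step this might be used as `build_then_evict ctest -S dashboard.cmake`, with the step exiting with the returned status.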
The ubuntu-22.04 and ubuntu-24.04 hosted agents ship Android SDK (~9 GB), Haskell/GHCup (~5 GB), .NET (~2-3 GB), Swift (~1.5 GB), CodeQL (~2 GB), and Boost headers (~1.2 GB). ITK's Linux builds use none of these; removing them at job start recovers ~20 GB before checkout, ccache restore, and the build itself consume disk.
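
A rough sketch of such a removal step, assuming a hypothetical helper `free_paths` that takes the directories as arguments. The actual install locations depend on the agent image and are deliberately not hard-coded here.

```shell
#!/bin/sh
# Hypothetical helper: delete each listed path if present, then report free space.
free_paths() {
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "removing $p"
      rm -rf "$p"
    else
      echo "skipping $p (not present)"
    fi
  done
  # Report usage of the root filesystem; a df failure is not fatal.
  df -h / || true
}
```

On a hosted agent the removal would likely need elevated privileges, and the argument list would be the tool directories the job does not use.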
- Move CCACHE_MAXSIZE to the pipeline variables block so ccache --show-config and --show-stats see the operative limit (was scoped only to the build step)
- Add ccache -c to the always()-conditioned cleanup step so restored oversized caches are compacted before Cache@2 tars them
- Extend success eviction window 1d → 3d; ccache does not refresh mtimes on hits, so the 1d window was evicting warm entries that were used this build
- Delete false comment "ccache refreshes timestamps on hit" from Linux.yml
- Add set -e to "Free preinstalled software" and cleanup steps; add || true to docker image prune so daemon absence is non-fatal
- Fix boost removal path to include headers and libs (/usr/local/include/boost, /usr/local/lib/libboost_*); /usr/local/share/boost held only CMake configs
- Remove $(Agent.BuildDirectory)/ITK-dashboard clone in cleanup step
- Use df -h / consistently (was bare df -h in "Free preinstalled software")
- Mark ccache eviction calls with || true to surface that cache maintenance failure does not override the build exit code
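
The scoping issue behind the CCACHE_MAXSIZE move can be shown with plain environment variables. This is only an analogy, under the assumption that a step-level setting behaves like a per-command assignment and a pipeline variable behaves like an exported one; the helper `seen_limit` is hypothetical and the 5G value is illustrative.

```shell
#!/bin/sh
# Print the CCACHE_MAXSIZE a child process would see, or "unset".
seen_limit() {
  sh -c 'echo "${CCACHE_MAXSIZE:-unset}"'
}

unset CCACHE_MAXSIZE

# Scoped to a single command (analogous to setting it only on the build step):
build_view=$(CCACHE_MAXSIZE=5G sh -c 'echo "${CCACHE_MAXSIZE:-unset}"')
# A later, separate command does not inherit that value:
stats_view_before=$(seen_limit)

# Exported once for everything that follows (analogous to a pipeline variable):
export CCACHE_MAXSIZE=5G
stats_view_after=$(seen_limit)

echo "build=$build_view before=$stats_view_before after=$stats_view_after"
# prints: build=5G before=unset after=5G
```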
Mirrors the fixes from the Linux pipelines (ccache -c, 3d/4d eviction, CCACHE_MAXSIZE in variables block, build-tree cleanup) to the remaining three Azure pipeline configurations.

macOS (AzurePipelinesMacOSPython.yml):
- Add "Free preinstalled software" step: remove .NET SDK and iOS simulator runtimes before checkout
- Move CCACHE_MAXSIZE: 8G to variables block
- Add ctest_rc/3d/4d eviction pattern to build step
- Add condition: always() cleanup: ccache -c, rm build tree and ITK-dashboard clone, df -h /

Windows Python (AzurePipelinesWindowsPython.yml):
- Move CCACHE_MAXSIZE: 8G to variables block
- Add condition: always() cleanup (bash via Git Bash): evict 3d/4d via $AGENT_JOBSTATUS, ccache -c, rm build tree and ITK-dashboard

Batch Windows (AzurePipelinesBatch.yml):
- Move CCACHE_MAXSIZE: 2.4G to variables block
- Same condition: always() cleanup as Windows Python
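
For the Windows cleanup steps, the window selection from `$AGENT_JOBSTATUS` might look like the sketch below. The function name `window_for_status` is hypothetical, and treating every status other than `Succeeded` as a failure is an assumption of this sketch, not something the commit message states.

```shell
#!/bin/sh
# Choose the eviction window from the job status string.
# Assumption: only "Succeeded" counts as success; any other value
# (Failed, Canceled, SucceededWithIssues, or unset) gets the longer window.
window_for_status() {
  case "${1:-}" in
    Succeeded) echo 3d ;;
    *)         echo 4d ;;
  esac
}

window=$(window_for_status "${AGENT_JOBSTATUS:-}")
echo "eviction window: $window"
```

The result could then be passed to `ccache --evict-older-than "$window" || true` inside the cleanup step.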
Force-pushed from a9bf82f to 6886b30.
Reduce Linux CI disk pressure: free preinstalled software, compact ccache before
Cache@2 tar, harden cleanup steps. Fixes recurring near-ENOSPC in ITK.Linux.Python.
Commit summary
Commit 1 - COMP: Align Linux.Python CI disk management with Linux CI pattern

Pattern from `AzurePipelinesLinux.yml`: conditional 1d/4d ccache eviction inside the build step, build-tree removal after diagnostic steps.

Commit 2 - COMP: Free unused preinstalled software in Linux Azure CI jobs

Linux pipelines: remove Android SDK, GHC/GHCup, .NET, Swift, CodeQL, Boost.

Commit 3 - COMP: Harden Linux CI ccache and disk-cleanup after code review

- Add `ccache -c` to the `condition: always()` cleanup step so a restored oversized cache is compacted before Cache@2 archives it (the missing step that would let ENOSPC recur on the first post-merge run).
- Extend the success eviction window from 1d to 3d: ccache does not refresh mtimes on cache hits, so the 1d window was evicting warm entries used this build.
- Move `CCACHE_MAXSIZE` to the pipeline variables block so `ccache --show-config` and `ccache --show-stats` report the operative limit.
- Add `set -e` to "Free preinstalled software" and cleanup steps.
- Guard `docker image prune -af` with `|| true` (daemon absence is non-fatal).
- Fix the Boost removal path to include headers (`/usr/local/include/boost`) and libraries (`/usr/local/lib/libboost_*`); the old path only removed CMake find-package configs (~1 MB, not ~200 MB).
- Add `rm -rf $(Agent.BuildDirectory)/ITK-dashboard` to cleanup step.
- Use `df -h /` consistently (was bare `df -h` in "Free preinstalled software").
- Mark ccache eviction calls with `|| true` to express that maintenance failure does not override the build exit code.