Add Storage TSG: storage pool capacity threshold warning (fixed vs thin volumes) #300
AlBurns-MSFT wants to merge 3 commits into
Adds a Storage TSG explaining the S2D storage pool capacity threshold warning and the supported remediation paths, gated on volume provisioning type:

- Step 1 decision gate on `ProvisioningType` (Fixed vs Thin) to prevent running the thin reclamation procedure on fixed volumes, where it is a no-op.
- Path A (Fixed): add disks, convert to thin, shrink/remove, suppress alert, or raise threshold; each option is risk-labeled.
- Path B (Thin): `SlabConsolidate` + ReFS unmap reclamation procedure (no `-ReTrim` on ReFS), with VM-suspend and CSV owner-node guidance.
- Clarifies the two distinct threshold controls and the reserve-capacity rationale (repair headroom after drive/node loss).

Registers the TSG in `TSG/Storage/README.md`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
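The Step 1 decision gate can be modeled as a per-volume routing function. This is a minimal Python sketch, assuming the `ProvisioningType` strings have already been collected (for example from `Get-VirtualDisk`); the names `remediation_path`, `PATH_A_FIXED`, and `PATH_B_THIN` are hypothetical and only illustrate the routing, they do not call any storage cmdlet.

```python
from enum import Enum


class ProvisioningType(Enum):
    # Values mirror the ProvisioningType strings reported per virtual disk.
    FIXED = "Fixed"
    THIN = "Thin"


# Path A options for fixed volumes, as listed in the PR description.
PATH_A_FIXED = (
    "add disks",
    "convert to thin",
    "shrink/remove",
    "suppress alert",
    "raise threshold",
)

# Path B for thin volumes: slab consolidation, then ReFS background unmap.
PATH_B_THIN = ("Optimize-Volume -SlabConsolidate", "wait for ReFS unmap")


def remediation_path(provisioning_type: str) -> tuple:
    """Return the remediation options for one volume's ProvisioningType.

    Any value other than "Fixed" or "Thin" raises ValueError, so an
    unrecognized volume never falls through to the thin procedure.
    """
    kind = ProvisioningType(provisioning_type)  # ValueError if unknown
    if kind is ProvisioningType.FIXED:
        return PATH_A_FIXED
    return PATH_B_THIN
```

Routing per volume rather than per pool matters because a pool can host both kinds; for example, `{name: remediation_path(t) for name, t in volumes.items()}` keeps fixed volumes out of the reclamation procedure even on a mixed pool.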
Pull request overview
Adds a new Storage troubleshooting guide (TSG) to explain and remediate the S2D storage pool capacity threshold warning, with remediation explicitly gated on volume provisioning type (Fixed vs Thin) to avoid ineffective thin-reclamation steps on fixed volumes.
Changes:
- Introduces a new TSG covering why the warning exists, how to determine provisioning type, and safe remediation options for Fixed vs Thin volumes.
- Documents thin-volume reclamation guidance including VM workload handling, CSV owner-node execution guidance, and ReFS-specific notes.
- Registers the new TSG in the Storage README index.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| TSG/Storage/Troubleshoot-Storage-StoragePoolCapacityThreshold.md | New TSG describing the warning, decision gate on provisioning type, remediation paths, and verification steps. |
| TSG/Storage/README.md | Adds the new TSG link to the Storage index. |
The single most important step is to **determine whether the affected volumes are fixed or thin provisioned before taking any action**, because the space reclamation procedure (`Optimize-Volume -SlabConsolidate` followed by `Optimize-StoragePool`) **does nothing on a fixed-provisioned volume** and only applies to thin-provisioned volumes.
If the customer accepts the capacity posture and wants to stop the alert, the Health Service threshold alert can be disabled:
    Get-StorageJob

    # Health faults
    Get-StorageSubSystem -FriendlyName Clus* | Debug-StorageSubSystem
Great addition!
… Get-HealthFault

- Overview: describe thin reclamation as `SlabConsolidate` + ReFS background unmap (`Optimize-StoragePool` is an optional rebalance, not the freeing step) to match Path B; align the summary table row too.
- Option A4: note the Health Service alert toggle is applied at the storage subsystem level, so it suppresses the threshold alert cluster-wide (all pools), not just one pool or volume.
- Verify: use the lightweight `Get-HealthFault` to list active faults instead of the heavier `Debug-StorageSubSystem`, matching the other Storage TSGs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
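Because the ReFS background unmap that follows `-SlabConsolidate` is asynchronous (the review puts it at roughly 15 minutes), a verification step has to poll rather than check once. This is a minimal Python sketch of that wait-and-recheck pattern; `wait_for_reclamation` and the `read_used` callable are hypothetical, and `read_used` stands in for however you actually read current pool usage.

```python
import time
from typing import Callable

# Approximate ReFS background-unmap window after -SlabConsolidate.
DEFAULT_WAIT_SECONDS = 15 * 60


def wait_for_reclamation(
    read_used: Callable[[], float],
    baseline: float,
    timeout: float = DEFAULT_WAIT_SECONDS,
    poll_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until pool usage drops below `baseline` or `timeout` elapses.

    Returns True as soon as space has been reclaimed, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if read_used() < baseline:
            return True  # usage fell: unmap has released capacity
        if clock() >= deadline:
            return False  # nothing reclaimed within the window
        sleep(poll_interval)
```

A `False` result is a prompt to re-check the provisioning type and workload state rather than to rerun the procedure immediately.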
Thanks for the review — addressed all three points in e25447e:
1008covingtonlane left a comment
Reviewed for technical accuracy against Microsoft Learn — high-quality TSG, and the Fixed-vs-Thin decision gate (don't run Optimize-Volume -SlabConsolidate on a fixed volume) is exactly the right framing; it's a frequent real-world misread.
Verified correct against Learn:
- 70% default thin-provisioning alert threshold.
- ~15-minute ReFS background-unmap reclamation wait after `-SlabConsolidate`.
- Cmdlets: `Set-StoragePool -ThinProvisioningAlertThresholds`, `Set-VirtualDisk -ProvisioningType Thin`, `Get-VirtualDisk … ProvisioningType`.
- `-SlabConsolidate` (not `-ReTrim`) on ReFS; Arc-VM stop-from-Azure rather than host `Suspend-VM`; CSV owner-node / by-path handling; `Optimize-StoragePool` as rebalance, not the slab-free mechanism.
- Every remediation carries a risk label; A4's subsystem-wide alert scope is clearly warned.
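The 70% default and the `-ThinProvisioningAlertThresholds` override reduce to a simple percentage comparison. This minimal Python sketch shows that arithmetic; the function names are hypothetical, and treating the alert as firing once usage strictly exceeds the threshold is an assumption about the comparison, not a documented detail.

```python
# Default thin-provisioning alert threshold, in percent of pool capacity.
DEFAULT_ALERT_THRESHOLD_PERCENT = 70


def pool_usage_percent(allocated: float, total: float) -> float:
    """Share of pool capacity already allocated, as a percentage."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    return 100.0 * allocated / total


def over_threshold(
    allocated: float,
    total: float,
    threshold: float = DEFAULT_ALERT_THRESHOLD_PERCENT,
) -> bool:
    # Assumption: the alert fires once usage strictly exceeds the threshold.
    return pool_usage_percent(allocated, total) > threshold
```

Raising the threshold (for example to 80) only moves the comparison point; it does not add the repair headroom the warning is protecting.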
One substantive question inline on Option A2's "23H2 (build 2311.2) or later" conversion requirement — it appears to contradict the 22H2 conversion doc this TSG links. Everything else looks accurate and well-organized. Nice addition.
> [!IMPORTANT]
> In-place conversion of an existing fixed volume to thin requires **Azure Local 23H2 (build 2311.2) or later**. On earlier releases, do not attempt the conversion; instead create a new thin volume and migrate the data, then remove
Thanks for the careful Fixed-vs-Thin gating throughout — this is a genuinely useful TSG.
One question on this build requirement: I couldn't find a 2311.2 floor for in-place fixed→thin conversion in either Learn doc this TSG links. The conversion doc cited here (Convert fixed to thin provisioned volumes) documents the in-place Set-VirtualDisk -ProvisioningType Thin + remount flow on Azure Stack HCI 22H2, and the current Storage thin provisioning in Azure Local, version 23H2 doc says conversion is supported with no build gate.
If there's a known regression/issue that makes 2311.2 the real floor, it would be great to cite it here. Otherwise the "on earlier releases, do not attempt the conversion" guidance may send engineers down an unnecessary new-volume-and-migrate path when in-place conversion is documented as supported on 22H2. Happy to be corrected if you've hit a specific build issue.
Thanks — removing the unvalidated 2311.2 claim.
Verified — the revised note is accurate: the Set-VirtualDisk -ProvisioningType Thin + remount procedure is archived under /previous-versions/ due to the Azure Stack HCI → Azure Local rename (not a feature removal), and there's no documented minimum build. The "confirm on the current build, otherwise create-new-and-migrate" caveat is a sensible call. Thanks for the quick turnaround!
Option A2 stated that in-place fixed-to-thin conversion "requires Azure
Local 23H2 (build 2311.2) or later" and to not attempt it on earlier
releases. That build floor has no basis in any Microsoft source:
- The cited conversion doc ("Convert fixed to thin provisioned volumes")
states "Applies to: Azure Stack HCI, version 22H2" and documents the
in-place Set-VirtualDisk -ProvisioningType Thin flow with no build gate.
- The thin-provisioning concept doc FAQ ("Can existing fixed volumes be
converted to thin? Yes ... supported") notes the feature is available
  since Azure Stack HCI 21H2, which is below the claimed floor.
- The current Azure Local known-issues/release notes carry no regression
tying conversion to 2311.2 or any 23H2 build.
- The /previous-versions/ path reflects the Azure Stack HCI -> Azure Local
rename (archival), not feature deprecation.
Replace the fabricated build floor with the accurate, sourced caveat: no
minimum build is published; the procedure is documented for 21H2/22H2 and
archived; it is not re-published in current 23H2/24H2 volume docs, so
confirm current-build support before recommending, and fall back to
new-volume-and-migrate only if support cannot be confirmed.
Addresses PR review comment by 1008covingtonlane.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
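The revised Option A2 guidance replaces a hard-coded build floor with an explicit confirmation step. This minimal Python sketch captures that decision; `option_a2_plan` and both plan strings are hypothetical labels, and the boolean input stands for whatever check confirms in-place support on the build in front of you.

```python
IN_PLACE = "in-place: Set-VirtualDisk -ProvisioningType Thin, then remount"
MIGRATE = "new-volume-and-migrate"


def option_a2_plan(support_confirmed_on_current_build: bool) -> str:
    """Choose the Option A2 approach for converting a fixed volume.

    No minimum build is published, so the gate is a confirmation for the
    current build rather than a comparison against a version number.
    """
    if support_confirmed_on_current_build:
        return IN_PLACE
    # Fall back only when in-place support cannot be confirmed.
    return MIGRATE
```

Keeping the version check out of the code avoids reintroducing an unsourced floor like the one this commit removes.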
1008covingtonlane left a comment
Approving. Verified the technical content against Microsoft Learn — the 70% default thin-provisioning alert threshold, the ~15-minute ReFS background-unmap reclamation wait, -SlabConsolidate (not -ReTrim) on ReFS, Arc-VM stop-from-Azure, and the CSV owner-node / by-path handling are all accurate. The Copilot bot's three points and the fixed-to-thin build-floor claim are now resolved. Clear, well-structured TSG and a genuinely useful one for the storage-pool capacity threshold misread.
What
Adds a Storage TSG explaining the Storage Spaces Direct (S2D) storage pool capacity threshold warning and the supported remediation paths, gated on volume provisioning type (fixed vs thin).
The warning is not a false alarm — S2D needs free pool capacity in reserve so repair jobs can rebuild resiliency after a drive/node loss — but it is frequently misunderstood on clusters using fixed-provisioned volumes, where the pool can sit above the threshold even when the volume's file system is mostly empty.
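Why a mostly empty fixed volume still pushes the pool over the threshold comes down to how each provisioning type consumes pool capacity. This is a simplified Python model, assuming a three-way mirror (three copies) and ignoring slab granularity and metadata; `Volume` and `pool_footprint_gb` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    size_gb: float          # provisioned size of the volume
    data_written_gb: float  # data the file system has actually written
    provisioning: str       # "Fixed" or "Thin"
    copies: int = 3         # assumption: three-way mirror keeps 3 copies


def pool_footprint_gb(vol: Volume) -> float:
    """Simplified pool capacity consumed by one volume."""
    if vol.provisioning == "Fixed":
        # Fixed volumes reserve their full size up front, written or not.
        base = vol.size_gb
    elif vol.provisioning == "Thin":
        # Thin volumes consume pool capacity only as data is written.
        base = vol.data_written_gb
    else:
        raise ValueError(f"unknown provisioning type: {vol.provisioning}")
    return base * vol.copies
```

Under these assumptions, a 1,000 GB fixed volume holding 100 GB of data still occupies 3,000 GB of the pool, while the same volume provisioned thin occupies about 300 GB; only the thin case can shrink through reclamation.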
Highlights
- Step 1 decision gate on `ProvisioningType` (Fixed vs Thin) to prevent running the thin reclamation procedure on fixed volumes, where it is a no-op.
- Path B (Thin): `Optimize-Volume -SlabConsolidate` + ReFS unmap reclamation (no `-ReTrim` on ReFS), with VM-suspend and CSV owner-node guidance.

Files

- `TSG/Storage/Troubleshoot-Storage-StoragePoolCapacityThreshold.md`: new TSG
- `TSG/Storage/README.md`: registers the TSG in the Storage index