docs(gpu): drop manual KubeVirt patch step now that the platform auto-wires permittedHostDevices #556
Aleksei Sviridkin (lexfrei) wants to merge 2 commits into
…-wires permittedHostDevices

Step 2 of the GPU Passthrough guide instructed operators to `kubectl edit kubevirt -n cozy-kubevirt` and hand-paste a permittedHostDevices.pciHostDevices block. cozystack/cozystack#2768 removes the need for that step: when cozystack.gpu-operator is in bundles.enabledPackages, the platform now mirrors the chosen GPU variant into the KubeVirt CR automatically, appending HostDevices to the feature-gate list and rendering a starter NVIDIA pciHostDevices table covering Hopper, Ada Lovelace, Ampere, Turing and Volta.

The new step 2 documents the contract (what the platform auto-injects and why), the verification recipe, the escape hatch via .gpu.permittedHostDevices / .gpu.replaceDefaults, and the manual Package-CR override path used by operators who need overrides the bundle does not expose (driver settings, custom node selectors, validator / dcgmExporter tweaks); in that flow they also hand-craft the matching cozystack.kubevirt Package CR.

Only next/virtualization/gpu.md is updated; v1.4 and earlier describe releases that still require the manual patch and stay as-is.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
📝 Walkthrough
This PR updates GPU configuration documentation for KubeVirt, replacing manual CR editing instructions with automated management via the
Changes
GPU Configuration Workflow
Code Review
This pull request updates the GPU virtualization documentation to explain that Cozystack now automatically configures and wires KubeVirt when the GPU operator is enabled. It details the automatic injection of host devices, how to extend or replace NVIDIA defaults, and the manual Package-CR override path. The review feedback suggests improving command portability by replacing yq with jq in the verification step, and correcting the configuration path from components.kubevirt.values to spec.values for standalone Package CRs.
kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml \
  | yq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
Using yq can sometimes lead to compatibility issues depending on whether the user has the Python-based yq (which supports full jq syntax) or the Go-based yq (which has a different expression syntax) installed.
Using `kubectl ... -o json | jq ...` is more portable: the jq filter syntax is the same wherever jq is installed, so the command does not depend on which yq variant the reader has.
- kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml \
-   | yq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
+ kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
+   | jq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
### Manual Package-CR override path

If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose: driver settings, custom node selectors, validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. In that flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR with `components.kubevirt.values.extraFeatureGates: [HostDevices]` and the appropriate `permittedHostDevices` block. The manual Package-CR override path takes precedence over the bundle render whenever both exist.
When creating a standalone cozystack.kubevirt Package CR directly, the configuration values should be defined under spec.values rather than components.kubevirt.values. The components.<name>.values structure is used when configuring components within the umbrella cozystack-platform package.
Updating this path ensures the standalone Package CR is configured correctly.
- If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose: driver settings, custom node selectors, validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. In that flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR with `components.kubevirt.values.extraFeatureGates: [HostDevices]` and the appropriate `permittedHostDevices` block. The manual Package-CR override path takes precedence over the bundle render whenever both exist.
+ If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose: driver settings, custom node selectors, validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. In that flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR with `spec.values.extraFeatureGates: [HostDevices]` and the appropriate `permittedHostDevices` block. The manual Package-CR override path takes precedence over the bundle render whenever both exist.
Andrei Kvapil (kvaps) left a comment
Requesting changes on one thing: keep a discoverable "GPU not in the default table" escape hatch — but route it through .gpu.permittedHostDevices, not kubectl edit. The rest of the rewrite is good.
I'd rather we not leave operators without a visible manual path. Two points:
- The reconcile-safe manual path already lives in this PR: the "Extending or replacing the NVIDIA defaults" section (`.gpu.permittedHostDevices` + `replaceDefaults`). That's the right answer for a card not in the static table, and it survives reconcile because it flows through platform values → the KubeVirt CR template. My only ask is to make it more discoverable, e.g. a short FAQ entry / callout titled "My GPU isn't in the default table" that links to it, since operators upgrading from the old flow will look for the removed `kubectl edit` step.
- Please don't reinstate the old `kubectl edit kubevirt` step verbatim behind a spoiler. Post-auto-wiring that field is owned by the chart template, so a hand edit is reverted on the next Flux/Helm reconcile; keeping it as-is would be a footgun. If we show the raw CR shape at all, it should be explicitly labelled "reference only: `permittedHostDevices` is reconciled from platform values; edit `.gpu.permittedHostDevices` instead" inside the collapsible.
This ties into the upgrade-safety request on the platform side — cozystack/cozystack#2768 (and the migration breakdown in cozystack/cozystack#2768 (comment)): operators who hand-edited permittedHostDevices need a clear, persistent migration target, and .gpu.permittedHostDevices is it. Worth surfacing the same upgrade note here too.
Net: keep the manual capability, just anchor it on the persistent knob and make it easy to find.
…ostDevices

The bundle now owns spec.configuration.permittedHostDevices, so the first reconcile after upgrade overwrites manual kubectl-edit entries with the NVIDIA default table. Tell operators to move custom entries into .gpu.permittedHostDevices and verify each resourceName against node-advertised names before upgrading, since the default slugs (e.g. TU104GL_T4) differ from legacy names (e.g. TU104GL_TESLA_T4) and a mismatch silently rejects GPU VMs.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
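The pre-upgrade check described in this commit message (verify each `resourceName` against node-advertised names) could be sketched roughly as follows. This is an illustration, not part of the PR: the function name `unadvertised_resource_names` and the two-file interface are hypothetical, and the commented `kubectl`/`jq` pipelines are only one assumed way to collect the inputs.

```shell
# Sketch only: the function name and file-based interface are hypothetical,
# not something Cozystack or KubeVirt ships.
#
# $1: file with one permitted resourceName per line (taken from the KubeVirt CR)
# $2: file with one node-advertised resource name per line
# Prints every permitted name that no node advertises; empty output means all match.
unadvertised_resource_names() {
  # -F: fixed strings, -x: whole-line match, -v: invert.
  # Keeps the lines of $1 that do not appear in $2; "|| true" hides grep's
  # non-zero exit status when nothing is printed.
  grep -Fvx -f "$2" "$1" || true
}

# Assumed way to produce the two input files (requires kubectl and jq):
#   kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
#     | jq -r '.spec.configuration.permittedHostDevices.pciHostDevices[].resourceName' > permitted.txt
#   kubectl get nodes -o json \
#     | jq -r '.items[].status.allocatable | keys[] | select(startswith("nvidia.com/"))' > advertised.txt
#   unadvertised_resource_names permitted.txt advertised.txt
```

Any name this prints is a candidate for the mismatch the commit message warns about, for example a legacy `nvidia.com/TU104GL_TESLA_T4` entry on a cluster whose nodes advertise `nvidia.com/TU104GL_T4`.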
Actionable comments posted: 1
🧹 Nitpick comments (1)
content/en/docs/next/virtualization/gpu.md (1)
147-150: ⚡ Quick win: Inconsistent kubectl command pattern.

This command uses `kubectl get kubevirt -n cozy-kubevirt -o yaml` without specifying the resource name, then indexes into `.items[0]`. However, line 115 uses `kubectl get kubevirt kubevirt` with the explicit resource name, which returns the object directly without needing `.items[]` indexing.

For consistency and clarity, use the same pattern as line 115:

📝 Suggested fix for consistency
- kubectl get kubevirt -n cozy-kubevirt -o yaml \
-   | yq '.items[0].spec.configuration.permittedHostDevices'
+ kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml \
+   | yq '.spec.configuration.permittedHostDevices'
Specifically, the platform injects:

- `HostDevices` into `spec.configuration.developerConfiguration.featureGates` (current KubeVirt splits this from the `GPU` gate; the admission webhook rejects `domain.devices.hostDevices` without it).
- A starter `spec.configuration.permittedHostDevices.pciHostDevices` table covering common NVIDIA datacenter GPUs: Hopper (H100, H200), Ada Lovelace (L4, L40, L40S), Ampere (A100 PCIe/SXM, A40, A30, A10), Turing (T4), Volta (V100, V100S). PCI vendor:device pairs are stable; `resourceName` slugs follow the `<arch>_<model>_<form>_<mem>` convention `nvidia-sandbox-device-plugin` v25.x emits (e.g. `nvidia.com/GA102GL_A10`). `externalResourceProvider: true` is set on every entry because the resources are advertised by the sandbox plugin, not by KubeVirt's in-tree device plugin.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check the actual resource naming pattern in nvidia-sandbox-device-plugin documentation
# Search for resourceName examples and naming documentation
rg -C5 'resourceName.*nvidia\.com' --type=md
# Also check for any naming convention documentation
rg -C3 'arch.*model.*form.*mem|naming.*convention' --type=md
Repository: cozystack/website
Length of output: 42109
Fix resourceName naming-convention wording (line 110)
- Line 110 states `resourceName` slugs follow `<arch>_<model>_<form>_<mem>`, but the document's examples only use two components (e.g., `nvidia.com/GA102GL_A10`, `nvidia.com/TU104GL_T4`). Update the convention text to match the actual slug format (and clarify when/if `<form>` and `<mem>` are present) or adjust the examples accordingly.
Companion to cozystack/cozystack#2768.
Rewrites step 2 of the GPU Passthrough guide. Until now the page instructed operators to run `kubectl edit kubevirt -n cozy-kubevirt` and hand-paste a `permittedHostDevices.pciHostDevices` block; that is the friction that ticket #2765 asked the platform to remove. With cozystack/cozystack#2768 landed, the bundle mirrors the chosen GPU variant into the KubeVirt CR automatically: `HostDevices` is appended to the feature-gate list and a starter NVIDIA `pciHostDevices` table (Hopper, Ada Lovelace, Ampere, Turing, Volta) is rendered alongside the operator's `.gpu.permittedHostDevices` extensions.

The new step 2 documents:

- The contract (`HostDevices` gate, NVIDIA default table, `externalResourceProvider: true` semantics).
- The verification recipe (`kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml | yq ...`).
- The escape hatch: `gpu.replaceDefaults`, `gpu.permittedHostDevices.pciHostDevices`, plus the consequence of `replaceDefaults: true` with an empty list (no admittable GPU VMs).
- The manual Package-CR override path: when operators hand-craft `cozystack.gpu-operator` outside the bundle for advanced overrides, they also hand-craft `cozystack.kubevirt` with the matching `extraFeatureGates` / `permittedHostDevices`. The manual override takes precedence over the bundle render.

Only `next/virtualization/gpu.md` is touched. The released doc versions (`v1.4` and earlier) describe earlier Cozystack releases that still require the manual `kubectl edit`, and stay as-is.

Release note
Summary by CodeRabbit