OTA-1966: pkg/readiness/crd_compat: Drop unnecessary CustomResourceDefinition check#1415
Conversation
|
@wking: This pull request references OTA-1966 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (7)
💤 Files with no reviewable changes (3)
🚧 Files skipped from review as they are similar to previous changes (4)
WalkthroughThe PR removes the CRD compatibility readiness check from the readiness registry and updates tests and fixtures to expect one fewer readiness check and no ChangesReadiness compatibility check removal
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 14 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/proposal/controller_test.go`:
- Around line 1359-1364: The aggregate readiness expectations in
controller_test.go are still asserting 9 checks even though the updated check
list in the readiness loop no longer includes crd_compat and RunAll() now
reports 8 total checks. Update the assertions around meta["total_checks"] and
meta["checks_ok"] to match the new 8-check aggregate, and verify the related
readiness metadata expectations stay consistent with the check names used in the
loop and RunAll().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 6d648dc8-afba-438c-8ecb-27c1a63781a5
📒 Files selected for processing (7)
pkg/proposal/controller_test.gopkg/readiness/check.gopkg/readiness/check_test.gopkg/readiness/checks_test.gopkg/readiness/client.gopkg/readiness/crd_compat.gotest/cvo/readiness.go
💤 Files with no reviewable changes (3)
- pkg/readiness/client.go
- pkg/readiness/crd_compat.go
- pkg/readiness/check.go
harche
left a comment
There was a problem hiding this comment.
Some context on why this existed since I was involved in adding it. The check came in with the readiness set in #1395. The intent was to catch CRDs where a stored version is no longer served, because that state can leave objects sitting in etcd that nothing can decode anymore, and the thought was to surface that to the admin before an upgrade so they could make an informed go or no-go call.
Having sat with it, I think it should go, but not because the signal is useless. The problem is it answers the wrong question. It scans the cluster as it is right now and flags CRDs that are already in the stranded state. Those objects are already unreadable today, with or without the upgrade, so it isn't really telling you whether this upgrade is safe, it's telling you something on your cluster is already broken and unrelated to the upgrade. It also never read current or target, it ignored both, which is another way of seeing that it was a present state hygiene scan rather than an upgrade readiness check.
The genuinely upgrade relevant question is the forward looking one. Will this upgrade strand any objects, which happens when the target release stops serving a version you still have objects stored in. Answering that means diffing the target release's CRD served versions against the cluster's current storedVersions. The deleted check did none of that. So if we want this signal back, that target aware version is the one worth building, and it would actually support the before hand decision this one was meant to.
One thing to fix first. This breaks TestGetProposals_WithReadinessData. Most of the 9 to 8 updates landed but the two count assertions in controller_test.go still expect 9, so go test ./pkg/proposal/... fails.
…heck We grew this check recently in b0e4c90 (pkg/readiness: Add readiness checks and wire into proposal controller, 2026-05-27, openshift#1395). But I don't think we need it. I agree with the guard, that having a CRD with storage versions that were not served would be weird, and probably not what the CRD maintainers intended. But what would a cluster-admin be expected to do about it? There's nothing in the code I'm removing here, or in https://github.com/openshift/agentic-skills as far as I can see, that turns "stored version no longer served" into an actionable plan. In the CRD-evolution space, OpenShift packages the Kube storage-version migrator. But we don't run the trigger logic [1]. And, as far as I can tell, the migration operator isn't checking to see if any StorageVersionMigrations are completed [2]. I expect that, if OpenShift ever decides to remove support for a previously-stored version, we'll set up something to run (and require the completion of) a StorageVersionMigration to ensure storage is off the outgoing version before we leave the last release where it was served. So that would protect OpenShift users from OpenShift-managed CRD evolution, without needing the guard I'm deleting In addition to OpenShift-managed CRDs, there might be many CRDs on an OpenShift cluster that are not part of the OpenShift release payload. They may have been installed by OLM-installed operators. They may have been installed by individual cluster users running 'oc apply ...'. Lots of other folks who could be maintaining CRDs on the cluster. We don't want to confuse admins who are considering an OpenShift version update by any issues that might be going on with those non-OpenShift CRDs. [1]: openshift/cluster-kube-storage-version-migrator-operator@807c909#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R7 [2]: https://github.com/kubernetes-sigs/kube-storage-version-migrator/blob/master/USER_GUIDE.md#check-if-migration-has-completed
f1868ad to
2377aaa
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: harche, wking The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/override ci/prow/e2e-aws-ovn-techpreview No consumers yet, without having OLS installed, and the unit-test coverage of our Proposal handling is enough to make me comfortable that's safe enough. /verified by presubmits passing |
|
@wking: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@wking: Overrode contexts on behalf of wking: ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-hypershift-conformance DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@wking: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
We grew this check recently in b0e4c90 (#1395). But I don't think we need it. I agree with the guard, having a CRD with storage versions that were not served would be weird, and probably not what the CRD maintainers intended. But what would a cluster-admin be expected to do about it? There's nothing in the code I'm removing here, or in
https://github.com/openshift/agentic-skills as far as I can see, that turns
stored version no longer servedinto an actionable plan.In the CRD-evolution space, OpenShift packages the Kube storage-version migrator. But we don't run the trigger logic. And, as far as I can tell, the migration operator isn't checking to see if any StorageVersionMigrations are completed. I expect that, if OpenShift ever decides to remove support for a previously-stored version, we'll set up something to run (and require the completion of) a StorageVersionMigration to ensure storage is off the outgoing version before we leave the last release where it was served. So that would protect OpenShift users from OpenShift-managed CRD evolution, without needing the guard I'm deleting.
In addition to OpenShift-managed CRDs, there might be many CRDs on an OpenShift cluster that are not part of the OpenShift release payload. They may have been installed by OLM-installed operators. They may have been installed by individual cluster users running
oc apply .... Lots of other folks who could be maintaining CRDs on the cluster. We don't want to confuse admins who are considering an OpenShift version update by any issues that might be going on with those non-OpenShift CRDs.Summary by CodeRabbit
Bug Fixes
Tests