Skip to content

OTA-1966: pkg/readiness/crd_compat: Drop unnecessary CustomResourceDefinition check#1415

Open
wking wants to merge 1 commit into
openshift:mainfrom
wking:drop-custom-resource-definition-compat-readiness-check
Open

OTA-1966: pkg/readiness/crd_compat: Drop unnecessary CustomResourceDefinition check#1415
wking wants to merge 1 commit into
openshift:mainfrom
wking:drop-custom-resource-definition-compat-readiness-check

Conversation

@wking

@wking wking commented Jun 26, 2026

Copy link
Copy Markdown
Member

We grew this check recently in b0e4c90 (#1395). But I don't think we need it. I agree with the guard, having a CRD with storage versions that were not served would be weird, and probably not what the CRD maintainers intended. But what would a cluster-admin be expected to do about it? There's nothing in the code I'm removing here, or in
https://github.com/openshift/agentic-skills as far as I can see, that turns stored version no longer served into an actionable plan.

In the CRD-evolution space, OpenShift packages the Kube storage-version migrator. But we don't run the trigger logic. And, as far as I can tell, the migration operator isn't checking to see if any StorageVersionMigrations are completed. I expect that, if OpenShift ever decides to remove support for a previously-stored version, we'll set up something to run (and require the completion of) a StorageVersionMigration to ensure storage is off the outgoing version before we leave the last release where it was served. So that would protect OpenShift users from OpenShift-managed CRD evolution, without needing the guard I'm deleting.

In addition to OpenShift-managed CRDs, there might be many CRDs on an OpenShift cluster that are not part of the OpenShift release payload. They may have been installed by OLM-installed operators. They may have been installed by individual cluster users running oc apply .... Lots of other folks who could be maintaining CRDs on the cluster. We don't want to confuse admins who are considering an OpenShift version update by any issues that might be going on with those non-OpenShift CRDs.

Summary by CodeRabbit

  • Bug Fixes

    • Updated readiness reporting to omit CRD compatibility details, reducing the total number of checks reported.
    • Adjusted readiness output expectations to match the reduced check set.
  • Tests

    • Updated readiness test fixtures and assertions to remove CRD compatibility scenarios.
    • Revised expected total and successful check counts from 9 to 8 and removed related check-name/verifications.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 26, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@wking: This pull request references OTA-1966 which is a valid jira issue.

Details

In response to this:

We grew this check recently in b0e4c90 (#1395). But I don't think we need it. I agree with the guard, having a CRD with storage versions that were not served would be weird, and probably not what the CRD maintainers intended. But what would a cluster-admin be expected to do about it? There's nothing in the code I'm removing here, or in
https://github.com/openshift/agentic-skills as far as I can see, that turns stored version no longer served into an actionable plan.

In the CRD-evolution space, OpenShift packages the Kube storage-version migrator. But we don't run the trigger logic. And, as far as I can tell, the migration operator isn't checking to see if any StorageVersionMigrations are completed. I expect that, if OpenShift ever decides to remove support for a previously-stored version, we'll set up something to run (and require the completion of) a StorageVersionMigration to ensure storage is off the outgoing version before we leave the last release where it was served. So that would protect OpenShift users from OpenShift-managed CRD evolution, without needing the guard I'm deleting.

In addition to OpenShift-managed CRDs, there might be many CRDs on an OpenShift cluster that are not part of the OpenShift release payload. They may have been installed by OLM-installed operators. They may have been installed by individual cluster users running oc apply .... Lots of other folks who could be maintaining CRDs on the cluster. We don't want to confuse admins who are considering an OpenShift version update by any issues that might be going on with those non-OpenShift CRDs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: b481ba99-27c7-426f-961e-784fb8e786f0

📥 Commits

Reviewing files that changed from the base of the PR and between f1868ad and 2377aaa.

📒 Files selected for processing (7)
  • pkg/proposal/controller_test.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/crd_compat.go
  • test/cvo/readiness.go
💤 Files with no reviewable changes (3)
  • pkg/readiness/check.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/client.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • test/cvo/readiness.go
  • pkg/readiness/check_test.go
  • pkg/proposal/controller_test.go
  • pkg/readiness/checks_test.go

Walkthrough

The PR removes the CRD compatibility readiness check from the readiness registry and updates tests and fixtures to expect one fewer readiness check and no crd_compat output.

Changes

Readiness compatibility check removal

Layer / File(s) Summary
Registry and check removal
pkg/readiness/check.go, pkg/readiness/client.go, pkg/readiness/crd_compat.go
AllChecks no longer registers CRDCompatCheck, and the CRD GroupVersionResource definition is removed.
Readiness test updates
pkg/readiness/check_test.go, pkg/readiness/checks_test.go
AllChecks() expectations now use 8 checks, CRD compatibility test cases are removed, fake CRD fixtures are dropped, and RunAll assertions no longer expect crd_compat.
Consumer readiness expectations
pkg/proposal/controller_test.go, test/cvo/readiness.go
The proposal and CVO readiness tests stop setting up CRD compatibility data and update total-check expectations to 8.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • jhadvig
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: removing the CRD compatibility readiness check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Changed Ginkgo titles are static literals; no fmt.Sprintf, concatenation, generated suffixes, or dynamic names appear in the affected test files.
Test Structure And Quality ✅ Passed PASS: The touched Ginkgo test uses BeforeEach/DeferCleanup, a 2m context timeout, no cluster resource creation or indefinite waits, and follows repo test patterns.
Microshift Test Compatibility ✅ Passed No new Ginkgo e2e tests were added; the only e2e file change is an assertion update in an existing test suite.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The PR only removes CRD-compat readiness coverage and updates an existing Ginkgo assertion to expect 8 checks; no new SNO-sensitive multi-node assumptions were added.
Topology-Aware Scheduling Compatibility ✅ Passed PR only removes a readiness check and updates tests; no manifests/controllers with affinity, node selectors, replicas, or PDB scheduling changes.
Ote Binary Stdout Contract ✅ Passed Touched files only adjust tests/readiness counts; the sole init() is scheme setup and no touched process-level code writes to stdout.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests were added; the existing test/cvo/readiness.go change only updates check counts and uses cluster APIs, with no IPv4-only or external-network assumptions.
No-Weak-Crypto ✅ Passed Touched files only remove CRD readiness code/tests; no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons appear.
Container-Privileges ✅ Passed PR diff only touches Go code/tests and removes readiness logic; no privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN/allowPrivilegeEscalation settings appear.
No-Sensitive-Data-In-Logs ✅ Passed Touched files only remove CRD readiness code/tests; no log statements or sensitive-data logging were added in the modified paths.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 26, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/proposal/controller_test.go`:
- Around line 1359-1364: The aggregate readiness expectations in
controller_test.go are still asserting 9 checks even though the updated check
list in the readiness loop no longer includes crd_compat and RunAll() now
reports 8 total checks. Update the assertions around meta["total_checks"] and
meta["checks_ok"] to match the new 8-check aggregate, and verify the related
readiness metadata expectations stay consistent with the check names used in the
loop and RunAll().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6d648dc8-afba-438c-8ecb-27c1a63781a5

📥 Commits

Reviewing files that changed from the base of the PR and between dd0a841 and f1868ad.

📒 Files selected for processing (7)
  • pkg/proposal/controller_test.go
  • pkg/readiness/check.go
  • pkg/readiness/check_test.go
  • pkg/readiness/checks_test.go
  • pkg/readiness/client.go
  • pkg/readiness/crd_compat.go
  • test/cvo/readiness.go
💤 Files with no reviewable changes (3)
  • pkg/readiness/client.go
  • pkg/readiness/crd_compat.go
  • pkg/readiness/check.go

Comment thread pkg/proposal/controller_test.go

@harche harche left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some context on why this existed since I was involved in adding it. The check came in with the readiness set in #1395. The intent was to catch CRDs where a stored version is no longer served, because that state can leave objects sitting in etcd that nothing can decode anymore, and the thought was to surface that to the admin before an upgrade so they could make an informed go or no-go call.

Having sat with it, I think it should go, but not because the signal is useless. The problem is it answers the wrong question. It scans the cluster as it is right now and flags CRDs that are already in the stranded state. Those objects are already unreadable today, with or without the upgrade, so it isn't really telling you whether this upgrade is safe, it's telling you something on your cluster is already broken and unrelated to the upgrade. It also never read current or target, it ignored both, which is another way of seeing that it was a present state hygiene scan rather than an upgrade readiness check.

The genuinely upgrade relevant question is the forward looking one. Will this upgrade strand any objects, which happens when the target release stops serving a version you still have objects stored in. Answering that means diffing the target release's CRD served versions against the cluster's current storedVersions. The deleted check did none of that. So if we want this signal back, that target aware version is the one worth building, and it would actually support the before hand decision this one was meant to.

One thing to fix first. This breaks TestGetProposals_WithReadinessData. Most of the 9 to 8 updates landed but the two count assertions in controller_test.go still expect 9, so go test ./pkg/proposal/... fails.

Comment thread pkg/proposal/controller_test.go
…heck

We grew this check recently in b0e4c90 (pkg/readiness: Add
readiness checks and wire into proposal controller, 2026-05-27, openshift#1395).
But I don't think we need it.  I agree with the guard, that having a
CRD with storage versions that were not served would be weird, and
probably not what the CRD maintainers intended.  But what would a
cluster-admin be expected to do about it?  There's nothing in the code
I'm removing here, or in https://github.com/openshift/agentic-skills
as far as I can see, that turns "stored version no longer served" into
an actionable plan.

In the CRD-evolution space, OpenShift packages the Kube
storage-version migrator.  But we don't run the trigger logic [1].
And, as far as I can tell, the migration operator isn't checking to
see if any StorageVersionMigrations are completed [2].  I expect that,
if OpenShift ever decides to remove support for a previously-stored
version, we'll set up something to run (and require the completion of)
a StorageVersionMigration to ensure storage is off the outgoing
version before we leave the last release where it was served.  So that
would protect OpenShift users from OpenShift-managed CRD evolution,
without needing the guard I'm deleting

In addition to OpenShift-managed CRDs, there might be many CRDs on an
OpenShift cluster that are not part of the OpenShift release payload.
They may have been installed by OLM-installed operators.  They may
have been installed by individual cluster users running 'oc apply
...'.  Lots of other folks who could be maintaining CRDs on the
cluster.  We don't want to confuse admins who are considering an
OpenShift version update by any issues that might be going on with
those non-OpenShift CRDs.

[1]: openshift/cluster-kube-storage-version-migrator-operator@807c909#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R7
[2]: https://github.com/kubernetes-sigs/kube-storage-version-migrator/blob/master/USER_GUIDE.md#check-if-migration-has-completed
@wking wking force-pushed the drop-custom-resource-definition-compat-readiness-check branch from f1868ad to 2377aaa Compare June 29, 2026 22:47
@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 30, 2026
@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: harche, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wking

wking commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

blocks manual creation of Endpoints pointing to the cluster or service network 403 and Netpol NetworkPolicy between server and client should enforce policy to allow ingress traffic from pods in all namespaces timeout are unrelated:

/override ci/prow/e2e-aws-ovn-techpreview
/override ci/prow/e2e-hypershift-conformance

No consumers yet, without having OLS installed, and the unit-test coverage of our Proposal handling is enough to make me comfortable that's safe enough.

/verified by presubmits passing

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jul 1, 2026
@openshift-ci-robot

Copy link
Copy Markdown
Contributor

@wking: This PR has been marked as verified by presubmits passing.

Details

In response to this:

blocks manual creation of Endpoints pointing to the cluster or service network 403 and Netpol NetworkPolicy between server and client should enforce policy to allow ingress traffic from pods in all namespaces timeout are unrelated:

/override ci/prow/e2e-aws-ovn-techpreview
/override ci/prow/e2e-hypershift-conformance

No consumers yet, without having OLS installed, and the e2e coverage of our Proposal handling is enough to make me comfortable that's safe enough.

/verified by presubmits passing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-aws-ovn-techpreview, ci/prow/e2e-hypershift-conformance

Details

In response to this:

blocks manual creation of Endpoints pointing to the cluster or service network 403 and Netpol NetworkPolicy between server and client should enforce policy to allow ingress traffic from pods in all namespaces timeout are unrelated:

/override ci/prow/e2e-aws-ovn-techpreview
/override ci/prow/e2e-hypershift-conformance

No consumers yet, without having OLS installed, and the e2e coverage of our Proposal handling is enough to make me comfortable that's safe enough.

/verified by presubmits passing

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci

openshift-ci Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants