Skip to content

CNTRLPLANE-3513: add kms health reports#2881

Open
tjungblu wants to merge 2 commits into
openshift:masterfrom
tjungblu:kms_only_status
Open

CNTRLPLANE-3513: add kms health reports#2881
tjungblu wants to merge 2 commits into
openshift:masterfrom
tjungblu:kms_only_status

Conversation

@tjungblu

@tjungblu tjungblu commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This PR adds a structured API for the kms health reports, which are currently stored as json in the operator conditions.

Corresponding enhancement in openshift/enhancements#2005

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 9, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 9, 2026

Copy link
Copy Markdown

@tjungblu: This pull request references CNTRLPLANE-3513 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.

Details

In response to this:

This PR adds a structured API for the kms health reports, which are currently stored as json in the operator conditions.

Corresponding enhancement in openshift/enhancements#2005

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds KMS plugin health types to the config API (KMSPluginHealthStatus, KMSPluginHealthReport, APIServerEncryptionStatus.healthReports) and wires an optional v1.APIServerEncryptionStatus into operator status structs (OAuthAPIServerStatus, KubeAPIServerStatus, OpenShiftAPIServerStatus) behind KMSEncryption. Introduces CRD payload manifests for KubeAPIServer, OpenShiftAPIServer, and Authentication with OpenAPI/CEL validations (including unique (nodeName,keyId) for healthReports) and adds operator tests that accept healthy/unique reports and reject duplicates.

Suggested reviewers

  • everettraven
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Microshift Test Compatibility ⚠️ Warning Tests use operator.openshift.io APIs unavailable on MicroShift and lack [apigroup:...] tags or [Skipped:MicroShift] protection labels. Add [apigroup:operator.openshift.io] tags to test names or guard with exutil.IsMicroShiftCluster() skip checks to prevent test execution on MicroShift.
✅ Passed checks (14 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding KMS health reports to the API, matching the primary objective of introducing structured API types for KMS plugin health reporting.
Description check ✅ Passed The description directly relates to the changeset by explaining the purpose of adding KMS health reports structured API (currently stored as JSON in operator conditions) and references the corresponding enhancement.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed All test names in the added YAML test files are static and deterministic. No dynamic elements like node names, timestamps, UUIDs, or generated suffixes are present in test titles.
Test Structure And Quality ✅ Passed PR adds CRD validation test YAML files, not Ginkgo tests. Tests have clear single responsibility and meaningful error messages consistent with existing test patterns.
Single Node Openshift (Sno) Test Compatibility ✅ Passed The PR adds CRD validation test manifests (YAML files), not Ginkgo e2e tests. These are parsed by tests/generator.go to create unit tests for schema validation, not multi-node cluster tests.
Topology-Aware Scheduling Compatibility ✅ Passed PR adds only API types and CRD schemas for KMS health reporting; no deployment manifests, controller code, or scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed PR adds only type definitions and YAML manifests; no process-level code or stdout writes that could corrupt OTE binary JSON communication.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds KMS health report API types and CRD validation test fixtures, not Ginkgo e2e tests. Check only applies to new e2e tests.
No-Weak-Crypto ✅ Passed PR adds only data type definitions and CRD schemas for KMS health reporting; no crypto code, weak algorithms, or insecure comparisons found.
Container-Privileges ✅ Passed PR adds KMS health reporting types and CRD schemas only; no container specs or privilege escalation settings found.
No-Sensitive-Data-In-Logs ✅ Passed PR adds API type definitions with no logging code. Fields documented as non-sensitive; Detail field contains generic error messages safe for logging.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Hello @tjungblu! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 9, 2026
@openshift-ci openshift-ci Bot requested review from JoelSpeed and everettraven June 9, 2026 12:31
@openshift-ci

openshift-ci Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
config/v1/types_apiserver.go (2)

313-324: 💤 Low value

Document maxLength constraints for optional fields.

The KEKId and Detail fields have maxLength: 1024 validation markers, but this constraint isn't documented in the field comments. Per API documentation guidelines, validation constraints should be documented.

Consider adding length constraints to the comments:

-	// kekId refers to the remote KEK id from KMS v2 StatusResponse.key_id.
-	// This is not a cryptographic key, but a unique representation of the KEK.
+	// kekId refers to the remote KEK id from KMS v2 StatusResponse.key_id.
+	// This is not a cryptographic key, but a unique representation of the KEK.
+	// The value must be at most 1024 characters when specified.
...
-	// detail contains additional error/health information; omitted when healthy
+	// detail contains additional error/health information; omitted when healthy.
+	// The value must be at most 1024 characters when specified.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/v1/types_apiserver.go` around lines 313 - 324, The comments for the
optional fields KEKId and Detail do not document the MaxLength=1024 validation;
update the field comments for KEKId and Detail to explicitly state the allowed
length (e.g., "Maximum length: 1024 characters") so the validation constraint is
visible in the API docs, leaving the existing kubebuilder tags
(MinLength/MaxLength/optional) intact for KEKId and Detail.

Source: Coding guidelines


330-338: 💤 Low value

Document maxItems constraint for healthReports.

The healthReports field has maxItems: 250 validation but this isn't documented in the comment. Consider adding this constraint to help operators understand the limit:

 // healthReports contains all KMS plugin health reports for this APIServer.
 // When omitted, no health reports are available.
 // Each entry must have a unique combination of nodeName and keyId.
+// The list is limited to 250 entries.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@config/v1/types_apiserver.go` around lines 330 - 338, The comment for the
HealthReports field does not mention the configured maximum of 250 items; update
the doc comment above the HealthReports []KMSPluginHealthReport field to
explicitly state the maxItems constraint (250) so operators see the limit (e.g.,
"Maximum of 250 entries"); keep reference to the existing uniqueness requirement
and other validation notes for HealthReports and KMSPluginHealthReport
unchanged.

Source: Coding guidelines

payload-manifests/crds/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml (1)

242-249: 💤 Low value

Minor inconsistency: enum includes empty string but minLength rejects it.

The status field enum includes "" (empty string) but minLength: 1 prevents its use. While functionally correct (only healthy/unhealthy/error are valid), this is a documentation inconsistency. The enum value "" can never pass validation.

This is likely generated from the feature-gate-aware enum marker. Consider whether the empty string should be in the enum when the feature gate is enabled.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@payload-manifests/crds/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml`
around lines 242 - 249, The enum for the status field currently includes an
empty string "" but minLength: 1 prevents it from ever validating; fix by either
removing the "" entry from the enum on the status field (preferred) so the enum
matches minLength: 1, or if the empty-string must be allowed for feature-gate
behavior, change minLength to 0 to permit it; update the status enum/minLength
pair accordingly and ensure any feature-gate-aware generator/config (the
feature-gate-aware enum marker) is adjusted to produce the matching combination.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@config/v1/types_apiserver.go`:
- Around line 313-324: The comments for the optional fields KEKId and Detail do
not document the MaxLength=1024 validation; update the field comments for KEKId
and Detail to explicitly state the allowed length (e.g., "Maximum length: 1024
characters") so the validation constraint is visible in the API docs, leaving
the existing kubebuilder tags (MinLength/MaxLength/optional) intact for KEKId
and Detail.
- Around line 330-338: The comment for the HealthReports field does not mention
the configured maximum of 250 items; update the doc comment above the
HealthReports []KMSPluginHealthReport field to explicitly state the maxItems
constraint (250) so operators see the limit (e.g., "Maximum of 250 entries");
keep reference to the existing uniqueness requirement and other validation notes
for HealthReports and KMSPluginHealthReport unchanged.

In
`@payload-manifests/crds/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml`:
- Around line 242-249: The enum for the status field currently includes an empty
string "" but minLength: 1 prevents it from ever validating; fix by either
removing the "" entry from the enum on the status field (preferred) so the enum
matches minLength: 1, or if the empty-string must be allowed for feature-gate
behavior, change minLength to 0 to permit it; update the status enum/minLength
pair accordingly and ensure any feature-gate-aware generator/config (the
feature-gate-aware enum marker) is adjusted to produce the matching combination.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 98ea2a13-beee-4e07-acf0-5dedda344284

📥 Commits

Reviewing files that changed from the base of the PR and between d3390bd and 3da0bf8.

⛔ Files ignored due to path filters (24)
  • config/v1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
  • config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
  • openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-OKD.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-OKD.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-Default.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-OKD.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.deepcopy.go is excluded by !**/zz_generated*
  • operator/v1/zz_generated.featuregated-crd-manifests.yaml is excluded by !**/zz_generated*
  • operator/v1/zz_generated.featuregated-crd-manifests/authentications.operator.openshift.io/KMSEncryption.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • operator/v1/zz_generated.featuregated-crd-manifests/kubeapiservers.operator.openshift.io/KMSEncryption.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • operator/v1/zz_generated.featuregated-crd-manifests/openshiftapiservers.operator.openshift.io/KMSEncryption.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • operator/v1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
📒 Files selected for processing (22)
  • config/v1/types_apiserver.go
  • operator/v1/tests/authentications.operator.openshift.io/KMSEncryption.yaml
  • operator/v1/tests/kubeapiservers.operator.openshift.io/KMSEncryption.yaml
  • operator/v1/tests/openshiftapiservers.operator.openshift.io/KMSEncryption.yaml
  • operator/v1/types_authentication.go
  • operator/v1/types_kubeapiserver.go
  • operator/v1/types_openshiftapiserver.go
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-Default.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-OKD.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-Default.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-OKD.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-Default.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-OKD.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-TechPreviewNoUpgrade.crd.yaml

This PR adds a structured API for the kms health reports, which are
currently stored as json in the operator conditions.

Corresponding enhancement in openshift/enhancements#2005

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the changes here are only used in the operator/v1 group - any reason these changes shouldn't be placed in that group instead of this one?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is semantically correlated with the KMS and encryption configurations above.

Comment thread config/v1/types_apiserver.go Outdated
Comment on lines +270 to +279
// +openshift:validation:FeatureGateAwareEnum:featureGate=KMSEncryption,enum="";healthy;unhealthy;error
type KMSPluginHealthStatus string

const (
KMSPluginHealthStatusHealthy KMSPluginHealthStatus = "healthy"

KMSPluginHealthStatusUnhealthy KMSPluginHealthStatus = "unhealthy"

KMSPluginHealthStatusError KMSPluginHealthStatus = "error"
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enum values should be PascalCase:

Suggested change
// +openshift:validation:FeatureGateAwareEnum:featureGate=KMSEncryption,enum="";healthy;unhealthy;error
type KMSPluginHealthStatus string
const (
KMSPluginHealthStatusHealthy KMSPluginHealthStatus = "healthy"
KMSPluginHealthStatusUnhealthy KMSPluginHealthStatus = "unhealthy"
KMSPluginHealthStatusError KMSPluginHealthStatus = "error"
)
// +openshift:validation:FeatureGateAwareEnum:featureGate=KMSEncryption,enum="";Healthy;Unhealthy;Error
type KMSPluginHealthStatus string
const (
KMSPluginHealthStatusHealthy KMSPluginHealthStatus = "Healthy"
KMSPluginHealthStatusUnhealthy KMSPluginHealthStatus = "Unhealthy"
KMSPluginHealthStatusError KMSPluginHealthStatus = "Error"
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

Comment on lines +300 to +324
// status contains a health indicator for the respective KMS plugin
// The field can have three states: healthy, unhealthy, error.
// With error and unhealthy containing additional information in Detail.
//
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=10
// +required
Status KMSPluginHealthStatus `json:"status,omitempty"`

// lastChecked is a timestamp of when the probe was last checked.
// +required
LastChecked metav1.Time `json:"lastChecked,omitempty"`

// kekId refers to the remote KEK id from KMS v2 StatusResponse.key_id.
// This is not a cryptographic key, but a unique representation of the KEK.
// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=1024
// +optional
KEKId *string `json:"kekId,omitempty"`

// detail contains additional error/health information; omitted when healthy
// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=1024
// +optional
Detail *string `json:"detail,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from kekId, I think the fields here could be replaced by using typical status conditions (i.e a []metav1.Condition).

For example, I think this could probably all be wrapped into a single condition.

Happy path:

type: Healthy
status: True
reason: AsExpected
lastTransitionTime: ....

Unhealthy scenario:

type: Healthy
status: False
reason: SomethingIsWrong
message: "error: something is wrong"
lastTransitionTime: ....

While lastTransitionTime isn't necessarily as accurate as lastChecked in terms of probe state, it still signals that the status has been observed since X time and prevents unnecessary status updates to update a timestamp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibihim does that work for you to replace it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we would need NodeName,KeyId,KekId however - so the condition would replace only the health status, lastchecked and detail

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do use conditions for this today and learned that it's difficult to work with. Operators end up parsing JSON field with no schema behind it. Please see for more details about the structure.

It seems that a dedicated type fits better:

  1. a message isn't meant to carry structured data. It's a human readable string by convention.

  2. there's no schema (or rather schema is implicit), no validation, and no generated client.

  3. structured types are easier to understand and work with. With a dedicated type, someone reading the API or writing a controller can immediately see what fields exist and what their types are.

  4. schema evolution seems to be simpler. if we later need to add a field a dedicated type gets a new field with proper validation markers and generated client support.

@p0lyn0mial p0lyn0mial Jun 10, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am pasting the conditions we use today from the kep i shared in the previous message:

status:
  conditions:
    - type: KMSHealthReporter_master-0
      status: "True"
      reason: AsExpected
      message: '[{"kekID":"kek-9f2c","keyID":"2","status":"healthy","lastChecked":"2026-05-08T12:34:56Z"},{"kekID":"kek-4a17","keyID":"1","status":"healthy","lastChecked":"2026-05-08T12:34:56Z"}]'
    - type: KMSHealthReporter_master-1
      status: "True"
      reason: AsExpected
      message: '[{"kekID":"kek-9f2c","keyID":"2","status":"unhealthy","lastChecked":"2026-05-08T12:34:56Z","detail":"credential lacks decrypt permission"},{"kekID":"kek-4a17","keyID":"1","status":"healthy","lastChecked":"2026-05-08T12:34:56Z"}]'
    - type: KMSHealthReporter_master-2
      status: "True"
      reason: AsExpected
      message: '[{"keyID":"2","status":"error","lastChecked":"2026-05-08T12:34:56Z","detail":"connection refused"},{"keyID":"1","status":"error","lastChecked":"2026-05-08T12:34:56Z","detail":"connection refused"}]'

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=512
// +required
KeyId string `json:"keyId,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the key id follow a specific format that we can validate?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't make many assumptions here, this is a concatenation of some strings and integers.

// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:MaxLength=512
// +required
NodeName string `json:"nodeName,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node names follow the DNS subdomain name format. We should use the format validation CEL library in kube for validating the input against that naming format: https://kubernetes.io/docs/reference/using-api/cel/#kubernetes-format-library

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid this will clash with the way that the node controller reports those, I'll add the CEL though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, are you sure that this is the case?
the node metadata has no such constraints:
https://github.com/kubernetes/api/blob/master/core/v1/types.go#L7013-L7030

// +kubebuilder:validation:MinLength=0
// +kubebuilder:validation:MaxLength=1024
// +optional
KEKId *string `json:"kekId,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean for kekId to be set to ""?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means there is a connection error between health monitor and kms plugin. So ideally we should expect that KMSPluginHealthStatus should be error (or unhealthy)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibihim how do you intent do deal with this? are you just keeping the last value of a KEK in the conditions or are you always wiping it on error?

Otherwise we can also make this required from the start.

// +kubebuilder:validation:MinItems=0
// +kubebuilder:validation:MaxItems=100
// +openshift:validation:FeatureGateAwareXValidation:featureGate=KMSEncryption,rule="self.all(x, self.exists_one(y, y.nodeName == x.nodeName && y.keyId == x.keyId))",message="each health report must have a unique nodeName/keyId combination"
// +listType=atomic

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for this to be an atomic list?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what else should it be?

// +kubebuilder:validation:MaxItems=100
// +openshift:validation:FeatureGateAwareXValidation:featureGate=KMSEncryption,rule="self.all(x, self.exists_one(y, y.nodeName == x.nodeName && y.keyId == x.keyId))",message="each health report must have a unique nodeName/keyId combination"
// +listType=atomic
HealthReports []KMSPluginHealthReport `json:"healthReports,omitempty"`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a semantic difference between healthReports: [] and it being omitted?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope

Comment thread config/v1/types_apiserver.go Outdated
// +optional
// +kubebuilder:validation:MinItems=0
// +kubebuilder:validation:MaxItems=100
// +openshift:validation:FeatureGateAwareXValidation:featureGate=KMSEncryption,rule="self.all(x, self.exists_one(y, y.nodeName == x.nodeName && y.keyId == x.keyId))",message="each health report must have a unique nodeName/keyId combination"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't need to be feature gated if it this type is only used by fields that are gated.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Comment thread config/v1/types_apiserver.go Outdated
EncryptionTypeKMS EncryptionType = "KMS"
)

// +openshift:validation:FeatureGateAwareEnum:featureGate=KMSEncryption,enum="";healthy;unhealthy;error

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to be a feature gated enum, can just be a regular enum marker.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applied

@ardaguclu

Copy link
Copy Markdown
Member

From KMS feature team POV, fields look good to me and align with what we discussed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml (1)

181-239: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Documented uniqueness constraint not enforced across all CRD variants.

All four CRD files contain healthReports arrays with descriptions stating "Each entry must have a unique combination of nodeName and keyId", but none include a CEL validation rule to enforce this. The x-kubernetes-list-type: atomic setting does not provide uniqueness guarantees.

This allows duplicate (nodeName, keyId) entries to be persisted, violating the documented API contract. The fix should be applied to the Go type definitions in config/v1/types_apiserver.go (and regenerated), or directly via a CEL validation:

x-kubernetes-validations:
- message: "each healthReport must have a unique combination of nodeName and keyId"
  rule: "self.all(x, self.exists_one(y, x.nodeName == y.nodeName && x.keyId == y.keyId))"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml`
around lines 181 - 239, The CRD's healthReports array documents uniqueness of
(nodeName, keyId) but doesn't enforce it; add a CEL validation to the
healthReports schema (or add corresponding validation tags in
config/v1/types_apiserver.go and regenerate) that enforces "each entry must have
a unique combination of nodeName and keyId" using a rule that checks every
element has no other element with the same nodeName and keyId (attach a clear
message like "each healthReport must have a unique combination of nodeName and
keyId"); locate the healthReports array definition in the CRD (and the
HealthReport Go type in types_apiserver.go) and add the
x-kubernetes-validations/CEL rule to that schema so duplicates cannot be
persisted.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In
`@payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml`:
- Around line 181-239: The CRD's healthReports array documents uniqueness of
(nodeName, keyId) but doesn't enforce it; add a CEL validation to the
healthReports schema (or add corresponding validation tags in
config/v1/types_apiserver.go and regenerate) that enforces "each entry must have
a unique combination of nodeName and keyId" using a rule that checks every
element has no other element with the same nodeName and keyId (attach a clear
message like "each healthReport must have a unique combination of nodeName and
keyId"); locate the healthReports array definition in the CRD (and the
HealthReport Go type in types_apiserver.go) and add the
x-kubernetes-validations/CEL rule to that schema so duplicates cannot be
persisted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: aa05b620-e7e1-4823-bc9d-5a2d15029c03

📥 Commits

Reviewing files that changed from the base of the PR and between 199067b and e632670.

⛔ Files ignored due to path filters (14)
  • config/v1/zz_generated.swagger_doc_generated.go is excluded by !**/zz_generated*
  • openapi/generated_openapi/zz_generated.openapi.go is excluded by !openapi/**, !**/zz_generated*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-DevPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_30_openshift-apiserver_01_openshiftapiservers-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-CustomNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.crd-manifests/0000_50_authentication_01_authentications-TechPreviewNoUpgrade.crd.yaml is excluded by !**/zz_generated.crd-manifests/*
  • operator/v1/zz_generated.featuregated-crd-manifests/authentications.operator.openshift.io/KMSEncryption.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • operator/v1/zz_generated.featuregated-crd-manifests/kubeapiservers.operator.openshift.io/KMSEncryption.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
  • operator/v1/zz_generated.featuregated-crd-manifests/openshiftapiservers.operator.openshift.io/KMSEncryption.yaml is excluded by !**/zz_generated.featuregated-crd-manifests/**
📒 Files selected for processing (10)
  • config/v1/types_apiserver.go
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-TechPreviewNoUpgrade.crd.yaml
✅ Files skipped from review due to trivial changes (1)
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-CustomNoUpgrade.crd.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • config/v1/types_apiserver.go
  • payload-manifests/crds/0000_50_authentication_01_authentications-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_30_openshift-apiserver_01_openshiftapiservers-CustomNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_20_kube-apiserver_01_kubeapiservers-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_50_authentication_01_authentications-CustomNoUpgrade.crd.yaml

@openshift-ci

openshift-ci Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

@tjungblu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/integration e632670 link true /test integration

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants