HYPERFLEET-1185 - feat: add performance tests and tooling by pnguyen44 · Pull Request #127 · openshift-hyperfleet/hyperfleet-e2e

pnguyen44 · 2026-06-16T15:27:50Z

Summary

Adds lightweight performance tests that measure baseline latencies for core HyperFleet operations (cluster and nodepool CRUD). Tests are labeled perf and emit [PERF] log lines with measured durations, which can be parsed into a baseline report. Includes tooling to seed the database, run tests in-cluster, and generate summary reports.

Note: This PR covers the test implementation and tooling only. Establishing official baselines and adding threshold assertions based on those baselines will follow in a separate PR.

Changes

Added perf tests covering cluster create, read, list, filtered-list, update, delete, cascade-delete latencies, read-by-entity-size, and nodepool create/delete latencies
Added ListClustersWithParams client method for parameterized cluster queries (search, pagination)
Added perf tooling: seed-clusters.sh for database seeding, run-in-cluster.sh for in-cluster execution, and parse-report.sh for extracting baseline reports
Added small and large cluster payload templates for entity-size comparison tests

Test plan

Run perf tests in-cluster via ./perf/run-in-cluster.sh and confirm [PERF] latency lines appear in output
Run ./perf/parse-report.sh against the output and verify the baseline report is generated correctly
Run ./perf/seed-clusters.sh status and ./perf/seed-clusters.sh cleanup to verify seed tooling works
make lint passes
E2E tests passed (if cross-component or major changes)

tooling

openshift-ci · 2026-06-16T15:28:01Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sherine-k for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-16T15:28:23Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: d447fec1-579f-4437-8f0c-a365b8e54819

📥 Commits

Reviewing files that changed from the base of the PR and between 1b8b8a5 and 7bb0de2.

📒 Files selected for processing (7)

configs/config.yaml
e2e/nodepool/perf_delete_latency.go
perf/parse-report.sh
perf/run-in-cluster.sh
perf/seed-clusters.sh
pkg/config/config.go
pkg/config/defaults.go

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

openshift-hyperfleet/architecture (manual)
openshift-hyperfleet/hyperfleet-api (manual)
openshift-hyperfleet/hyperfleet-sentinel (manual)
openshift-hyperfleet/hyperfleet-adapter (manual)
openshift-hyperfleet/hyperfleet-broker (manual)

✅ Files skipped from review due to trivial changes (1)

pkg/config/defaults.go

🚧 Files skipped from review as they are similar to previous changes (3)

perf/run-in-cluster.sh
e2e/nodepool/perf_delete_latency.go
perf/parse-report.sh

📝 Walkthrough

Summary by CodeRabbit

Release Notes

New Features
- Added comprehensive performance testing suite measuring baseline latencies for cluster and nodepool operations (create, read, update, delete, list)
- Included in-cluster testing tools with automated database seeding and result reporting
Documentation
- New performance testing guide with setup prerequisites and step-by-step usage instructions

Walkthrough

Adds in-cluster performance baseline measurement for core HyperFleet operations. Nine Ginkgo suites labeled [Suite: cluster|nodepool][perf] time create-to-reconciled, read, read-by-entity-size, update-to-reconciled, list, filtered-list, delete-to-hard-delete, and cascade-delete latencies via polling and HTTP status checks. Each suite provisions test resources via BeforeEach, measures elapsed time for the targeted operation, logs [PERF] metrics, and defers cleanup. Supporting infrastructure: perf/run-in-cluster.sh builds container image via make image-dev, extracts reference, spawns transient pod in target namespace; perf/seed-clusters.sh provisions batch clusters via POST /clusters with perf-seed labels and implements targeted/full reset; perf/parse-report.sh extracts [PERF] and [FAIL] lines from timestamped results, embeds kubectl context metadata, and writes report. NodePoolTimeouts struct gains Deleted field with 2m default. ListClusters refactored to delegate to ListClustersWithParams. Two cluster JSON fixtures (small 31 lines, large 53 lines) added. Documentation extended.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
No Pii Or Sensitive Data In Logs	⚠️ Warning	Internal hostnames logged to stdout: perf/seed-clusters.sh lines 138, 175 log $API_URL (internal service FQDN); line 139 logs kubectl context name; perf/run-in-cluster.sh line 65 passes internal ho...	Remove or redact API_URL and kubectl context from log output; pass api-url as environment variable or config file instead of command argument.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title clearly identifies the main change: addition of performance tests and associated tooling, which aligns with the bulk of the changeset (9 new perf test files, 3 shell scripts, documentation, and supporting infrastructure).
Description check	✅ Passed	Description is directly related to the changeset, detailing specific performance test coverage (cluster CRUD, nodepool CRUD, entity-size comparisons), tooling components (seed, run, parse scripts), and scope boundaries.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Sec-02: Secrets In Log Output	✅ Passed	No log statements found logging tokens, passwords, credentials, or secrets. Logging includes only non-sensitive config metadata: cluster IDs, names, paths, image references (with QUAY_USER as usern...
No Hardcoded Secrets	✅ Passed	.env.example contains empty environment variables with no default values. Test data files use standard placeholder values (example.com, my-gcp-project). No hardcoded secrets, credentials, API keys,...
No Weak Cryptography	✅ Passed	No banned crypto packages (MD5, DES, RC4, SHA1), no unsafe secret comparisons, no custom crypto implementations, no ECB mode, no hardcoded credentials. Environment variables used properly.
No Injection Vectors	✅ Passed	No injection vectors found. Regex validates API IDs in perf/seed-clusters.sh (^[a-zA-Z0-9_-]+) before use. Go code uses openapi.GetClustersParams for parameterized queries. Shell variables properly...
No Privileged Containers	✅ Passed	No privileged container settings found: Dockerfiles lack privileged: true, hostPID, hostNetwork, hostIPC, SYS_ADMIN capabilities, or allowPrivilegeEscalation flags. E2E test containers run as root...

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@e2e/cluster/perf_create_latency.go`:
- Around line 34-38: The ginkgo.DeferCleanup closures in multiple perf test
files are currently only logging warnings when CleanupTestCluster fails, which
allows resource leaks. Modify the closure to propagate the error by returning it
from h.CleanupTestCluster instead of only logging and continuing. Apply the same
fix to all four affected files: In e2e/cluster/perf_create_latency.go (lines
34-38), change the DeferCleanup closure to return the error from
h.CleanupTestCluster. In e2e/cluster/perf_read_latency.go (lines 27-31), make
the same change to return the cleanup error. In
e2e/cluster/perf_update_latency.go (lines 29-33), make the same change to return
the cleanup error. In e2e/cluster/perf_list_latency.go (lines 27-31), make the
same change to return the cleanup error. This ensures failures during cleanup
are properly propagated rather than silently logged.

In `@e2e/cluster/perf_read_entity_size_latency.go`:
- Around line 39-43: Replace the warning-only logging pattern in the cleanup
callbacks across all affected files with explicit test failure handling. In each
ginkgo.DeferCleanup callback (CleanupTestCluster and CleanupTestNodepool calls),
use ginkgo.Fail() to surface cleanup errors as test failures instead of
downgrading them to warnings via ginkgo.GinkgoWriter.Printf. For the
perf_delete_latency.go files (both cluster and nodepool variants), implement
idempotency checks to distinguish between already-deleted resources (which
should not fail the test) and unexpected errors (which should fail via
ginkgo.Fail()). Apply this pattern consistently across
e2e/cluster/perf_read_entity_size_latency.go (39-43),
e2e/cluster/perf_list_filtered_latency.go (28-32),
e2e/cluster/perf_delete_latency.go (30-34),
e2e/cluster/perf_cascade_delete_latency.go (31-35 and 46-50),
e2e/nodepool/perf_create_latency.go (29-33 and 44-48), and
e2e/nodepool/perf_delete_latency.go (31-35 and 41-45).

In `@e2e/nodepool/perf_delete_latency.go`:
- Around line 59-60: The Eventually assertion in the nodepool hard-delete wait
is using the incorrect timeout configuration. Replace
h.Cfg.Timeouts.Adapter.Processing with h.Cfg.Timeouts.NodePool.Reconciled in the
Eventually call that checks for the http.StatusNotFound response. This ensures
the timeout budget matches what is used elsewhere in the codebase for nodepool
hard-delete operations, preventing false failures due to control-plane lag.

In `@perf/parse-report.sh`:
- Around line 15-27: When the user explicitly provides a file path as the first
argument to the script, add validation to fail immediately if that file does not
exist, rather than silently falling back to the latest results file. The current
check on line 15 only sets INPUT and SOURCE when the file exists, but should
instead validate that if an explicit path is provided (non-empty $1), the script
must verify the file exists and exit with an error if it doesn't. This prevents
accidental parsing of the wrong artifact when a user makes a typo in the
filename they explicitly provided.

In `@perf/run-in-cluster.sh`:
- Around line 44-63: The script uses a mutable image tag (DEV_TAG variable
constructed on line 44) in the IMAGE variable on line 45 and passes it to
kubectl run with imagePullPolicy=Always on line 56, which creates a security
vulnerability since tags can be overwritten. Instead, capture the immutable
image digest from the Makefile's image-dev target (which builds and pushes the
image) and modify the run-in-cluster.sh script to use the digest reference
(format: quay.io/$QUAY_USER/hyperfleet-e2e@sha256:...) instead of the tag-based
IMAGE variable in the kubectl run command.

In `@perf/seed-clusters.sh`:
- Around line 72-77: The curl DELETE request on line 75 does not validate the
HTTP response status before incrementing the deleted counter on line 76. This
causes failed deletes (4xx/5xx responses) to be incorrectly counted as
successes. Check the HTTP status code returned by the curl command (or its exit
status) and only increment the deleted counter when the delete operation
succeeds. Ensure that failures are either logged or handled appropriately so
that the final count accurately reflects only successful cluster deletions.
- Line 15: Replace the unsafe `source` command that executes the entire `.env`
file with a whitelisting approach that parses only specific, known variables
from the `.env` file. Instead of using `source "$REPO_DIR/.env"`, extract and
assign individual variables explicitly (for example, using grep or read commands
to parse specific key-value pairs). This prevents arbitrary shell code execution
while still accessing necessary configuration values. Apply this pattern
consistently across all perf scripts that currently use `source` to load `.env`
files.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 58ad3cde-a3d9-4582-aaad-658b4d5735b4

📥 Commits

Reviewing files that changed from the base of the PR and between 0cd32f2 and 1b8b8a5.

📒 Files selected for processing (21)

.env.example
.gitignore
AGENTS.md
README.md
e2e/cluster/perf_cascade_delete_latency.go
e2e/cluster/perf_create_latency.go
e2e/cluster/perf_delete_latency.go
e2e/cluster/perf_list_filtered_latency.go
e2e/cluster/perf_list_latency.go
e2e/cluster/perf_read_entity_size_latency.go
e2e/cluster/perf_read_latency.go
e2e/cluster/perf_update_latency.go
e2e/nodepool/perf_create_latency.go
e2e/nodepool/perf_delete_latency.go
perf/README.md
perf/parse-report.sh
perf/run-in-cluster.sh
perf/seed-clusters.sh
pkg/client/cluster.go
testdata/payloads/clusters/cluster-request-large.json
testdata/payloads/clusters/cluster-request-small.json

🔗 Linked repositories identified

CodeRabbit considers these linked repositories for cross-repo context during reviews:

openshift-hyperfleet/architecture (manual)
openshift-hyperfleet/hyperfleet-api (manual)
openshift-hyperfleet/hyperfleet-sentinel (manual)
openshift-hyperfleet/hyperfleet-adapter (manual)
openshift-hyperfleet/hyperfleet-broker (manual)

pnguyen44 · 2026-06-16T16:56:00Z

/retest-required

tirthct · 2026-06-16T18:54:35Z

+Confirm the adapters are subscribed to the correct topic (not `placeholder`):
+
+```bash
+kubectl exec -n hyperfleet deploy/adapter1-hyperfleet-adapter -- env | grep BROKER_TOPIC


The verification step tells users to check BROKER_TOPIC env var:

kubectl exec -n hyperfleet deploy/adapter1-hyperfleet-adapter -- env | grep BROKER_TOPIC

But from the v1.0.0 breaking-changes checklist (item #15), the Sentinel's BROKER_TOPIC env var was removed in v1.0.0. These perf tests target v1.0.0. This verification command may return nothing, leading users to think the adapter isn't connected when it actually is — the topic is now configured via config file, not env var.

Needs verification: does the adapter (not Sentinel) deployment template still inject BROKER_TOPIC? If not, this step needs updating.

tirthct · 2026-06-16T19:00:21Z

+		for _, size := range sizes {
+			ginkgo.It("should read a "+size.name+" cluster within acceptable latency", func(ctx context.Context) {
+				ginkgo.By("creating a " + size.name + " cluster")
+				cluster, err := h.Client.CreateClusterFromPayload(ctx, h.TestDataPath(size.payload))


The entity-size test creates clusters and reads them immediately without waiting for Reconciled. The other lifecycle tests (create, update, delete) all wait for Reconciled first.

A freshly-created cluster has only the spec — no adapter statuses, no conditions, no data from adapter reports. A reconciled cluster has all of that, making the JSON response larger. So this measures read latency on a smaller response than a production read would return.

If the intent is "does a bigger spec affect read latency" — fine as-is. If it's "realistic read latency of a fully-provisioned cluster" — it's undercounting. A one-line comment clarifying the intent would help.

tirthct · 2026-06-16T19:03:38Z

+			_, err := h.Client.PatchClusterFromPayload(ctx, clusterID, h.TestDataPath("payloads/clusters/cluster-patch.json"))
+			Expect(err).NotTo(HaveOccurred())
+
+			Eventually(h.PollCluster(ctx, clusterID), h.Cfg.Timeouts.Cluster.Reconciled, h.Cfg.Polling.Interval).


Potential false near-zero latency from stale condition read.

After PATCH (line 44), generation increments from 1 to 2. Line 47-48 immediately polls for Reconciled=True. But Reconciled=True was already set from gen 1 reconciliation. HaveResourceCondition checks type and status but not observed_generation — so stale True at gen 1 is indistinguishable from fresh True at gen 2.

If the API doesn't synchronously flip Reconciled=False during the PATCH handler, the first poll reads the stale True and the test reports near-zero latency.

Suggested fix — wait for False first, then True:

// Confirm API detected generation mismatch Eventually(h.PollCluster(ctx, clusterID), h.Cfg.Timeouts.Cluster.Reconciled, h.Cfg.Polling.Interval). Should(helper.HaveResourceCondition(client.ConditionTypeReconciled, openapi.ResourceConditionStatusFalse)) // Then wait for re-reconciliation at new generation Eventually(h.PollCluster(ctx, clusterID), h.Cfg.Timeouts.Cluster.Reconciled, h.Cfg.Polling.Interval). Should(helper.HaveResourceCondition(client.ConditionTypeReconciled, openapi.ResourceConditionStatusTrue))

Or add a generation-aware matcher that checks observed_generation == resource.generation.

HYPERFLEET-1185 - feat: add performance tests and

1b8b8a5

tooling

openshift-ci Bot requested review from ciaranRoche and ldornele June 16, 2026 15:27

coderabbitai Bot reviewed Jun 16, 2026

View reviewed changes

HYPERFLEET-1185 - chore: address review feedback

7bb0de2

tirthct reviewed Jun 16, 2026

View reviewed changes

Conversation

pnguyen44 commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Uh oh!

openshift-ci Bot commented Jun 16, 2026

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Walkthrough

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pnguyen44 commented Jun 16, 2026

Uh oh!

tirthct Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

tirthct Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

tirthct Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pnguyen44 commented Jun 16, 2026 •

edited

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading