NO-JIRA: Fix Flaky e2e tests - Update the Cleanup order by YamunadeviShanmugam · Pull Request #31347 · openshift/origin

YamunadeviShanmugam · 2026-06-29T08:20:35Z

Fix premature OAuth token deletion in CLI

OAuthAccessToken deleted before framework DeferCleanup runs

GetClientConfigForUser created an OAuthAccessToken and registered it in resourcesToDelete. This meant it was deleted during TeardownProject, which is registered as a g.AfterEach. The Kubernetes framework's namespace cleanup, however, runs via ginkgo.DeferCleanup (registered inside f.BeforeEach), and the documented execution order is:

It block
→ all AfterEach nodes (including TeardownProject) ← token deleted here
→ all DeferCleanups (LIFO) ← framework cleanup here
→ f.AfterEach (namespace deletion)
This meant the bearer token embedded in the returned *rest.Config became invalid before any DeferCleanup callbacks had a chance to run. Any deferred cleanup that tried to make authenticated API calls as that user would receive 401 Unauthorized.

Fix:

Remove the token from resourcesToDelete and instead set ExpiresIn: 6 hours. The associated User object is still explicitly deleted in TeardownProject, which is sufficient to revoke effective access. The ExpiresIn TTL ensures the token is eventually cleaned up by the OpenShift OAuth token controller without needing explicit deletion.

Latent OIDC test bug fixed as a consequence

test/extended/authentication/oidc.go calls GetClientConfigForUser in a g.BeforeAll and stores the result in oauthUserConfig. This config is then used across two separate ordered It blocks:

"should not accept tokens provided by the OAuth server" (expects Unauthorized)
"should accept tokens provided by the OpenShift OAuth server" (expects 200 OK)
With the old code, TeardownProject deleted the OAuthAccessToken after the first It block. The first test passed for the wrong reason (deleted token → Unauthorized, which matched the expectation, but for the wrong cause). The second test silently used an invalid bearer token. As the current fix keeps the token alive across all ordered It blocks as intended, so both tests validate the correct behavior.

Summary by CodeRabbit

Bug Fixes
- Adjusted OAuth access token provisioning to use a 6-hour automatic expiration, reducing the risk of stale authentication artifacts.
- Improved test cleanup by avoiding explicit token deletion during teardown, preventing cached clients from losing credentials (which could otherwise lead to authentication fallback and follow-on failures).

openshift-merge-bot · 2026-06-29T08:20:39Z

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: automatic mode

coderabbitai · 2026-06-29T08:21:01Z

Walkthrough

GetClientConfigForUser now creates an OAuth access token with a 6-hour expiry and no longer registers that token for explicit teardown cleanup.

Changes

OAuth token provisioning

Layer / File(s)	Summary
Token TTL and cleanup `test/extended/util/client.go`	`GetClientConfigForUser` sets `ExpiresIn` on the created `OAuthAccessToken`, removes the explicit `resourcesToDelete` registration for that token, and documents the cleanup failure mode.

Estimated code review effort: 2 (Simple) | ~10 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Structure And Quality	⚠️ Warning	FAIL: TeardownProject still never resets resourcesToDelete, so the shared CLI accumulates cleanup targets across tests; some assertions in OIDC also lack failure messages.	Reset c.resourcesToDelete to nil at the end of TeardownProject, and ensure remaining Ginkgo assertions carry helpful failure messages.

✅ Passed checks (14 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	PR only changes cleanup/token TTL in util/client.go; no Ginkgo titles were added or modified, and related OIDC titles remain static literals.
Microshift Test Compatibility	✅ Passed	No new or edited Ginkgo e2e tests were added; the PR only changes a test helper and an extension-binary registry.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	Only shared cleanup/token logic changed in util/client.go; no new Ginkgo e2e tests or multi-node/SNO assumptions were added.
Topology-Aware Scheduling Compatibility	✅ Passed	Only test/extended/util/client.go changed; it adds OAuth token cleanup tweaks and no scheduling, manifest, or controller changes.
Ote Binary Stdout Contract	✅ Passed	Changes only adjust OAuth token cleanup/TTL in GetClientConfigForUser; no process-level stdout writes were added.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	Only test helper cleanup/token TTL logic changed; no new Ginkgo tests, IPv4-only assumptions, or external connectivity were introduced.
No-Weak-Crypto	✅ Passed	Touched code only uses crypto/sha256 for token hashing; no MD5/SHA1/DES/RC4/ECB or insecure token comparisons found.
Container-Privileges	✅ Passed	PR only changes test/extended/util/client.go; no manifests or privilege fields (privileged, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation) were added.
No-Sensitive-Data-In-Logs	✅ Passed	No new sensitive-data logging was introduced; existing client command logs redact bearer tokens, and the PR only changes token cleanup/comments.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main change: adjusting cleanup behavior to fix flaky e2e tests.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

gangwgr · 2026-06-29T08:24:47Z

/lgtm

openshift-merge-bot · 2026-06-29T09:30:11Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

YamunadeviShanmugam · 2026-06-29T13:41:22Z

/retest

YamunadeviShanmugam · 2026-06-30T08:04:23Z

/retest

YamunadeviShanmugam · 2026-06-30T11:25:31Z

The failure is unrelated to the PR - regression caused by openshift/cluster-version-operator#1395

YamunadeviShanmugam · 2026-07-01T08:13:00Z

/retest

ardaguclu · 2026-07-01T10:18:44Z

 	}
-	c.AddResourceToDelete(oauthv1.GroupVersion.WithResource("oauthaccesstokens"), token)
+	// The token is intentionally not added to resourcesToDelete.
+	// TeardownProject runs as an AfterEach, which fires before the framework's


By emptying the slice in c.resourcesToDelete = nil, we fix the multiple subsequent deletions. Why do we need this?. If we need this, why do we emptying the slices in c.resourcesToDelete = nil. I think this PR should focus on fixing the one issue at a time.

In my opinion, we should keep c.AddResourceToDelete(oauthv1.GroupVersion.WithResource("oauthaccesstokens"), token) as is.

@ardaguclu : Yes you are correct, emptying the slice is not going to fix the actual issue, it is a performance improvement of not repeating the deletion in subsequent tests(no failures as it will return 404 and skips), adding this in the PR is deviating the context, will remove it

Regarding the tokens, as we are adding it to the deletion - it is getting deleted in TearDownProject which causes the failure in defercleanup where the token is required. This order is wrong.

Another approach is to delete the token in beforeeach, but the current design is simpler.

According to the documented justification, tests should always fail. Because when DeferCleanup is invoked, token should have already been deleted, no?

Yes Arda, it is completely misleading - fixed it with actual issue

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/extended/util/client.go`:
- Around line 1210-1216: The token cleanup currently skips deleting bearer
tokens for users that already existed, leaving valid tokens behind until TTL
expiry. Update GetClientConfigForUser in client.go so there is an additional
cleanup path for the token itself, using a detached timeout context built with
context.WithoutCancel and context.WithTimeout, while keeping the existing
user-deletion behavior intact. Make sure the cleanup runs late enough for
DeferCleanup to still authenticate, and reference the existing token/user
teardown logic around GetClientConfigForUser and resourcesToDelete when wiring
it in.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 3361ae0e-fb69-454a-bd88-5263770edb99

📥 Commits

Reviewing files that changed from the base of the PR and between 74d2be9 and d06bacf.

📒 Files selected for processing (1)

test/extended/util/client.go

ardaguclu · 2026-07-01T12:17:16Z

 		UserUID:     string(user.UID),
 		Scopes:      []string{"user:full"},
 		RedirectURI: "https://localhost:8443/oauth/token/implicit",
+		ExpiresIn:   86400, // 24 h TTL; auto-expires without explicit deletion


A job takes max 4 hours in average. Would it make sense setting this to 6h?

Improve the cleanup order for fixing flakiness

ardaguclu · 2026-07-01T12:21:09Z

/lgtm

openshift-merge-bot · 2026-07-01T13:15:03Z

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

sosiouxme · 2026-07-01T13:25:30Z

/approve

openshift-ci · 2026-07-01T13:26:09Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ardaguclu, gangwgr, sosiouxme, YamunadeviShanmugam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [sosiouxme]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sosiouxme · 2026-07-01T21:23:37Z

/retest-required

YamunadeviShanmugam · 2026-07-02T07:56:13Z

Waiting on csi pipeline - as it is consistently failing for consecutive 5th time across multiple PRs

YamunadeviShanmugam · 2026-07-02T17:08:29Z

/test e2e-gcp-ovn e2e-aws-csi

YamunadeviShanmugam · 2026-07-03T05:35:26Z

/retitle NO-JIRA: Fix Flaky e2e tests - Update the Cleanup order

openshift-ci-robot · 2026-07-03T05:35:31Z

@YamunadeviShanmugam: This pull request explicitly references no jira issue.

Details

In response to this:

Fix premature OAuth token deletion in CLI

OAuthAccessToken deleted before framework DeferCleanup runs

GetClientConfigForUser created an OAuthAccessToken and registered it in resourcesToDelete. This meant it was deleted during TeardownProject, which is registered as a g.AfterEach. The Kubernetes framework's namespace cleanup, however, runs via ginkgo.DeferCleanup (registered inside f.BeforeEach), and the documented execution order is:

It block
→ all AfterEach nodes (including TeardownProject) ← token deleted here
→ all DeferCleanups (LIFO) ← framework cleanup here
→ f.AfterEach (namespace deletion)
This meant the bearer token embedded in the returned *rest.Config became invalid before any DeferCleanup callbacks had a chance to run. Any deferred cleanup that tried to make authenticated API calls as that user would receive 401 Unauthorized.

Fix:

Remove the token from resourcesToDelete and instead set ExpiresIn: 6 hours. The associated User object is still explicitly deleted in TeardownProject, which is sufficient to revoke effective access. The ExpiresIn TTL ensures the token is eventually cleaned up by the OpenShift OAuth token controller without needing explicit deletion.

Latent OIDC test bug fixed as a consequence

test/extended/authentication/oidc.go calls GetClientConfigForUser in a g.BeforeAll and stores the result in oauthUserConfig. This config is then used across two separate ordered It blocks:

"should not accept tokens provided by the OAuth server" (expects Unauthorized)
"should accept tokens provided by the OpenShift OAuth server" (expects 200 OK)
With the old code, TeardownProject deleted the OAuthAccessToken after the first It block. The first test passed for the wrong reason (deleted token → Unauthorized, which matched the expectation, but for the wrong cause). The second test silently used an invalid bearer token. As the current fix keeps the token alive across all ordered It blocks as intended, so both tests validate the correct behavior.

Summary by CodeRabbit

Bug Fixes

Adjusted OAuth access token provisioning to use a 6-hour automatic expiration, reducing the risk of stale authentication artifacts.

Improved test cleanup by avoiding explicit token deletion during teardown, preventing cached clients from losing credentials (which could otherwise lead to authentication fallback and follow-on failures).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci · 2026-07-03T05:37:13Z

@YamunadeviShanmugam: YamunadeviShanmugam unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers.

Details

In response to this:

/override ci/prow/e2e-aws-csi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

YamunadeviShanmugam · 2026-07-03T05:37:40Z

/verified by CI runs for required tests

openshift-ci-robot · 2026-07-03T05:37:51Z

@YamunadeviShanmugam: This PR has been marked as verified by CI runs for required tests.

Details

In response to this:

/verified by CI runs for required tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-merge-bot · 2026-07-03T07:23:14Z

/retest-required

Remaining retests: 0 against base HEAD f2c72ee and 2 for PR HEAD 95670ed in total

openshift-ci · 2026-07-03T09:41:05Z

@YamunadeviShanmugam: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/e2e-aws-csi	`95670ed`	link	true	`/test e2e-aws-csi`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot requested review from deads2k and sjenning June 29, 2026 08:24

openshift-ci Bot assigned gangwgr Jun 29, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 29, 2026

coderabbitai Bot approved these changes Jun 29, 2026

View reviewed changes

openshift-ci Bot added the ready-for-human-review Indicates a PR has been reviewed by automated tools and is ready for human review label Jun 29, 2026

ardaguclu reviewed Jul 1, 2026

View reviewed changes

YamunadeviShanmugam force-pushed the improve_test_steps_client_go branch from 74d2be9 to d06bacf Compare July 1, 2026 11:49

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 1, 2026

coderabbitai Bot requested changes Jul 1, 2026

View reviewed changes

Comment thread test/extended/util/client.go Outdated

YamunadeviShanmugam force-pushed the improve_test_steps_client_go branch from d06bacf to 14985db Compare July 1, 2026 12:12

ardaguclu reviewed Jul 1, 2026

View reviewed changes

Comment thread test/extended/util/client.go

fix(flakiness): Improve the cleanup order

95670ed

Improve the cleanup order for fixing flakiness

YamunadeviShanmugam force-pushed the improve_test_steps_client_go branch from 14985db to 95670ed Compare July 1, 2026 12:20

coderabbitai Bot approved these changes Jul 1, 2026

View reviewed changes

openshift-ci Bot assigned ardaguclu Jul 1, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jul 1, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 1, 2026

openshift-ci Bot changed the title ~~Fix Flaky e2e tests - Update the Cleanup order~~ NO-JIRA: Fix Flaky e2e tests - Update the Cleanup order Jul 3, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 3, 2026

openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Jul 3, 2026

Uh oh!

Conversation

YamunadeviShanmugam commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

OAuthAccessToken deleted before framework DeferCleanup runs

Fix:

Latent OIDC test bug fixed as a consequence

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

gangwgr commented Jun 29, 2026

Uh oh!

openshift-merge-bot Bot commented Jun 29, 2026

Uh oh!

YamunadeviShanmugam commented Jun 29, 2026

Uh oh!

YamunadeviShanmugam commented Jun 30, 2026

Uh oh!

YamunadeviShanmugam commented Jun 30, 2026

Uh oh!

YamunadeviShanmugam commented Jul 1, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ardaguclu commented Jul 1, 2026

Uh oh!

openshift-merge-bot Bot commented Jul 1, 2026

Uh oh!

sosiouxme commented Jul 1, 2026

Uh oh!

openshift-ci Bot commented Jul 1, 2026

Uh oh!

sosiouxme commented Jul 1, 2026

Uh oh!

YamunadeviShanmugam commented Jul 2, 2026

Uh oh!

YamunadeviShanmugam commented Jul 2, 2026

Uh oh!

YamunadeviShanmugam commented Jul 3, 2026

Uh oh!

openshift-ci-robot commented Jul 3, 2026

OAuthAccessToken deleted before framework DeferCleanup runs

Fix:

Latent OIDC test bug fixed as a consequence

Summary by CodeRabbit

Uh oh!

openshift-ci Bot commented Jul 3, 2026

Uh oh!

YamunadeviShanmugam commented Jul 3, 2026

Uh oh!

openshift-ci-robot commented Jul 3, 2026

Uh oh!

openshift-merge-bot Bot commented Jul 3, 2026

Uh oh!

YamunadeviShanmugam commented Jun 29, 2026 •

edited

Loading

coderabbitai Bot commented Jun 29, 2026 •

edited

Loading