fix(operator): guard watch error before wrapping with watch.Filter by Arsolitt · Pull Request #3 · cozystack/redis-operator

Arsolitt (Arsolitt) · 2026-05-30T07:04:19Z

Changes proposed on the PR:

Return early on a watch error in NewRedisFailoverRetriever's WatchFunc, before wrapping the watcher with watch.Filter, so the reflector retries the watch instead of crashing the operator.
Add a regression test covering the error path (error propagated, no panic, no wrapped watcher) and the happy path (watcher is wrapped).

Why

When the RedisFailover watch cannot be established, the typed client returns a nil watch.Interface together with the error. WatchFunc wrapped that result with watch.Filter unconditionally. watch.Filter immediately starts a goroutine whose loop dereferences the source watcher's ResultChan(), so a nil watcher panics the entire operator process:

panic: runtime error: invalid memory address or nil pointer dereference
k8s.io/apimachinery/pkg/watch.(*filteredWatch).loop()  filter.go:64
created by k8s.io/apimachinery/pkg/watch.Filter         filter.go:41

On clusters where the initial watch is flaky — for example a freshly bootstrapped control plane under load — this drives the operator into CrashLoopBackOff and no RedisFailover is ever reconciled.

The fix mirrors the existing ListFunc, which already returns early when the list call errors. The unguarded wrapper was introduced when namespace filtering added watch.Filter; earlier releases returned the watcher directly from WatchFunc.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Improved error handling in Redis Failover watch operations to prevent crashes and ensure errors are properly reported.

NewRedisFailoverRetriever's WatchFunc wrapped the result of WatchRedisFailovers with watch.Filter unconditionally. On a watch error the typed client returns a nil watch.Interface, and watch.Filter spawns a goroutine whose loop dereferences the source watcher's ResultChan, crashing the operator with SIGSEGV instead of letting the reflector retry the watch. This reliably takes the operator into CrashLoopBackOff on clusters where the initial watch is flaky (e.g. a freshly bootstrapped control plane), so no RedisFailover is ever reconciled. Return (nil, err) before wrapping, mirroring the existing ListFunc guard, and add a regression test that reproduces the panic and covers the happy path. Signed-off-by: Arsolitt <arsolitt@gmail.com>

coderabbitai · 2026-05-30T07:04:26Z

Warning

Review limit reached

@Arsolitt, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 41 minutes and 57 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0ca4d00d-bbba-4768-8c51-1276a7316adb

📥 Commits

Reviewing files that changed from the base of the PR and between c4f64aa and 93fdbf3.

📒 Files selected for processing (2)

operator/redisfailover/factory.go
operator/redisfailover/factory_test.go

📝 Walkthrough

Walkthrough

The PR fixes a potential nil-pointer panic in the watch setup logic by explicitly checking for errors from cli.WatchRedisFailovers before passing the watcher to the filtering wrapper. The change adds two unit tests that verify both the error propagation and successful filtering paths.

Changes

Watch error handling in RedisFailoverRetriever

Layer / File(s)	Summary
Error handling in WatchFunc `operator/redisfailover/factory.go`	`WatchFunc` now explicitly checks for errors from `cli.WatchRedisFailovers` and returns early without attempting to wrap a nil watcher, preventing panics. The filter is applied only when a valid watcher is returned.
Unit tests for Watch error paths `operator/redisfailover/factory_test.go`	Two tests verify the error propagation path (error returned without wrapping, no panic, nil watcher) and the success path (watcher wrapped and non-nil returned) with mocked Kubernetes service calls.

🎯 2 (Simple) | ⏱️ ~10 minutes

🐰 A watcher that watched for nil with care,
Now checks for errors ere wrapping with flair,
Two paths of test ensure no panic calls,
This fix hops through safety's sturdy halls! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main change: adding a guard for watch errors before applying watch.Filter wrapping, which directly addresses the panic issue detailed in the PR objectives.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/watchfunc-nil-watcher-panic

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request fixes a potential panic in the NewRedisFailoverRetriever's WatchFunc by checking for errors before wrapping the watcher with watch.Filter, and adds unit tests to verify this behavior. The review feedback identifies a critical issue where watch.Error events are silently filtered out because they do not match the expected RedisFailover type, which could prevent the reflector from restarting failed watches. A code suggestion is provided to explicitly allow watch.Error events to pass through the filter.

gemini-code-assist · 2026-05-30T07:05:17Z

+			return watch.Filter(watcher, func(event watch.Event) (watch.Event, bool) {
 				rf, ok := event.Object.(*redisfailoverv1.RedisFailover)
 				if !ok {
 					return event, false
 				}
 				return event, isNamespaceSupported(*rf)
-			})
-			return watcher, err
+			}), nil


When filtering watch events, it is crucial to propagate watch.Error events. If a watch.Error event occurs (for example, when a watch resource version is too old and the watch is closed by the API server), the event object will not be of type *redisfailoverv1.RedisFailover. Under the current implementation, this event will be silently filtered out because ok is false, returning event, false.

If the reflector/informer does not receive the watch.Error event, it won't know that the watch has failed and needs to be restarted immediately, potentially causing the operator to hang or delay recovery until a connection timeout occurs.

We should explicitly allow watch.Error events to pass through the filter.

return watch.Filter(watcher, func(event watch.Event) (watch.Event, bool) { if event.Type == watch.Error { return event, true } rf, ok := event.Object.(*redisfailoverv1.RedisFailover) if !ok { return event, false } return event, isNamespaceSupported(*rf) }), nil

Good catch — fixed in 93fdbf30. The filter now passes watch.Error events through (returning event, true) before the *RedisFailover type assertion, so they're no longer dropped. This lets the reflector observe the error via apierrors.FromObject and restart the watch promptly (e.g. on an expired resource version) instead of waiting for the connection to close. I also added a regression test that pushes a watch.Error through a fake watcher and asserts it reaches the filtered channel.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@operator/redisfailover/factory_test.go`:
- Around line 71-74: The test calling retriever.Watch(context.Background(),
metav1.ListOptions{}) stores the returned watcher in w but never stops it,
causing goroutine leakage; update the test to stop the watcher in the success
path by calling w.Stop() (or defer w.Stop() right after ensuring err==nil and
w!=nil) so the watcher is closed before asserting expectations (references:
retriever.Watch and variable w).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9f9f2faa-976f-4a35-be40-416dda0c9f6c

📥 Commits

Reviewing files that changed from the base of the PR and between 162776b and c4f64aa.

📒 Files selected for processing (2)

operator/redisfailover/factory.go
operator/redisfailover/factory_test.go

coderabbitai · 2026-05-30T07:20:12Z

+	w, err := retriever.Watch(context.Background(), metav1.ListOptions{})
+	assert.NoError(err)
+	assert.NotNil(w)
+	ms.AssertExpectations(t)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Stop the returned watcher in the success-path test to avoid goroutine leakage.

retriever.Watch(...) returns the filtered watcher; stopping it explicitly makes test lifecycle deterministic and avoids lingering goroutines in larger suites.

Suggested patch

w, err := retriever.Watch(context.Background(), metav1.ListOptions{}) assert.NoError(err) assert.NotNil(w) +if w != nil { + defer w.Stop() +} ms.AssertExpectations(t)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

w, err := retriever.Watch(context.Background(), metav1.ListOptions{})

assert.NoError(err)

assert.NotNil(w)

ms.AssertExpectations(t)

w, err := retriever.Watch(context.Background(), metav1.ListOptions{})

assert.NoError(err)

assert.NotNil(w)

if w != nil {

defer w.Stop()

}

ms.AssertExpectations(t)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@operator/redisfailover/factory_test.go` around lines 71 - 74, The test calling retriever.Watch(context.Background(), metav1.ListOptions{}) stores the returned watcher in w but never stops it, causing goroutine leakage; update the test to stop the watcher in the success path by calling w.Stop() (or defer w.Stop() right after ensuring err==nil and w!=nil) so the watcher is closed before asserting expectations (references: retriever.Watch and variable w).

Thanks, but I think this is redundant here. filteredWatch.Stop() just delegates to the source watcher's Stop(), and the filter goroutine ranges over the source's ResultChan(), so it exits as soon as the source closes. The test already has defer fake.Stop() (line 61), which closes that channel and terminates the goroutine — adding defer w.Stop() would only call the same fake.Stop() again. There's no goroutine left to leak, so I'll keep the test as is.

@Arsolitt, that's a fair and accurate analysis — watch.Filter starts a goroutine that ranges over w.ResultChan() and exits naturally once the source channel is closed, so defer fake.Stop() on line 61 is already the correct cleanup point. Calling defer w.Stop() would just re-invoke the same underlying fake.Stop(). Acknowledged, happy to keep the test as-is.

watch.Error events carry a *metav1.Status object rather than a *RedisFailover, so the namespace filter's type assertion silently dropped them. The reflector depends on these events to detect a failed watch (such as an expired resource version) and restart it promptly instead of waiting for a connection timeout. Pass watch.Error events through unchanged and add a regression test covering the propagation. Signed-off-by: Arsolitt <arsolitt@gmail.com>

gemini-code-assist Bot reviewed May 30, 2026

View reviewed changes

Arsolitt (Arsolitt) marked this pull request as ready for review May 30, 2026 07:18

coderabbitai Bot reviewed May 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(operator): guard watch error before wrapping with watch.Filter#3

fix(operator): guard watch error before wrapping with watch.Filter#3
Arsolitt (Arsolitt) wants to merge 2 commits into
masterfrom
fix/watchfunc-nil-watcher-panic

Arsolitt (Arsolitt) commented May 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 30, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 30, 2026

Uh oh!

Arsolitt (Arsolitt) May 30, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 30, 2026 •

edited

Loading

Uh oh!

Arsolitt (Arsolitt) May 30, 2026

Uh oh!

coderabbitai Bot May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Arsolitt (Arsolitt) commented May 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

Arsolitt (Arsolitt) May 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Arsolitt (Arsolitt) May 30, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 30, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Arsolitt (Arsolitt) commented May 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 30, 2026 •

edited

Loading

coderabbitai Bot May 30, 2026 •

edited

Loading