feat: Add FDv2 synchronizer fallback and recovery conditions#297
feat: Add FDv2 synchronizer fallback and recovery conditions#297kinyoklion wants to merge 3 commits into
Conversation
b281b1a to
201d560
Compare
|
bugbot review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 201d560. Configure here.
Timed conditions the FDv2 orchestrator will observe alongside the active synchronizer: a fallback condition that fires after a synchronizer has been interrupted too long (120 seconds by default), and a recovery condition that periodically returns to the primary synchronizer (300 seconds by default). Conditions are streams rather than futures so a consumer can detach: cancelling a stream subscription releases the consumer, whereas a listener on a never-completing future can never be removed and would be retained for the condition's whole lifetime. Each condition emits at most once and then closes, lifetimes are scoped to the subscription (timers start on listen and cancellation closes the condition), and nothing consumes these yet.
201d560 to
7528fd5
Compare
| {required void Function() start, | ||
| required void Function() cancel})? _informHandler; | ||
|
|
||
| late final StreamController<ConditionType> _controller = |
There was a problem hiding this comment.
Is a stream controller heavy for this? Is there another single event stream producing type that is lighter weight?
There was a problem hiding this comment.
I don't think that a stream controller is going to be substantially different than a future.
| void _startTimer() { | ||
| if (_timer != null || _closed) return; | ||
| _timer = Timer(_timeout, () { | ||
| _timer = null; |
There was a problem hiding this comment.
I don't think it needs to null the timer since it has the _closed bool providing protection. I fear nulling the timer in various spots in this code (another spot being cancelTimer) could open it up to a path in which a second timer gets created.
There was a problem hiding this comment.
Generally speaking I think the opposite, from a timer perspective. Always nulling and checking ensures that you check you have a timer before starting one. In the closed case I am fine with not nulling it. Though normally I would still null it.
| // condition is observed (recovery behavior). With one, the handler | ||
| // decides when to start it. | ||
| if (_informHandler == null) { | ||
| _startTimer(); |
There was a problem hiding this comment.
Should the timer start be independent of when someone starts listening? (Not much of a practical difference since the condition is immediately hooked up, but who knows what may happen in the future)
There was a problem hiding this comment.
It wouldn't be a problem in practice, but generally speaking you don't want to start before someone starts listening.
Like when it is easier to start them immediately that makes sense, but if it is easy to start on use, then I think that is better.
| _finish(); | ||
| })); | ||
| } | ||
| } |
There was a problem hiding this comment.
Wondering if futures and a future combining operator would be simpler than streams since there is only one signal being emitted.
There was a problem hiding this comment.
This isn't approach isn't susceptible to the memory leak problem with incomplete promises retaining references to completion.
| void _subscribe() { | ||
| if (_closed) return; | ||
| if (_conditions.isEmpty) { | ||
| _controller.close(); |
There was a problem hiding this comment.
This _controller.close() feels a little off. Should this "no conditions" path just call finish() to be consistent with other cleanup?
It also seems like maybe not even having special handling for the empty conditions case is necessary, may keep things simpler to remove.
## What this adds The source manager tracks which FDv2 data source is active and handles source transitions for the orchestrator (a later PR). Nothing consumes it yet, so behavior is unchanged. It manages two tiers with different lifecycles: - **Initializers** run once each, in order, and exhaust: each call to `nextInitializer` closes the previous active source and produces the next one until the list runs out. The orchestrator uses this to bring the SDK to a usable state from cache or a poll. - **Synchronizer slots** cycle rather than exhaust: `nextAvailableSynchronizer` scans forward from the current position with wrap-around, skipping blocked slots. A slot is blocked after a terminal error (`blockCurrentSynchronizer`), the scan position resets to the primary for recovery (`resetSynchronizerIndex`), and `recreateCurrentSynchronizer` produces a fresh instance of the current slot without advancing — used when a source must drop its connection and re-establish it (goodbye, restart). `isPrimarySynchronizer` and `availableSynchronizerCount` feed the fallback/recovery condition selection (#297). **FDv1 fallback slots** start blocked and are only activated by a server fallback directive: `engageFdv1Fallback` makes the FDv1 slots available and blocks the FDv2 tier, disabling it rather than removing it so the configuration survives for a later recovery to FDv2. Every transition closes the previously active source, and `close` shuts the manager down so no further sources can be created. ## Testing Unit tests cover initializer ordering and exhaustion, synchronizer cycling with wrap-around, blocked-slot skipping, the all-blocked case, recovery reset, in-place re-creation, FDv1 fallback engagement (FDv2 tier disabled, FDv1 tier activated), and that close prevents further source creation and closes the active source. SDK-2186 <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Low Risk** > New isolated module with tests only; no integration into existing data source paths until a follow-up PR. > > **Overview** > Adds a new FDv2 **`SourceManager`** (plus **`SynchronizerSlot`** / slot state) that will drive orchestrator source transitions. It is **not wired in yet**, so runtime behavior is unchanged. > > **Initializers** are advanced sequentially via `nextInitializer()` (each step closes the prior active source) until the factory list is exhausted. **Synchronizers** live in slots that **rotate** with `nextAvailableSynchronizer()` (forward scan with wrap, skip blocked slots). Recovery helpers include `blockCurrentSynchronizer()`, `resetSynchronizerIndex()`, and `recreateCurrentSynchronizer()` for reconnect without changing slot. **`engageFdv1Fallback()`** blocks FDv2 slots and unblocks FDv1 fallback slots (which start blocked). Exposes `isPrimarySynchronizer`, `availableSynchronizerCount`, and `hasFdv1Fallback` for later fallback logic. > > Unit tests cover ordering, cycling, blocking, FDv1 engagement, recreate, and shutdown. > > <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit 1716d97. Bugbot is set up for automated code reviews on this repo. Configure [here](https://www.cursor.com/dashboard/bugbot).</sup> <!-- /CURSOR_SUMMARY -->
What this adds
Timed conditions that the FDv2 data source orchestrator (a later PR) observes alongside the active synchronizer's results to drive tier transitions:
getConditionsselects which conditions apply: none when only one synchronizer is available (nowhere to fall back to), fallback only for the primary, and both for a non-primary synchronizer.ConditionGroupmerges its members and emits the first to fire.The timeout defaults match the other client-side FDv2 implementations.
Why streams instead of futures
The Java and C++ implementations model conditions as one-shot futures, and both had the same leak (fixed in java-core by
c27bf26/ launchdarkly/java-core#163): a future's listeners can never be detached, only released by completion, so an orchestration loop that races a long-lived pending future per result accumulates an irremovable listener garland per change set — unbounded on a healthy primary. Dart futures have the identical property (measured: ~559 B retained per race against a pending future), but Dart also has the primitive those platforms lack: cancellable stream subscriptions.Each condition therefore exposes a single-subscription
Stream<ConditionType>that emits at most once and then closes (closing without emitting if the condition is closed first). Lifetimes are scoped to the subscription: self-starting timers begin when the stream is listened to, and cancelling the subscription closes the condition and releases its timers. One bounded edge remains by design — informing a never-listened group can arm a fallback timer until its timeout elapses orclose()is called — and is documented on the API. The orchestrator PR consumes these with one subscription per synchronizer run, closes in afinally, and includes a soak test asserting bounded memory across a sustained stream of results.Testing
Tests run inside a
package:fake_asynczone, advancing time withelapseand asserting timer state throughpendingTimers— no timer injection in the production code. They cover timer start/cancel behavior for both condition kinds, terminal statuses not arming the fallback timer, the fallback deadline not extending on repeated interruptions, at-most-once emission including when two member timers contend in the same instant, close semantics, subscription cancellation releasing condition and group timers, group merging and inform broadcast, and thegetConditionsselection rules.SDK-2186
Note
Medium Risk
New timing logic will drive synchronizer failover when wired into the orchestrator; behavior is isolated in new modules with thorough tests and no production integration in this PR.
Overview
Adds FDv2 synchronizer tier-transition conditions for a future orchestrator: timed signals that emit fallback (switch to the next synchronizer after sustained interruption) or recovery (return to primary after running on a backup).
Fallback arms on
interrupted, cancels on a change set, ignores terminal statuses, and does not reset the deadline on repeated interruptions (defaults: 120s). Recovery starts when the condition is observed and ignores results (default 300s). Conditions expose single-subscription streams (at-most-one emit) so subscriptions can be cancelled without the listener-retention issues of racing futures.ConditionGroupmerges members (first fire wins), broadcastsinformto all members, andgetConditionspicks an empty group, fallback-only for primary, or fallback+recovery for non-primary when multiple synchronizers exist.Adds
fake_asynctests for timer behavior, close/cancel semantics, group contention, andgetConditionsrules.Reviewed by Cursor Bugbot for commit 4fc80f6. Bugbot is set up for automated code reviews on this repo. Configure here.