Skip to content

feat: Add FDv2 synchronizer fallback and recovery conditions#297

Open
kinyoklion wants to merge 3 commits into
mainfrom
rlamb/sdk-2186/fdv2-conditions
Open

feat: Add FDv2 synchronizer fallback and recovery conditions#297
kinyoklion wants to merge 3 commits into
mainfrom
rlamb/sdk-2186/fdv2-conditions

Conversation

@kinyoklion

@kinyoklion kinyoklion commented Jun 10, 2026

Copy link
Copy Markdown
Member

What this adds

Timed conditions that the FDv2 data source orchestrator (a later PR) observes alongside the active synchronizer's results to drive tier transitions:

  • Fallback condition: starts its timer when the synchronizer reports an interrupted status and cancels it when a change set arrives. If it fires (120 seconds by default), the orchestrator moves to the next available synchronizer. Terminal statuses (shutdown, terminal error, goodbye) do not arm the timer — the orchestrator reacts to those immediately rather than waiting out a fallback period — and repeated interruptions do not extend the deadline; the period counts from the first interruption.
  • Recovery condition: starts when observed and ignores results. If it fires (300 seconds by default), the orchestrator returns to the primary synchronizer.

getConditions selects which conditions apply: none when only one synchronizer is available (nowhere to fall back to), fallback only for the primary, and both for a non-primary synchronizer. ConditionGroup merges its members and emits the first to fire.

The timeout defaults match the other client-side FDv2 implementations.

Why streams instead of futures

The Java and C++ implementations model conditions as one-shot futures, and both had the same leak (fixed in java-core by c27bf26 / launchdarkly/java-core#163): a future's listeners can never be detached, only released by completion, so an orchestration loop that races a long-lived pending future per result accumulates an irremovable listener garland per change set — unbounded on a healthy primary. Dart futures have the identical property (measured: ~559 B retained per race against a pending future), but Dart also has the primitive those platforms lack: cancellable stream subscriptions.

Each condition therefore exposes a single-subscription Stream<ConditionType> that emits at most once and then closes (closing without emitting if the condition is closed first). Lifetimes are scoped to the subscription: self-starting timers begin when the stream is listened to, and cancelling the subscription closes the condition and releases its timers. One bounded edge remains by design — informing a never-listened group can arm a fallback timer until its timeout elapses or close() is called — and is documented on the API. The orchestrator PR consumes these with one subscription per synchronizer run, closes in a finally, and includes a soak test asserting bounded memory across a sustained stream of results.

Testing

Tests run inside a package:fake_async zone, advancing time with elapse and asserting timer state through pendingTimers — no timer injection in the production code. They cover timer start/cancel behavior for both condition kinds, terminal statuses not arming the fallback timer, the fallback deadline not extending on repeated interruptions, at-most-once emission including when two member timers contend in the same instant, close semantics, subscription cancellation releasing condition and group timers, group merging and inform broadcast, and the getConditions selection rules.

SDK-2186


Note

Medium Risk
New timing logic will drive synchronizer failover when wired into the orchestrator; behavior is isolated in new modules with thorough tests and no production integration in this PR.

Overview
Adds FDv2 synchronizer tier-transition conditions for a future orchestrator: timed signals that emit fallback (switch to the next synchronizer after sustained interruption) or recovery (return to primary after running on a backup).

Fallback arms on interrupted, cancels on a change set, ignores terminal statuses, and does not reset the deadline on repeated interruptions (defaults: 120s). Recovery starts when the condition is observed and ignores results (default 300s). Conditions expose single-subscription streams (at-most-one emit) so subscriptions can be cancelled without the listener-retention issues of racing futures.

ConditionGroup merges members (first fire wins), broadcasts inform to all members, and getConditions picks an empty group, fallback-only for primary, or fallback+recovery for non-primary when multiple synchronizers exist.

Adds fake_async tests for timer behavior, close/cancel semantics, group contention, and getConditions rules.

Reviewed by Cursor Bugbot for commit 4fc80f6. Bugbot is set up for automated code reviews on this repo. Configure here.

@kinyoklion kinyoklion force-pushed the rlamb/sdk-2186/fdv2-conditions branch 2 times, most recently from b281b1a to 201d560 Compare June 10, 2026 22:40
@kinyoklion

Copy link
Copy Markdown
Member Author

bugbot review

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 201d560. Configure here.

Timed conditions the FDv2 orchestrator will observe alongside the
active synchronizer: a fallback condition that fires after a
synchronizer has been interrupted too long (120 seconds by default),
and a recovery condition that periodically returns to the primary
synchronizer (300 seconds by default).

Conditions are streams rather than futures so a consumer can detach:
cancelling a stream subscription releases the consumer, whereas a
listener on a never-completing future can never be removed and would
be retained for the condition's whole lifetime. Each condition emits
at most once and then closes, lifetimes are scoped to the
subscription (timers start on listen and cancellation closes the
condition), and nothing consumes these yet.
@kinyoklion kinyoklion force-pushed the rlamb/sdk-2186/fdv2-conditions branch from 201d560 to 7528fd5 Compare June 10, 2026 22:59
@kinyoklion kinyoklion marked this pull request as ready for review June 10, 2026 23:01
@kinyoklion kinyoklion requested a review from a team as a code owner June 10, 2026 23:01
{required void Function() start,
required void Function() cancel})? _informHandler;

late final StreamController<ConditionType> _controller =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a stream controller heavy for this? Is there another single event stream producing type that is lighter weight?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that a stream controller is going to be substantially different than a future.

void _startTimer() {
if (_timer != null || _closed) return;
_timer = Timer(_timeout, () {
_timer = null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it needs to null the timer since it has the _closed bool providing protection. I fear nulling the timer in various spots in this code (another spot being cancelTimer) could open it up to a path in which a second timer gets created.

@kinyoklion kinyoklion Jun 11, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally speaking I think the opposite, from a timer perspective. Always nulling and checking ensures that you check you have a timer before starting one. In the closed case I am fine with not nulling it. Though normally I would still null it.

// condition is observed (recovery behavior). With one, the handler
// decides when to start it.
if (_informHandler == null) {
_startTimer();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the timer start be independent of when someone starts listening? (Not much of a practical difference since the condition is immediately hooked up, but who knows what may happen in the future)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wouldn't be a problem in practice, but generally speaking you don't want to start before someone starts listening.

Like when it is easier to start them immediately that makes sense, but if it is easy to start on use, then I think that is better.

_finish();
}));
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if futures and a future combining operator would be simpler than streams since there is only one signal being emitted.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't approach isn't susceptible to the memory leak problem with incomplete promises retaining references to completion.

void _subscribe() {
if (_closed) return;
if (_conditions.isEmpty) {
_controller.close();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This _controller.close() feels a little off. Should this "no conditions" path just call finish() to be consistent with other cleanup?

It also seems like maybe not even having special handling for the empty conditions case is necessary, may keep things simpler to remove.

@kinyoklion kinyoklion requested a review from tanderson-ld June 11, 2026 20:02
kinyoklion added a commit that referenced this pull request Jun 11, 2026
## What this adds

The source manager tracks which FDv2 data source is active and handles
source transitions for the orchestrator (a later PR). Nothing consumes
it yet, so behavior is unchanged.

It manages two tiers with different lifecycles:

- **Initializers** run once each, in order, and exhaust: each call to
`nextInitializer` closes the previous active source and produces the
next one until the list runs out. The orchestrator uses this to bring
the SDK to a usable state from cache or a poll.
- **Synchronizer slots** cycle rather than exhaust:
`nextAvailableSynchronizer` scans forward from the current position with
wrap-around, skipping blocked slots. A slot is blocked after a terminal
error (`blockCurrentSynchronizer`), the scan position resets to the
primary for recovery (`resetSynchronizerIndex`), and
`recreateCurrentSynchronizer` produces a fresh instance of the current
slot without advancing — used when a source must drop its connection and
re-establish it (goodbye, restart).

`isPrimarySynchronizer` and `availableSynchronizerCount` feed the
fallback/recovery condition selection (#297).

**FDv1 fallback slots** start blocked and are only activated by a server
fallback directive: `engageFdv1Fallback` makes the FDv1 slots available
and blocks the FDv2 tier, disabling it rather than removing it so the
configuration survives for a later recovery to FDv2.

Every transition closes the previously active source, and `close` shuts
the manager down so no further sources can be created.

## Testing

Unit tests cover initializer ordering and exhaustion, synchronizer
cycling with wrap-around, blocked-slot skipping, the all-blocked case,
recovery reset, in-place re-creation, FDv1 fallback engagement (FDv2
tier disabled, FDv1 tier activated), and that close prevents further
source creation and closes the active source.

SDK-2186

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> New isolated module with tests only; no integration into existing data
source paths until a follow-up PR.
> 
> **Overview**
> Adds a new FDv2 **`SourceManager`** (plus **`SynchronizerSlot`** /
slot state) that will drive orchestrator source transitions. It is **not
wired in yet**, so runtime behavior is unchanged.
> 
> **Initializers** are advanced sequentially via `nextInitializer()`
(each step closes the prior active source) until the factory list is
exhausted. **Synchronizers** live in slots that **rotate** with
`nextAvailableSynchronizer()` (forward scan with wrap, skip blocked
slots). Recovery helpers include `blockCurrentSynchronizer()`,
`resetSynchronizerIndex()`, and `recreateCurrentSynchronizer()` for
reconnect without changing slot. **`engageFdv1Fallback()`** blocks FDv2
slots and unblocks FDv1 fallback slots (which start blocked). Exposes
`isPrimarySynchronizer`, `availableSynchronizerCount`, and
`hasFdv1Fallback` for later fallback logic.
> 
> Unit tests cover ordering, cycling, blocking, FDv1 engagement,
recreate, and shutdown.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
1716d97. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants