feat(loops): opt-in sandbox poll-mode (drop-resilient batch streaming) #184
Merged
Long, quiet in-box turns (clone/build/test) idle-drop their live SSE on the sandbox (replay-404 / exhausted reconnects) on BOTH prod and staging, which excludes cells and blocks batch eval runs. The reconnect path (lastEventId replay) 404s because the per-session event buffer is reaped between the drop and the reconnect.

Adds an opt-in LoopLineageOptions.streaming: 'poll' (default 'sse', unchanged). Poll-mode fire-and-detaches via dispatchPrompt, awaits the terminal result by status-poll (session(id).result(), no held stream), and yields the answer as one synthesized event. With no live SSE held across the quiet execution, the idle-drop is impossible by construction.

The option is threaded through the default fresh-box path (what batch runs actually use; the lineage only activates on sessionContinuity/forkFanout) and through the lineage start/continue/fork. Trace fidelity is lower (one terminal event, not per-token), so it is opt-in for batch; interactive chat keeps live SSE. Bench opts in via SANDBOX_STREAMING=poll.

- typecheck clean; 37 kernel tests pass, incl. a new poll-mode wiring test (dispatchPrompt + result() used, streamPrompt never held)
- default 'sse' byte-identical (13 existing lineage tests pass unchanged)

e2e validation against the live drop is pending sandbox stability: prod is currently degraded at the box-acquisition layer too (separate from the SSE drop), which blocked the end-to-end run. Correct by construction + unit-tested; the lift number lands once a box can be acquired.
…g, main was lint-red) + format the new poll-mode test
Problem
Long, quiet in-box turns (clone → build → test) idle-drop their live SSE on the sandbox (`Stream dropped without terminal event` → `Replay endpoint returned 404` → `Exhausted 3 reconnection attempts`) on both prod and staging. The cell is excluded, so batch eval runs can't complete (the eyes-present self-improvement proof is blocked on it). The reconnect (`lastEventId` replay) 404s because the per-session event buffer is reaped between the drop and the reconnect.
Change
Opt-in `LoopLineageOptions.streaming: 'poll'` (default `'sse'`, unchanged). Poll-mode:

- fire-and-detaches via `dispatchPrompt`,
- awaits the terminal result by status-poll (`box.session(id).result()`, no held stream),
- yields the answer as one synthesized event.

With no live SSE held across the quiet execution, the idle-drop is impossible by construction. The option is threaded through the default fresh-box path (what batch runs actually use; the lineage only activates on `sessionContinuity`/`forkFanout`) and through the lineage `start`/`continue`/`fork`. Bench opts in via `SANDBOX_STREAMING=poll`.
Trade-off: lower trace fidelity (one terminal event, not per-token) → opt-in for batch; interactive chat keeps live SSE.
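The poll-mode flow above can be sketched in TypeScript as follows. This is a minimal sketch under stated assumptions, not the PR's implementation: `SandboxBox`, `TurnEvent`, `SessionStatus`, `runTurn`, `pollForResult`, and `streamingFromEnv` are hypothetical names, the polling interval and cap are arbitrary, and the real `dispatchPrompt`, `streamPrompt`, and `session(id).result()` signatures may differ from the simplified ones used here. `pollForResult` shows one plausible way a status-poll `result()` could work.

```typescript
// Hypothetical sketch: only dispatchPrompt, streamPrompt, session(id).result(),
// the `streaming` option and SANDBOX_STREAMING come from the PR text.
type StreamingMode = "sse" | "poll";

// Mirrors the new option; other LoopLineageOptions fields are omitted.
interface LoopLineageOptions {
  streaming?: StreamingMode; // defaults to "sse"
}

type TurnEvent =
  | { kind: "token"; text: string } // live, per-token (SSE mode)
  | { kind: "result"; text: string }; // single synthesized terminal event

type SessionStatus = { state: "running" } | { state: "done"; answer: string };

// Simplified, assumed box surface.
interface SandboxBox {
  streamPrompt(prompt: string): AsyncIterable<TurnEvent>;
  dispatchPrompt(prompt: string): Promise<string>; // resolves to a session id
  session(id: string): { result(): Promise<string> };
}

// One way result() could await completion: repeated short status requests,
// so no connection stays open while the box is quiet.
async function pollForResult(
  getStatus: () => Promise<SessionStatus>,
  intervalMs = 2000,
  maxPolls = 1800,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  for (let i = 0; i < maxPolls; i++) {
    const status = await getStatus();
    if (status.state === "done") return status.answer;
    await sleep(intervalMs);
  }
  throw new Error(`session still running after ${maxPolls} polls`);
}

async function* runTurn(
  box: SandboxBox,
  prompt: string,
  opts: LoopLineageOptions = {},
): AsyncGenerator<TurnEvent> {
  if ((opts.streaming ?? "sse") === "sse") {
    // Default: hold the live stream and forward every event unchanged.
    yield* box.streamPrompt(prompt);
    return;
  }
  // Poll mode: fire the prompt and detach; nothing is held across the turn.
  const sessionId = await box.dispatchPrompt(prompt);
  const answer = await box.session(sessionId).result();
  // Synthesize one terminal event in place of the per-token stream.
  yield { kind: "result", text: answer };
}

// Bench-side opt-in: SANDBOX_STREAMING=poll selects poll mode; anything else keeps SSE.
function streamingFromEnv(env: Record<string, string | undefined>): StreamingMode {
  return env.SANDBOX_STREAMING === "poll" ? "poll" : "sse";
}
```

The design point is that poll mode holds no connection between `dispatchPrompt` and the terminal result: each status check is a short, independent request, so an idle timeout on a long stream has nothing to drop. The cost shows up in `runTurn`: callers receive one `result` event rather than per-token events, which is why the sketch keeps `"sse"` as the default.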
Validation
- `tsc --noEmit` clean; 37 kernel tests pass, incl. a new poll-mode wiring test (`dispatchPrompt` + `result()` used, `streamPrompt` never held).
- `'sse'` path byte-identical: the 13 existing lineage tests pass unchanged.

Honest scope
End-to-end validation against the live drop is pending sandbox stability. Tonight's prod sandbox is also degraded at the box-acquisition layer (`could not acquire a running sandbox`, separate from the SSE drop), which blocked the end-to-end run before streaming began. The change is correct by construction and unit-tested; the actual lift number lands once a box can be acquired. The acquisition-layer degradation is a separate agent-dev-container reliability issue, not addressed here.
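Until the end-to-end run is possible, the poll-mode wiring test is the main evidence for the change. Its general pattern can be sketched as below: hand the poll path a fake box that records which methods are called, then assert on the call log. This is a minimal sketch, assuming a simplified `Box` interface; `makeFakeBox`, `pollModeTurn`, and `checkPollWiring` are hypothetical names, not the PR's actual test code.

```typescript
// Hypothetical, simplified box surface: only the three calls the test cares about.
interface Box {
  streamPrompt(prompt: string): AsyncIterable<string>;
  dispatchPrompt(prompt: string): Promise<string>; // resolves to a session id
  session(id: string): { result(): Promise<string> };
}

// Fake box that records every method call by name in `calls`.
function makeFakeBox(answer: string): { box: Box; calls: string[] } {
  const calls: string[] = [];
  const box: Box = {
    streamPrompt(_prompt) {
      calls.push("streamPrompt");
      return (async function* () {
        yield answer;
      })();
    },
    async dispatchPrompt(_prompt) {
      calls.push("dispatchPrompt");
      return "session-1";
    },
    session(_id) {
      return {
        result: async () => {
          calls.push("result");
          return answer;
        },
      };
    },
  };
  return { box, calls };
}

// Hypothetical stand-in for the kernel's poll-mode turn under test.
async function pollModeTurn(box: Box, prompt: string): Promise<string[]> {
  const sessionId = await box.dispatchPrompt(prompt);
  return [await box.session(sessionId).result()];
}

// Wiring assertions: dispatch + result() are used, and no stream is ever opened.
async function checkPollWiring(): Promise<string[]> {
  const { box, calls } = makeFakeBox("tests passed");
  const events = await pollModeTurn(box, "clone, build, test");
  if (calls.includes("streamPrompt")) throw new Error("poll mode opened a stream");
  if (calls.join(",") !== "dispatchPrompt,result") {
    throw new Error(`unexpected call sequence: ${calls.join(",")}`);
  }
  if (events.length !== 1) throw new Error("expected exactly one synthesized event");
  return calls;
}
```

Asserting that `streamPrompt` is never called is what makes this a wiring test rather than a behavior test: it checks that the poll path cannot hold a stream, independent of how the live sandbox behaves.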