
feat(loops): opt-in sandbox poll-mode (drop-resilient batch streaming)#184

Merged: drewstone merged 2 commits into main from feat/sandbox-poll-mode on Jun 6, 2026

@drewstone (Contributor)

Problem

Long, quiet in-box turns (clone → build → test) idle-drop their live SSE on the sandbox, surfacing as "Stream dropped without terminal event", "Replay endpoint returned 404", and "Exhausted 3 reconnection attempts" on both prod and staging. The affected cell is excluded, so batch eval runs can't complete (the eyes-present self-improvement proof is blocked on it). The reconnect (lastEventId replay) 404s because the per-session event buffer is reaped between the drop and the reconnect.
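A toy model of the failure shows why reconnecting cannot recover: replay by lastEventId only works while the per-session buffer still exists. The SessionEventBuffers class, its TTL-based reaper, and every name below are hypothetical; this PR does not describe the sandbox's real buffer or its reap policy.

```typescript
// Toy model with hypothetical names; NOT the sandbox's actual implementation.
// Assumption: buffers are reaped after a fixed idle TTL.
type SseEvent = { id: number; data: string };
type SessionBuffer = { events: SseEvent[]; lastTouched: number };
type ReplayResult = { status: 200; events: SseEvent[] } | { status: 404 };

class SessionEventBuffers {
  private readonly buffers = new Map<string, SessionBuffer>();

  constructor(private readonly ttlMs: number) {}

  // Record an event for a session and refresh its idle timer.
  append(sessionId: string, data: string, now: number): SseEvent {
    const buf: SessionBuffer = this.buffers.get(sessionId) ?? { events: [], lastTouched: now };
    const event: SseEvent = { id: buf.events.length + 1, data };
    buf.events.push(event);
    buf.lastTouched = now;
    this.buffers.set(sessionId, buf);
    return event;
  }

  // Reaper pass: delete buffers that have been idle longer than the TTL.
  reap(now: number): void {
    for (const [sessionId, buf] of this.buffers) {
      if (now - buf.lastTouched > this.ttlMs) this.buffers.delete(sessionId);
    }
  }

  // Replay events after lastEventId, or 404 if the buffer was already reaped.
  replay(sessionId: string, lastEventId: number): ReplayResult {
    const buf = this.buffers.get(sessionId);
    if (!buf) return { status: 404 };
    return { status: 200, events: buf.events.filter((e) => e.id > lastEventId) };
  }
}

// Usage: the last event arrives, then a long quiet build emits nothing.
const buffers = new SessionEventBuffers(30_000);
buffers.append("s1", "cloning repo", 0);
// The SSE idle-drops and the reaper runs before the client's reconnect arrives.
buffers.reap(60_000);
console.log(buffers.replay("s1", 1).status); // 404
```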

Change

Adds an opt-in LoopLineageOptions.streaming: 'poll' (the default, 'sse', is unchanged). In poll mode, a turn:

  1. fires the prompt and detaches via dispatchPrompt,
  2. awaits the terminal result by status-poll (box.session(id).result() — no held stream),
  3. yields the answer as one synthesized terminal event.
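The three poll-mode steps, with the default SSE branch for contrast, can be sketched in TypeScript as below. The Box interface, the LoopEvent shape, and the runTurn name are assumptions for illustration; only dispatchPrompt, session(id).result(), and streamPrompt are named in this PR, and their real signatures may differ. The recording fake box makes the wiring visible: poll mode calls dispatchPrompt and result(), and never opens a stream.

```typescript
// Hypothetical shapes for illustration; the real sandbox SDK's types differ.
type LoopEvent =
  | { type: "token"; text: string }
  | { type: "terminal"; answer: string };

type StreamingMode = "sse" | "poll";

interface Box {
  // Assumed fire-and-detach: resolves as soon as the turn is accepted.
  dispatchPrompt(prompt: string): Promise<{ sessionId: string }>;
  // Assumed to resolve once status polling reports a terminal result.
  session(id: string): { result(): Promise<{ answer: string }> };
  // Live SSE stream of per-token events (the default path).
  streamPrompt(prompt: string): AsyncIterable<LoopEvent>;
}

async function* runTurn(
  box: Box,
  prompt: string,
  streaming: StreamingMode = "sse",
): AsyncGenerator<LoopEvent> {
  if (streaming === "sse") {
    // Default: hold the live stream and forward every event unchanged.
    yield* box.streamPrompt(prompt);
    return;
  }
  // 1. Fire and detach; no stream is held while the turn runs.
  const { sessionId } = await box.dispatchPrompt(prompt);
  // 2. Await the terminal result via status polling.
  const { answer } = await box.session(sessionId).result();
  // 3. Yield the answer as a single synthesized terminal event.
  yield { type: "terminal", answer };
}

// Usage: a recording fake box whose result() polls a status map.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));
const calls: string[] = [];
const finished = new Map<string, string>();

const fake: Box = {
  async dispatchPrompt(prompt) {
    calls.push("dispatchPrompt");
    // Simulate a long, quiet turn that completes later without emitting events.
    setTimeout(() => finished.set("s1", `done: ${prompt}`), 20);
    return { sessionId: "s1" };
  },
  session(id) {
    return {
      async result() {
        calls.push("result");
        while (!finished.has(id)) await sleep(5); // status poll
        return { answer: finished.get(id)! };
      },
    };
  },
  async *streamPrompt(): AsyncGenerator<LoopEvent> {
    calls.push("streamPrompt");
  },
};

(async () => {
  for await (const ev of runTurn(fake, "build and test", "poll")) console.log(ev);
  console.log(calls); // [ 'dispatchPrompt', 'result' ]
})();
```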

With no live SSE held across the quiet execution, the idle-drop is impossible by construction. The option is threaded through the default fresh-box path (which is what batch runs actually use; the lineage only activates on sessionContinuity/forkFanout) and through lineage start/continue/fork. Bench opts in via SANDBOX_STREAMING=poll.
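The opt-in and its threading can be sketched as follows. The streamingFromEnv and planRun helpers are hypothetical, and placing sessionContinuity/forkFanout on LoopLineageOptions is an assumption made for the sketch; it only illustrates that an unset SANDBOX_STREAMING keeps 'sse' and that the fresh-box and lineage paths carry the same mode.

```typescript
// Hypothetical shapes for illustration; the real LoopLineageOptions may differ,
// and sessionContinuity/forkFanout may not live on it.
type StreamingMode = "sse" | "poll";

interface LoopLineageOptions {
  sessionContinuity?: boolean;
  forkFanout?: boolean;
  streaming?: StreamingMode; // omitted means "sse"
}

// Hypothetical helper: only SANDBOX_STREAMING=poll opts in; anything else,
// including an unset variable, keeps the default live-SSE behavior.
function streamingFromEnv(env: Record<string, string | undefined>): StreamingMode {
  return env.SANDBOX_STREAMING === "poll" ? "poll" : "sse";
}

// Hypothetical planner: the lineage path activates only on sessionContinuity
// or forkFanout; every other run takes the fresh-box path. Both carry the mode.
function planRun(opts: LoopLineageOptions): {
  path: "fresh-box" | "lineage";
  streaming: StreamingMode;
} {
  const path = opts.sessionContinuity || opts.forkFanout ? "lineage" : "fresh-box";
  return { path, streaming: opts.streaming ?? "sse" };
}

// Usage: a batch run launched with the bench's env var set.
const opts: LoopLineageOptions = { streaming: streamingFromEnv({ SANDBOX_STREAMING: "poll" }) };
console.log(planRun(opts)); // { path: 'fresh-box', streaming: 'poll' }
```

Threading the mode through the fresh-box path matters because, per the description above, batch runs rarely reach the lineage at all.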

Trade-off: lower trace fidelity (one terminal event instead of per-token events), so poll mode is opt-in for batch runs; interactive chat keeps live SSE.

Validation

  • tsc --noEmit clean; 37 kernel tests pass, incl. a new poll-mode wiring test (dispatchPrompt + result() used, streamPrompt never held).
  • Default 'sse' path is byte-identical: the 13 existing lineage tests pass unchanged.

Honest scope

End-to-end validation against the live drop is pending sandbox stability. Tonight the prod sandbox is also degraded at the box-acquisition layer ("could not acquire a running sandbox", separate from the SSE drop), which blocked the end-to-end run before streaming began. The change is correct by construction and unit-tested; the actual lift number will land once a box can be acquired. The acquisition-layer degradation is a separate agent-dev-container reliability issue and is not addressed here.

drewstone added 2 commits June 6, 2026 13:52
Long, quiet in-box turns (clone/build/test) idle-drop their live SSE on the
sandbox (replay-404 / exhausted reconnects) on BOTH prod and staging, which
excludes cells and blocks batch eval runs. The reconnect path (lastEventId
replay) 404s because the per-session event buffer is reaped between the drop
and the reconnect.

Adds an opt-in LoopLineageOptions.streaming: 'poll' (default 'sse', unchanged).
Poll-mode fire-and-detaches via dispatchPrompt, awaits the terminal result by
status-poll (session(id).result() — no held stream), and yields the answer as
one synthesized event. With no live SSE held across the quiet execution, the
idle-drop is impossible by construction. Threaded through the default fresh-box
path (what batch runs actually use — the lineage only activates on
sessionContinuity/forkFanout) and the lineage start/continue/fork.

Lower trace fidelity (one terminal event, not per-token), so opt-in for batch;
interactive chat keeps live SSE. Bench opts in via SANDBOX_STREAMING=poll.

- typecheck clean; 37 kernel tests pass incl. a new poll-mode wiring test
  (dispatchPrompt + result() used, streamPrompt never held)
- default 'sse' byte-identical (13 existing lineage tests pass unchanged)

e2e validation against the live drop is pending sandbox stability — prod is
currently degraded at the box-ACQUISITION layer too (separate from the SSE
drop), which blocked the end-to-end run. Correct by construction + unit-tested;
the lift number lands once a box can be acquired.
…g, main was lint-red) + format the new poll-mode test
drewstone merged commit 20a7bfe into main on Jun 6, 2026
1 check passed