Skip to content

Windows/ConPTY parity: rendering, signals, tree-kill, and the run|consumer hang#124

Open
bilersan wants to merge 8 commits into
ConductorOne:mainfrom
bilersan:fix/windows-conpty-dsr-and-kill
Open

Windows/ConPTY parity: rendering, signals, tree-kill, and the run|consumer hang#124
bilersan wants to merge 8 commits into
ConductorOne:mainfrom
bilersan:fix/windows-conpty-dsr-and-kill

Conversation

@bilersan

@bilersan bilersan commented Jul 2, 2026

Copy link
Copy Markdown

Summary

Brings agent-tui to functional parity on Windows: the per-session daemon now
drives programs over a ConPTY the way it drives a PTY on Linux/macOS. The IPC
layer (interprocess local sockets → named pipes) and the cfg-gated
signal/kill paths were already in place; this branch closes the remaining
runtime gaps and hardens the Windows spawn/teardown path.

What this changes

Rendering & lifecycle

  • Answer the ConPTY startup DSR (ESC[6n) so interactive programs render
    instead of showing a blank screen.
  • Terminate events / tail --follow / attach / watch on a ConPTY pane.
    conhost keeps the pseudoconsole's output pipe open while the daemon holds the
    master, so the reader never sees EOF; a drain-grace predicate plus an explicit
    master close fix this without truncating the final frame.
  • --monitor-parent and graceful shutdown reap the pane's descendant tree and
    release the parked reader thread, so the daemon exits cleanly rather than
    hanging on the runtime's blocking-pool join.

Signals & processes

  • signal SIGINT delivers ETX (0x03) to the ConPTY input (conhost raises a
    real CTRL_C_EVENT); SIGTERM / SIGKILL perform a taskkill /F /T
    descendant-tree kill.
  • signal SIGBREAK / SIGQUIT is now rejected with an actionable error
    instead of writing a byte conhost never translates into a CTRL_BREAK_EVENT
    and reporting a false success.
  • Headless run / ask / edit spawn over plain pipes (isatty(0) == false,
    no echo, faithful exit codes).

run | consumer no longer hangs

  • A lazily-spawned daemon inside a cmd.exe pipeline inherited an extra
    inheritable pipe handle that cmd leaves in the client process, so
    agent-tui run ... | consumer could hang until the daemon's idle timeout.
    The daemon is now spawned via CreateProcessW + STARTUPINFOEX +
    PROC_THREAD_ATTRIBUTE_HANDLE_LIST with an explicit inherit allow-list (three
    fresh NUL std handles only) — the documented mechanism for precise handle
    inheritance. No new dependencies; the Unix spawn path is unchanged.

daemon upgrade returns an honest "unsupported on Windows" error (in-place
re-exec is Unix-only) rather than pretending to succeed.

Not addressed (documented, out of scope)

  • Owner-only DACL on the named pipe. The default named-pipe DACL already
    blocks cross-user command injection (other non-admin users get READ only, and
    every dangerous verb requires a WRITE) and off-box clients
    (PIPE_REJECT_REMOTE_CLIENTS). A tight DACL is still worthwhile for
    multi-user / RDP / CI hosts and is tracked separately.
  • Real Ctrl-Break delivery. Deferred — every viable route requires sharing
    the child's pseudoconsole, which the daemon does not.

Testing

  • End-to-end suite on a Windows 11 msvc host: 30/30 — DSR render, tree-kill
    (verified via Win32_Process), streaming-verb termination, signal routing,
    die / --grace, run exit-code + stdin fidelity, --monitor-parent
    (daemon exits ~330 ms after parent death), shutdown reap, and the
    run | consumer no-hang repro.
  • cargo test --workspace, cargo clippy --workspace --all-targets -- -D warnings,
    and cargo fmt --all -- --check all pass locally.
  • Windows-only host: the #[cfg(unix)] arms were reviewed but not compiled
    here. The CI matrix (ubuntu / macos / windows) covers the Linux/macOS
    build + test.

🤖 Generated with Claude Code

bilersan and others added 8 commits July 1, 2026 21:33
Two Windows-specific fixes to the daemon PTY layer.

FIX 1 — Route engine PTY write-back (DSR/DA replies) to ConPTY.
alacritty_terminal 0.26 emits terminal->host replies (DSR cursor
reports, DA1/DA2, kitty-keyboard) via Event::PtyWrite during
parser.advance. The engine dropped them, so ConPTY's startup ESC[6n
cursor-position query was never answered and the child stalled with a
blank screen. Add a default `Engine::take_pty_writes` trait method;
the alacritty engine accumulates PtyWrite bytes in TermEvents and
drains them after each feed. The daemon reader loop drains these
under capture_lock and writes them back to the PTY master after
releasing the lock (locking order: capture_lock -> feed ->
take_pty_writes -> release -> writer). The writer is now shared into
the reader loop via Arc<Mutex<..>>. The handshake bytes are NOT teed
to the recorder (protocol, not agent input; would desync .cast replay).

FIX 2 — Windows child-kill (die returned killed:false, orphaned tree).
portable-pty 0.9's WinChildKiller::kill() inverts the TerminateProcess
success BOOL (returns Err on success) and only kills the direct child,
orphaning the ConPTY host and grandchildren. On Windows, bypass the
broken killer and force-kill the whole descendant tree by PID with
`taskkill /F /T` (the Windows analog of the Unix killpg teardown).
Unix paths are untouched. No new dependencies. This also fixes
signal.rs deliver_terminate via the single kill() change.

Verified on Windows 11: snapshot text now renders the cmd prompt and
echoed input; `die` returns killed:true / kill_error:null and reaps
the child cmd.exe; the headless conhost is reaped on daemon teardown.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Reader loop: drain engine.take_pty_writes() unconditionally after a
  successful feed, decoupled from the output_buf lock, so a poisoned ring
  lock can't strand the DSR reply and wedge the child on its startup ESC[6n.
- kill(): return Err (not Ok) when child_pid() is None, so die/signal don't
  report a false killed:true when the child-handle mutex is poisoned.
- kill_tree_windows: resolve taskkill via %SystemRoot%\System32 absolute path
  (fallback to bare name) so a sanitized/relocated PATH can't break teardown.
- Fix stale EventProxy doc comment that still listed PtyWrite as dropped.

Verified: clean build, snapshot renders cmd + powershell prompts, die returns
killed:true, no orphaned conhost/cmd.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
G4 signal: deliver SIGINT/SIGQUIT/SIGBREAK to the ConPTY child by writing
the control byte (ETX 0x03 / FS 0x1c) to the pseudoconsole input via the
master writer, the idiomatic ConPTY approach. GenerateConsoleCtrlEvent
could never reach the child (separate pseudoconsole, not a process-group
root; portable-pty does not set CREATE_NEW_PROCESS_GROUP), so it always
failed and delivered nothing. Fix the false doc comments in signal.rs and
pty.rs child_pid().

G5 --monitor-parent: implement the Windows owner-death watch with
windows-sys (OpenProcess(SYNCHRONIZE|PROCESS_QUERY_LIMITED_INFORMATION) +
WaitForSingleObject(500ms) -> shutdown.notify_waiters()), mirroring the
Unix kill(pid,0) poll. Previously a silent no-op.

G6 reap_all_panes: implement the Windows arm; for each still-live pane
call kill_tree_windows(child_pid) (taskkill /F /T), the direct analog of
the Unix killpg loop. Previously a no-op -> orphaned ConPTY children and a
daemon that hung on exit.

G7 daemon upgrade: return an honest success:false error on Windows instead
of acking {status:upgrading} then silently bailing. In-place re-exec is
Unix-only.

G2-console: isolate the lazily-spawned daemon from the client console on
Windows via CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS so a client Ctrl-C
(console-wide CTRL_C_EVENT) no longer kills the daemon.

All Unix paths untouched (Windows arms gated with cfg(windows)/cfg(not(unix))).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdown

Close the Windows streaming-termination gap: on ConPTY the daemon holds the
master for the pane's life, so conhost keeps the output pipe's write end open
and the blocking reader loop never sees EOF. As a result reader_finished()
never flips, the streaming verbs' terminal gate
(poll_exit().is_some() && reader_finished()) never fires, and the per-pane
spawn_blocking reader thread stays parked forever. Symptoms: `events`
(child_exited), `tail --follow`/`attach`/`watch` (eof) hang after the child
exits; the leaked reader thread hangs the daemon on shutdown (the runtime's
blocking-pool join never completes) and wedges the daemon after a killed
streaming client.

Part A — terminal predicate so streams END on all platforms
Add Pane::exit_drained(), a single shared helper used at BOTH streaming gates
(events + stream_follow, which backs tail --follow and attach) so they stay in
lockstep:
  poll_exit().is_some()
    && (reader_finished() || last_output_ms_ago().map_or(true, |ms| ms > 150))
ConPTY emits nothing after the child exits, so once the ring goes quiet past
the ~150ms drain grace (a couple of STREAM_IDLE_TICKs) the terminal event
fires with all trailing bytes already flushed. On Unix reader_finished() flips
the instant the child exits, so the grace branch is never reached and behavior
is byte-for-byte unchanged.

Part B — release the parked reader thread on Windows
Ownership (portable-pty 0.9 win/conpty.rs + win/psuedocon.rs): try_clone_reader
DuplicateHandle-clones its own pipe read handle rather than holding the
pseudoconsole's Arc<Mutex<Inner>>; after spawn the master holds the sole Arc,
so dropping it runs PsuedoCon::drop -> ClosePseudoConsole, conhost closes the
write end, and the reader EOFs. Add a #[cfg(windows)] PtyChild::close_master()
that swaps an inert ClosedMaster into MasterEnd::Portable (keeping the enum type
unchanged so Unix paths are untouched) to drop the real master, and call it
during pane teardown: die's Windows teardown and reap_all_panes (for every
pane, live or terminal-retained). This unblocks the reader's spawn_blocking
thread (JoinHandle::abort cannot cancel it), so the daemon no longer hangs on
shutdown and the killed-client wedge is gone. Terminal-retained snapshots are
unaffected — the engine grid and output ring are separate Arcs and stay
readable after the master is closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`run`/`ask`/`edit` decompose into a spawn with `StdinMode::Pipe` (input
present) or `StdinMode::Closed` (no input). On Unix these route through
`spawn_with_custom_stdin` (pipe/`/dev/null` on stdin, stdout/stderr on the
PTY slave). The `#[cfg(not(unix))]` arm just returned "requires Unix", so all
three verbs failed with SpawnFailed on Windows — the last Windows gap.

These verbs are "subprocess-as-data": no TUI, output is ANSI-stripped by
`tail`. The correct Windows path is a plain (non-pseudoconsole) spawn, NOT a
ConPTY: a ConPTY echoes cooked stdin back onto stdout (doubling captured
input) and reports isatty(0)=true. A plain spawn yields isatty(0)=false (real
headless stdin), a genuine pipe EOF on close, and no stdin→stdout echo.

Add `spawn_headless_windows`, the Windows analog of `spawn_with_custom_stdin`,
dispatched from `PtyChild::spawn` for Pipe/Closed BEFORE any ConPTY is
allocated (so no conhost is spawned for a capture):

  * stdin: for Pipe an anonymous pipe (child gets the read end, we retain the
    write end as `stdin_pipe`); for Closed the NUL device via `Stdio::null()`.
  * stdout+stderr: ONE anonymous pipe whose write end is handed to both fds,
    so bytes interleave in one ordered stream (the Unix slave_out/slave_err
    dup analog); the read end feeds the existing `pty_reader_loop`.
  * every pipe end is created NON-inheritable (`anon_pipe` via CreatePipe with
    null security attributes); std::process makes inheritable duplicates of
    only the specific stdio handles under its own inheritance lock, so the
    ends we retain never leak into the child — dropping our stdin write end
    EOFs the child, exactly like a CLOEXEC pipe on Unix.

The resulting PtyChild reuses the inert `ClosedMaster` (headless panes are not
resized) and an `io::sink` writer (input flows via `stdin_pipe`). A
`WindowsChildShim` wraps `std::process::Child` for portable-pty's Child +
ChildKiller (plus the Windows-only `Child::as_raw_handle`); exit status maps
through `ExitStatus::from`, so `try_exit_code`/`shell_exit_code` report the
faithful code (`run` returns e.g. 7 for `cmd /c exit 7`). Tree-kill still goes
through `kill_tree_windows(pid)`. `write_stdin_pipe`/`close_stdin_pipe` were
already platform-agnostic and need no change.

Adds windows-sys features Win32_System_Pipes + Win32_Security (the latter gates
CreatePipe's SECURITY_ATTRIBUTES param). Unix paths are byte-identical
(env_overrides is merely constructed a few lines earlier); all new unsafe is
`#[allow(unsafe_code)]`.

Verified on Windows: run (no stdin) → clean stdout, no command echo; run
--stdin findstr → needle-42 exactly once (no echo doubling); run --stdin python
→ isatty(0)=False (real pipe stdin) and EOF-driven exit; edit → returns file
content; exit-code fidelity (exit 7 → 7). Build + workspace --all-targets + a
clippy pass all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ignal, run-pipe handle leak

Four audit-driven Windows fixes (Unix paths kept byte-identical via cfg gating).

1. pane::exit_drained — measure the drain grace from when the child's exit was
   first observed, and make the grace Windows-only. The old gate used
   last_output_ms_ago, whose None ("no output yet") also matches "reader hasn't
   pushed its first chunk", so a fast child that wrote-then-exited could be
   reported drained before the reader read those bytes → dropped final
   output/tail frame. New: record exit-observed Instant (new cfg(windows)
   `exit_observed_at` field, set inside exit_drained); Windows gate is
   `poll_exit && (reader_finished || exit_observed_at.elapsed() >= DRAIN_GRACE)`;
   Unix gate is exactly the historical `poll_exit && reader_finished` (no grace),
   so a grandchild holding the slave PTY open keeps follow alive until real
   close. DRAIN_GRACE stays a named 150ms const.

2. server::parent_pid_monitor (Windows arm) — the blocking
   `WaitForSingleObject(handle, 500)` loop never awaited, pinning a tokio worker
   forever; on any non-parent-death shutdown (idle timeout / `daemon shutdown`)
   the runtime-drop blocking-pool join hung → permanent zombie daemon. Rewrote
   as a select! poll: yielding `sleep(500ms)` + non-blocking
   `WaitForSingleObject(handle, 0)`, also listening on the shutdown Notify for
   prompt cancellation. Fires the same `shutdown.notify_waiters()` as the Unix
   arm, CloseHandle on every return path, null-OpenProcess still shuts down
   immediately. Handle held as usize across the await to keep the future Send.
   Unix arm untouched.

3. signal::deliver_ctrl_event (Windows) — a headless run/ask/edit pane has
   writer = io::sink(), so writing 0x03/0x1c "succeeded" while discarding the
   byte → `signal SIGINT` returned success delivering nothing. Now detects a
   non-Pty stdin_mode and returns an honest InvalidArgs DeliverErr; interactive
   Pty panes keep the ConPTY-input write.

Investigate (5) — `agent-tui run ... | consumer` hanging on Windows: FIXED.
Root cause: run's lazily-spawned DETACHED daemon inherited the run client's
stdout pipe write-end. Rust std Command::spawn calls CreateProcess with
bInheritHandles=TRUE, so the long-lived daemon (idle-timeout backstop) inherited
every inheritable handle including the client's stdout pipe, and the pipe never
EOF'd → consumer blocked (file redirection is unaffected, matching the symptom;
run intentionally does not self-shut-down to avoid racing back-to-back runs).
Fix: client strips HANDLE_FLAG_INHERIT from its std handles before the detached
spawn, via new `agent_tui_daemon::deny_std_handle_inheritance()` (the unsafe
SetHandleInformation lives in the daemon crate since the CLI is
forbid(unsafe_code)). Verified: piped `run -- cmd /c echo PIPED | Select-String`
returns in <1s instead of hanging.

Verification (Windows, debug): cargo build -p agent-tui and
cargo build --workspace --all-targets both exit 0, no warnings. Smoke:
(1) events + tail --follow on `cmd /c ver` end with child_exited and carry the
version text (output not dropped); events on `ping -n 3` ends. (2) daemon with a
live --monitor-parent shuts down in ~0.2s after `daemon shutdown` (no hang).
(3) headless pane signal → success:false INVALID_ARGS "headless"; interactive
pane signal → success:true. (5) piped run completes promptly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nestly

Two Windows-only correctness fixes on the ConPTY/daemon path.

1. run-pipe hang. `agent-tui run ... | consumer` could hang when the
   per-session daemon was lazily spawned from inside a cmd.exe pipeline.
   cmd creates the pipe with inheritable handles, so the client process
   holds an extra inheritable pipe handle beyond its own stdout; the
   long-lived daemon then inherited and pinned that handle open and the
   consumer never observed EOF until the daemon's idle timeout. Stripping
   HANDLE_FLAG_INHERIT from the client's own std handles was insufficient.
   Replace that shim with win_spawn::spawn_detached_no_inherit, which
   spawns the detached daemon via CreateProcessW + STARTUPINFOEX +
   PROC_THREAD_ATTRIBUTE_HANDLE_LIST so it inherits only three fresh NUL
   std handles -- the documented mechanism for precise handle inheritance.
   No new dependencies. The Unix spawn path is unchanged.

2. SIGBREAK honesty. `signal SIGBREAK`/`SIGQUIT` previously wrote FS (0x1c)
   to the ConPTY input and reported success, but conhost does not translate
   any input byte into a CTRL_BREAK_EVENT, so nothing was delivered. Reject
   it with an actionable error (use SIGINT to interrupt; SIGTERM/die to
   stop) rather than claim a success that never happens. Module/strategy
   docs updated to match.

Verified on a Windows msvc host: the pipe repro no longer hangs, and an
end-to-end suite (SIGINT delivery, tree-kill, events/tail/attach/watch
termination, --monitor-parent, shutdown reap, run exit-code fidelity)
passes 30/30.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ersan Bilik <ersanbilik@gmail.com>
The branch did not pass `cargo clippy --workspace --all-targets -- -D warnings`
or `cargo fmt --all -- --check` (pedantic is enabled via workspace lints).
Mechanical, no behavioral changes:

- add clippy.toml with doc-valid-idents for the ConPTY / ConductorOne
  proper nouns
- inline format args, use map_or_else, `&raw mut`, and inclusive ranges
- backtick API-name doc references (ChildKiller, TerminateProcess)
- targeted #[allow] for similar_names / unused_async where the bindings
  are clear in context and the async stubs mirror their Unix signatures
- apply rustfmt

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Ersan Bilik <ersanbilik@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant