Windows/ConPTY parity: rendering, signals, tree-kill, and the run|consumer hang#124
Open
bilersan wants to merge 8 commits into
Open
Windows/ConPTY parity: rendering, signals, tree-kill, and the run|consumer hang#124bilersan wants to merge 8 commits into
bilersan wants to merge 8 commits into
Conversation
Two Windows-specific fixes to the daemon PTY layer. FIX 1 — Route engine PTY write-back (DSR/DA replies) to ConPTY. alacritty_terminal 0.26 emits terminal->host replies (DSR cursor reports, DA1/DA2, kitty-keyboard) via Event::PtyWrite during parser.advance. The engine dropped them, so ConPTY's startup ESC[6n cursor-position query was never answered and the child stalled with a blank screen. Add a default `Engine::take_pty_writes` trait method; the alacritty engine accumulates PtyWrite bytes in TermEvents and drains them after each feed. The daemon reader loop drains these under capture_lock and writes them back to the PTY master after releasing the lock (locking order: capture_lock -> feed -> take_pty_writes -> release -> writer). The writer is now shared into the reader loop via Arc<Mutex<..>>. The handshake bytes are NOT teed to the recorder (protocol, not agent input; would desync .cast replay). FIX 2 — Windows child-kill (die returned killed:false, orphaned tree). portable-pty 0.9's WinChildKiller::kill() inverts the TerminateProcess success BOOL (returns Err on success) and only kills the direct child, orphaning the ConPTY host and grandchildren. On Windows, bypass the broken killer and force-kill the whole descendant tree by PID with `taskkill /F /T` (the Windows analog of the Unix killpg teardown). Unix paths are untouched. No new dependencies. This also fixes signal.rs deliver_terminate via the single kill() change. Verified on Windows 11: snapshot text now renders the cmd prompt and echoed input; `die` returns killed:true / kill_error:null and reaps the child cmd.exe; the headless conhost is reaped on daemon teardown. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Reader loop: drain engine.take_pty_writes() unconditionally after a successful feed, decoupled from the output_buf lock, so a poisoned ring lock can't strand the DSR reply and wedge the child on its startup ESC[6n. - kill(): return Err (not Ok) when child_pid() is None, so die/signal don't report a false killed:true when the child-handle mutex is poisoned. - kill_tree_windows: resolve taskkill via %SystemRoot%\System32 absolute path (fallback to bare name) so a sanitized/relocated PATH can't break teardown. - Fix stale EventProxy doc comment that still listed PtyWrite as dropped. Verified: clean build, snapshot renders cmd + powershell prompts, die returns killed:true, no orphaned conhost/cmd. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
G4 signal: deliver SIGINT/SIGQUIT/SIGBREAK to the ConPTY child by writing
the control byte (ETX 0x03 / FS 0x1c) to the pseudoconsole input via the
master writer, the idiomatic ConPTY approach. GenerateConsoleCtrlEvent
could never reach the child (separate pseudoconsole, not a process-group
root; portable-pty does not set CREATE_NEW_PROCESS_GROUP), so it always
failed and delivered nothing. Fix the false doc comments in signal.rs and
pty.rs child_pid().
G5 --monitor-parent: implement the Windows owner-death watch with
windows-sys (OpenProcess(SYNCHRONIZE|PROCESS_QUERY_LIMITED_INFORMATION) +
WaitForSingleObject(500ms) -> shutdown.notify_waiters()), mirroring the
Unix kill(pid,0) poll. Previously a silent no-op.
G6 reap_all_panes: implement the Windows arm; for each still-live pane
call kill_tree_windows(child_pid) (taskkill /F /T), the direct analog of
the Unix killpg loop. Previously a no-op -> orphaned ConPTY children and a
daemon that hung on exit.
G7 daemon upgrade: return an honest success:false error on Windows instead
of acking {status:upgrading} then silently bailing. In-place re-exec is
Unix-only.
G2-console: isolate the lazily-spawned daemon from the client console on
Windows via CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS so a client Ctrl-C
(console-wide CTRL_C_EVENT) no longer kills the daemon.
All Unix paths untouched (Windows arms gated with cfg(windows)/cfg(not(unix))).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rdown
Close the Windows streaming-termination gap: on ConPTY the daemon holds the
master for the pane's life, so conhost keeps the output pipe's write end open
and the blocking reader loop never sees EOF. As a result reader_finished()
never flips, the streaming verbs' terminal gate
(poll_exit().is_some() && reader_finished()) never fires, and the per-pane
spawn_blocking reader thread stays parked forever. Symptoms: `events`
(child_exited), `tail --follow`/`attach`/`watch` (eof) hang after the child
exits; the leaked reader thread hangs the daemon on shutdown (the runtime's
blocking-pool join never completes) and wedges the daemon after a killed
streaming client.
Part A — terminal predicate so streams END on all platforms
Add Pane::exit_drained(), a single shared helper used at BOTH streaming gates
(events + stream_follow, which backs tail --follow and attach) so they stay in
lockstep:
poll_exit().is_some()
&& (reader_finished() || last_output_ms_ago().map_or(true, |ms| ms > 150))
ConPTY emits nothing after the child exits, so once the ring goes quiet past
the ~150ms drain grace (a couple of STREAM_IDLE_TICKs) the terminal event
fires with all trailing bytes already flushed. On Unix reader_finished() flips
the instant the child exits, so the grace branch is never reached and behavior
is byte-for-byte unchanged.
Part B — release the parked reader thread on Windows
Ownership (portable-pty 0.9 win/conpty.rs + win/psuedocon.rs): try_clone_reader
DuplicateHandle-clones its own pipe read handle rather than holding the
pseudoconsole's Arc<Mutex<Inner>>; after spawn the master holds the sole Arc,
so dropping it runs PsuedoCon::drop -> ClosePseudoConsole, conhost closes the
write end, and the reader EOFs. Add a #[cfg(windows)] PtyChild::close_master()
that swaps an inert ClosedMaster into MasterEnd::Portable (keeping the enum type
unchanged so Unix paths are untouched) to drop the real master, and call it
during pane teardown: die's Windows teardown and reap_all_panes (for every
pane, live or terminal-retained). This unblocks the reader's spawn_blocking
thread (JoinHandle::abort cannot cancel it), so the daemon no longer hangs on
shutdown and the killed-client wedge is gone. Terminal-retained snapshots are
unaffected — the engine grid and output ring are separate Arcs and stay
readable after the master is closed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`run`/`ask`/`edit` decompose into a spawn with `StdinMode::Pipe` (input
present) or `StdinMode::Closed` (no input). On Unix these route through
`spawn_with_custom_stdin` (pipe/`/dev/null` on stdin, stdout/stderr on the
PTY slave). The `#[cfg(not(unix))]` arm just returned "requires Unix", so all
three verbs failed with SpawnFailed on Windows — the last Windows gap.
These verbs are "subprocess-as-data": no TUI, output is ANSI-stripped by
`tail`. The correct Windows path is a plain (non-pseudoconsole) spawn, NOT a
ConPTY: a ConPTY echoes cooked stdin back onto stdout (doubling captured
input) and reports isatty(0)=true. A plain spawn yields isatty(0)=false (real
headless stdin), a genuine pipe EOF on close, and no stdin→stdout echo.
Add `spawn_headless_windows`, the Windows analog of `spawn_with_custom_stdin`,
dispatched from `PtyChild::spawn` for Pipe/Closed BEFORE any ConPTY is
allocated (so no conhost is spawned for a capture):
* stdin: for Pipe an anonymous pipe (child gets the read end, we retain the
write end as `stdin_pipe`); for Closed the NUL device via `Stdio::null()`.
* stdout+stderr: ONE anonymous pipe whose write end is handed to both fds,
so bytes interleave in one ordered stream (the Unix slave_out/slave_err
dup analog); the read end feeds the existing `pty_reader_loop`.
* every pipe end is created NON-inheritable (`anon_pipe` via CreatePipe with
null security attributes); std::process makes inheritable duplicates of
only the specific stdio handles under its own inheritance lock, so the
ends we retain never leak into the child — dropping our stdin write end
EOFs the child, exactly like a CLOEXEC pipe on Unix.
The resulting PtyChild reuses the inert `ClosedMaster` (headless panes are not
resized) and an `io::sink` writer (input flows via `stdin_pipe`). A
`WindowsChildShim` wraps `std::process::Child` for portable-pty's Child +
ChildKiller (plus the Windows-only `Child::as_raw_handle`); exit status maps
through `ExitStatus::from`, so `try_exit_code`/`shell_exit_code` report the
faithful code (`run` returns e.g. 7 for `cmd /c exit 7`). Tree-kill still goes
through `kill_tree_windows(pid)`. `write_stdin_pipe`/`close_stdin_pipe` were
already platform-agnostic and need no change.
Adds windows-sys features Win32_System_Pipes + Win32_Security (the latter gates
CreatePipe's SECURITY_ATTRIBUTES param). Unix paths are byte-identical
(env_overrides is merely constructed a few lines earlier); all new unsafe is
`#[allow(unsafe_code)]`.
Verified on Windows: run (no stdin) → clean stdout, no command echo; run
--stdin findstr → needle-42 exactly once (no echo doubling); run --stdin python
→ isatty(0)=False (real pipe stdin) and EOF-driven exit; edit → returns file
content; exit-code fidelity (exit 7 → 7). Build + workspace --all-targets + a
clippy pass all clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ignal, run-pipe handle leak
Four audit-driven Windows fixes (Unix paths kept byte-identical via cfg gating).
1. pane::exit_drained — measure the drain grace from when the child's exit was
first observed, and make the grace Windows-only. The old gate used
last_output_ms_ago, whose None ("no output yet") also matches "reader hasn't
pushed its first chunk", so a fast child that wrote-then-exited could be
reported drained before the reader read those bytes → dropped final
output/tail frame. New: record exit-observed Instant (new cfg(windows)
`exit_observed_at` field, set inside exit_drained); Windows gate is
`poll_exit && (reader_finished || exit_observed_at.elapsed() >= DRAIN_GRACE)`;
Unix gate is exactly the historical `poll_exit && reader_finished` (no grace),
so a grandchild holding the slave PTY open keeps follow alive until real
close. DRAIN_GRACE stays a named 150ms const.
2. server::parent_pid_monitor (Windows arm) — the blocking
`WaitForSingleObject(handle, 500)` loop never awaited, pinning a tokio worker
forever; on any non-parent-death shutdown (idle timeout / `daemon shutdown`)
the runtime-drop blocking-pool join hung → permanent zombie daemon. Rewrote
as a select! poll: yielding `sleep(500ms)` + non-blocking
`WaitForSingleObject(handle, 0)`, also listening on the shutdown Notify for
prompt cancellation. Fires the same `shutdown.notify_waiters()` as the Unix
arm, CloseHandle on every return path, null-OpenProcess still shuts down
immediately. Handle held as usize across the await to keep the future Send.
Unix arm untouched.
3. signal::deliver_ctrl_event (Windows) — a headless run/ask/edit pane has
writer = io::sink(), so writing 0x03/0x1c "succeeded" while discarding the
byte → `signal SIGINT` returned success delivering nothing. Now detects a
non-Pty stdin_mode and returns an honest InvalidArgs DeliverErr; interactive
Pty panes keep the ConPTY-input write.
Investigate (5) — `agent-tui run ... | consumer` hanging on Windows: FIXED.
Root cause: run's lazily-spawned DETACHED daemon inherited the run client's
stdout pipe write-end. Rust std Command::spawn calls CreateProcess with
bInheritHandles=TRUE, so the long-lived daemon (idle-timeout backstop) inherited
every inheritable handle including the client's stdout pipe, and the pipe never
EOF'd → consumer blocked (file redirection is unaffected, matching the symptom;
run intentionally does not self-shut-down to avoid racing back-to-back runs).
Fix: client strips HANDLE_FLAG_INHERIT from its std handles before the detached
spawn, via new `agent_tui_daemon::deny_std_handle_inheritance()` (the unsafe
SetHandleInformation lives in the daemon crate since the CLI is
forbid(unsafe_code)). Verified: piped `run -- cmd /c echo PIPED | Select-String`
returns in <1s instead of hanging.
Verification (Windows, debug): cargo build -p agent-tui and
cargo build --workspace --all-targets both exit 0, no warnings. Smoke:
(1) events + tail --follow on `cmd /c ver` end with child_exited and carry the
version text (output not dropped); events on `ping -n 3` ends. (2) daemon with a
live --monitor-parent shuts down in ~0.2s after `daemon shutdown` (no hang).
(3) headless pane signal → success:false INVALID_ARGS "headless"; interactive
pane signal → success:true. (5) piped run completes promptly.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nestly Two Windows-only correctness fixes on the ConPTY/daemon path. 1. run-pipe hang. `agent-tui run ... | consumer` could hang when the per-session daemon was lazily spawned from inside a cmd.exe pipeline. cmd creates the pipe with inheritable handles, so the client process holds an extra inheritable pipe handle beyond its own stdout; the long-lived daemon then inherited and pinned that handle open and the consumer never observed EOF until the daemon's idle timeout. Stripping HANDLE_FLAG_INHERIT from the client's own std handles was insufficient. Replace that shim with win_spawn::spawn_detached_no_inherit, which spawns the detached daemon via CreateProcessW + STARTUPINFOEX + PROC_THREAD_ATTRIBUTE_HANDLE_LIST so it inherits only three fresh NUL std handles -- the documented mechanism for precise handle inheritance. No new dependencies. The Unix spawn path is unchanged. 2. SIGBREAK honesty. `signal SIGBREAK`/`SIGQUIT` previously wrote FS (0x1c) to the ConPTY input and reported success, but conhost does not translate any input byte into a CTRL_BREAK_EVENT, so nothing was delivered. Reject it with an actionable error (use SIGINT to interrupt; SIGTERM/die to stop) rather than claim a success that never happens. Module/strategy docs updated to match. Verified on a Windows msvc host: the pipe repro no longer hangs, and an end-to-end suite (SIGINT delivery, tree-kill, events/tail/attach/watch termination, --monitor-parent, shutdown reap, run exit-code fidelity) passes 30/30. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Ersan Bilik <ersanbilik@gmail.com>
The branch did not pass `cargo clippy --workspace --all-targets -- -D warnings` or `cargo fmt --all -- --check` (pedantic is enabled via workspace lints). Mechanical, no behavioral changes: - add clippy.toml with doc-valid-idents for the ConPTY / ConductorOne proper nouns - inline format args, use map_or_else, `&raw mut`, and inclusive ranges - backtick API-name doc references (ChildKiller, TerminateProcess) - targeted #[allow] for similar_names / unused_async where the bindings are clear in context and the async stubs mirror their Unix signatures - apply rustfmt Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Ersan Bilik <ersanbilik@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Brings
agent-tuito functional parity on Windows: the per-session daemon nowdrives programs over a ConPTY the way it drives a PTY on Linux/macOS. The IPC
layer (
interprocesslocal sockets → named pipes) and the cfg-gatedsignal/kill paths were already in place; this branch closes the remaining
runtime gaps and hardens the Windows spawn/teardown path.
What this changes
Rendering & lifecycle
ESC[6n) so interactive programs renderinstead of showing a blank screen.
events/tail --follow/attach/watchon a ConPTY pane.conhost keeps the pseudoconsole's output pipe open while the daemon holds the
master, so the reader never sees EOF; a drain-grace predicate plus an explicit
master close fix this without truncating the final frame.
--monitor-parentand graceful shutdown reap the pane's descendant tree andrelease the parked reader thread, so the daemon exits cleanly rather than
hanging on the runtime's blocking-pool join.
Signals & processes
signal SIGINTdelivers ETX (0x03) to the ConPTY input (conhost raises areal
CTRL_C_EVENT);SIGTERM/SIGKILLperform ataskkill /F /Tdescendant-tree kill.
signal SIGBREAK/SIGQUITis now rejected with an actionable errorinstead of writing a byte conhost never translates into a
CTRL_BREAK_EVENTand reporting a false success.
run/ask/editspawn over plain pipes (isatty(0) == false,no echo, faithful exit codes).
run | consumerno longer hangscmd.exepipeline inherited an extrainheritable pipe handle that
cmdleaves in the client process, soagent-tui run ... | consumercould hang until the daemon's idle timeout.The daemon is now spawned via
CreateProcessW+STARTUPINFOEX+PROC_THREAD_ATTRIBUTE_HANDLE_LISTwith an explicit inherit allow-list (threefresh
NULstd handles only) — the documented mechanism for precise handleinheritance. No new dependencies; the Unix spawn path is unchanged.
daemon upgradereturns an honest "unsupported on Windows" error (in-placere-exec is Unix-only) rather than pretending to succeed.
Not addressed (documented, out of scope)
blocks cross-user command injection (other non-admin users get READ only, and
every dangerous verb requires a WRITE) and off-box clients
(
PIPE_REJECT_REMOTE_CLIENTS). A tight DACL is still worthwhile formulti-user / RDP / CI hosts and is tracked separately.
the child's pseudoconsole, which the daemon does not.
Testing
(verified via
Win32_Process), streaming-verb termination, signal routing,die/--grace,runexit-code + stdin fidelity,--monitor-parent(daemon exits ~330 ms after parent death), shutdown reap, and the
run | consumerno-hang repro.cargo test --workspace,cargo clippy --workspace --all-targets -- -D warnings,and
cargo fmt --all -- --checkall pass locally.#[cfg(unix)]arms were reviewed but not compiledhere. The CI matrix (
ubuntu/macos/windows) covers the Linux/macOSbuild + test.
🤖 Generated with Claude Code