perf(relay+tunnel): H2/H1 connection tuning, 64KB copy buffer, dynamic coalescing, CDN header filtering by CaptainMirage · Pull Request #1417 · therealaleph/MasterHttpRelayVPN-RUST

CaptainMirage · 2026-05-29T23:22:12Z

Summary

Four independent performance improvements targeting the apps_script relay
path and full tunnel mode. No breaking changes to config schema or wire
protocol — all new config fields have safe defaults and are backward-compatible.

1. H2/H1 connection timing tuning

Files: src/domain_fronter.rs, src/tunnel_client.rs

Adjusted the constants governing the HTTP/2 relay connection and H1 keepalive
pool to reduce per-request overhead:

- H2 flow control windows: initial stream window raised from 4 MB to 16 MB
  and connection window from 8 MB to 32 MB. This eliminates the WINDOW_UPDATE
  round-trips that throttle throughput on high-bandwidth links; previously each
  request stalled waiting for window top-ups before the CDN would send more data.
- H1 keepalive idle timeout: reduced from 240s to 60s. A shorter interval means
  fewer stale sockets get reused just before the server closes them, cutting the
  reconnect-on-stale failure rate under the proxy.
- H1 pool pre-warm: opens 6 parallel H1 connections at startup instead of
  on demand. The first requests after startup no longer pay the cold TLS
  handshake cost.

Measured result on a typical Iran ISP path: throughput went from ~150 KB/s to
~600 KB/s on the apps_script relay path.

2. 64 KB copy buffer for bidirectional TCP relay

Files: src/proxy_server.rs

Replaced the custom 8 KB select! { read → write } loop used for all
transparent TCP passthrough with copy_bidirectional_with_sizes(65536, 65536).
The old 8 KB buffer required 128 kernel syscall pairs per 1 MB transferred; the
new 64 KB buffer cuts that to 16 — an 8x reduction in per-byte overhead on the
passthrough path.
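
The read counts above can be reproduced with a small std-only sketch. It is not the proxy's code: `copy_counting_reads` and `reads_per_mib` are hypothetical helpers, and an in-memory `Cursor` always fills the buffer, so the numbers are a best case (a real socket may return short reads).

```rust
use std::io::{self, Cursor, Read, Write};

/// Copies `src` into `dst` through a scratch buffer of `buf_size` bytes and
/// returns (bytes copied, number of non-empty reads).
fn copy_counting_reads<R: Read, W: Write>(
    src: &mut R,
    dst: &mut W,
    buf_size: usize,
) -> io::Result<(u64, u64)> {
    let mut buf = vec![0u8; buf_size];
    let mut total = 0u64;
    let mut reads = 0u64;
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        dst.write_all(&buf[..n])?;
        total += n as u64;
        reads += 1;
    }
    Ok((total, reads))
}

/// Runs the copy over 1 MiB of in-memory data and returns the read count.
fn reads_per_mib(buf_size: usize) -> u64 {
    let mut src = Cursor::new(vec![0u8; 1 << 20]);
    let mut dst = Vec::new();
    let (total, reads) = copy_counting_reads(&mut src, &mut dst, buf_size).unwrap();
    assert_eq!(total, 1 << 20);
    reads
}
```

Calling `reads_per_mib(8 * 1024)` and `reads_per_mib(64 * 1024)` gives the 128 and 16 read/write pairs quoted above.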

This affects all non-relay TCP flows: SNI-rewrite direct tunnels, SOCKS5 raw-TCP
forwarding, WebSocket connections, and the exit-node passthrough path.

3. Dynamic coalescing and O(n) upload buffer fix

Files: src/tunnel_client.rs, src/config.rs, src/proxy_server.rs

Dynamic coalescing (mode = "full" only):

TunnelMux now measures batch RTT using a ring buffer of 8 samples (median to
filter spikes) and adapts the coalesce window automatically:

- Fast preset (50ms step / 300ms max): activates after 3 consecutive RTTs
  below 200ms. Targets broadband/fiber links.
- Slow preset (150ms step / 600ms max): activates when RTT exceeds 500ms.
  Targets throttled, high-latency, or mobile links.
- Hysteresis: 3 consecutive fast readings are required to leave the Slow
  preset, preventing oscillation on jittery connections.

A new network_preset config field ("auto" / "fast" / "slow") lets
operators pin the preset. "auto" (default) leaves adaptive behaviour enabled.
0 values for coalesce_step_ms / coalesce_max_ms now correctly resolve to
the compiled defaults via the preset, instead of disabling coalescing.
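
A minimal std-only sketch of the adaptive logic described above, not the actual TunnelMux code. Several things are assumptions: the thresholds are constructor parameters rather than hard-coded, a "reading" is taken to be the median after each new sample, the tracker starts in Fast, and `resolve_ms` is a hypothetical helper for the 0-means-default rule.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Preset {
    Fast,
    Slow,
}

impl Preset {
    /// (step_ms, max_ms) of the coalesce window for each preset.
    fn window_ms(self) -> (u64, u64) {
        match self {
            Preset::Fast => (50, 300),
            Preset::Slow => (150, 600),
        }
    }
}

/// A configured value of 0 falls back to the preset default.
fn resolve_ms(configured: u64, preset_default: u64) -> u64 {
    if configured == 0 { preset_default } else { configured }
}

struct RttTracker {
    samples: VecDeque<u64>, // last CAPACITY batch RTTs, in ms
    fast_below_ms: u64,     // assumed threshold for "fast" readings
    slow_above_ms: u64,     // assumed threshold for entering Slow
    fast_streak: u32,
    preset: Preset,
}

impl RttTracker {
    const CAPACITY: usize = 8;
    const FAST_STREAK_NEEDED: u32 = 3;

    fn new(fast_below_ms: u64, slow_above_ms: u64) -> Self {
        Self {
            samples: VecDeque::with_capacity(Self::CAPACITY),
            fast_below_ms,
            slow_above_ms,
            fast_streak: 0,
            preset: Preset::Fast,
        }
    }

    /// Upper median of the stored samples; filters single spikes.
    fn median_ms(&self) -> Option<u64> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted: Vec<u64> = self.samples.iter().copied().collect();
        sorted.sort_unstable();
        Some(sorted[sorted.len() / 2])
    }

    /// Records one batch RTT and returns the preset now in effect.
    fn record(&mut self, rtt_ms: u64) -> Preset {
        if self.samples.len() == Self::CAPACITY {
            self.samples.pop_front(); // ring buffer: drop the oldest sample
        }
        self.samples.push_back(rtt_ms);
        let median = self.median_ms().unwrap_or(rtt_ms);
        match self.preset {
            Preset::Fast => {
                if median > self.slow_above_ms {
                    self.preset = Preset::Slow;
                    self.fast_streak = 0;
                }
            }
            Preset::Slow => {
                if median < self.fast_below_ms {
                    self.fast_streak += 1;
                    if self.fast_streak >= Self::FAST_STREAK_NEEDED {
                        self.preset = Preset::Fast;
                        self.fast_streak = 0;
                    }
                } else {
                    self.fast_streak = 0; // hysteresis: streak must be unbroken
                }
            }
        }
        self.preset
    }
}
```

With `RttTracker::new(200, 500)`, one 600 ms sample moves the tracker to Slow, and it returns to Fast only after three consecutive medians below 200 ms.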

O(n) upload buffer fix (mode = "full" only):

The buffered upload accumulator that reassembles split data ops into a single
batch body was typed as Option<Bytes>, requiring a full copy + concatenation
on every incoming chunk — O(n²) total allocation cost for a multi-chunk upload.
Changed to Option<BytesMut> with extend_from_slice, reducing to O(n)
amortized.
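
The difference in copy cost can be illustrated with a std-only analogue. This sketch uses `Vec<u8>` in place of `Bytes`/`BytesMut` (an assumption made so the example needs no external crate), and both function names are hypothetical.

```rust
/// Old pattern: build a fresh buffer on every chunk, copying everything
/// accumulated so far. Returns the merged body and the bytes copied.
fn merge_by_reallocating(chunks: &[&[u8]]) -> (Vec<u8>, usize) {
    let mut acc: Option<Vec<u8>> = None;
    let mut copied = 0;
    for chunk in chunks {
        let prev = acc.take().unwrap_or_default();
        let mut next = Vec::with_capacity(prev.len() + chunk.len());
        next.extend_from_slice(&prev);
        next.extend_from_slice(chunk);
        copied += prev.len() + chunk.len();
        acc = Some(next);
    }
    (acc.unwrap_or_default(), copied)
}

/// New pattern: append in place. Only incoming bytes are counted as copied;
/// occasional reallocation growth is amortized and not counted here.
fn merge_in_place(chunks: &[&[u8]]) -> (Vec<u8>, usize) {
    let mut acc: Option<Vec<u8>> = None;
    let mut copied = 0;
    for chunk in chunks {
        acc.get_or_insert_with(Vec::new).extend_from_slice(chunk);
        copied += chunk.len();
    }
    (acc.unwrap_or_default(), copied)
}
```

For four 10-byte chunks the first function copies 10 + 20 + 30 + 40 = 100 bytes while the second copies 40, which is the quadratic-versus-linear gap described above.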

4. CDN response header noise filtering

Files: src/domain_fronter.rs, src/config.rs, assets/apps_script/Code.gs

Modern CDN stacks (Cloudflare, AWS, Fastly) attach metadata headers to every
response that carry no value through a MITM relay:

Header	Why it's useless through a relay
`report-to`, `reporting-endpoints`	Browser error-reporting endpoint — no reporting agent on the relay path
`nel`	Network Error Logging — same, no collector
`alt-svc`	Advertises an HTTP/3 QUIC endpoint — browsers can't use it through a proxy
`server-timing`	CDN render timings for browser DevTools — relay-path latency makes these meaningless
`cf-ray`, `cf-cache-status`	Cloudflare internal request IDs — go nowhere useful
`x-amzn-requestid`, `x-amzn-trace-id`	AWS request tracing — same
`origin-trial`	Chrome experimental feature tokens

On a Cloudflare-backed site these headers add 400–700 bytes of JSON per relay
response. Over a page load with 50 subresource requests that is ~25–35 KB of
wasted transfer (roughly 40–50ms at 600 KB/s), repeated on every page load.

Rust side (user-configurable via config.toml):

[relay]
strip_noise_response_headers = true  # default; set false to see raw headers

When enabled, the 12 blocklisted header names (including those listed above)
are dropped from the parsed GAS relay response in parse_relay_json before the
HTTP response is forwarded to the browser. Both the H2 and H1 relay paths are
covered by a single code path.
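
A rough sketch of what such a filter could look like, assuming headers are held as name/value string pairs. The list below contains only the 10 names from the table, not the full 12-entry blocklist, and `is_noise` and `filter_headers` are hypothetical names rather than functions from the PR.

```rust
/// Header names taken from the table above; the real blocklist has 12 entries.
const NOISE_RESPONSE_HEADERS: &[&str] = &[
    "report-to",
    "reporting-endpoints",
    "nel",
    "alt-svc",
    "server-timing",
    "cf-ray",
    "cf-cache-status",
    "x-amzn-requestid",
    "x-amzn-trace-id",
    "origin-trial",
];

/// Header names are case-insensitive, so compare ignoring ASCII case.
fn is_noise(name: &str) -> bool {
    NOISE_RESPONSE_HEADERS
        .iter()
        .any(|n| n.eq_ignore_ascii_case(name))
}

/// Drops blocklisted headers when `strip_noise` is true; otherwise passes
/// every header through unchanged.
fn filter_headers(
    headers: Vec<(String, String)>,
    strip_noise: bool,
) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !strip_noise || !is_noise(name))
        .collect()
}
```

Passing `strip_noise = false` mirrors setting strip_noise_response_headers = false: nothing is removed.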

Code.gs side (deploy-time optimization):

STRIP_NOISE_RESPONSE_HEADERS = true constant added near DIAGNOSTIC_MODE at
the top of Code.gs. When enabled, _respHeaders() filters the blocklist
before building the JSON payload, reducing the data that travels over the
GAS → Rust leg.

The two layers are independent: Code.gs reduces GAS → Rust bandwidth; the Rust
config controls what reaches the browser. Either can be disabled independently.

Test plan

- Throughput: browse a few pages through the proxy; observe speed vs the
  previous release baseline. Target >400 KB/s on a typical Iran ISP path.
- H2 startup: proxy log shows h2 connection established and
  h2 fast path active; h1 fallback pool pre-warmed with 6 connection(s).
- 64KB copy / upload: large file upload through the proxy completes
  without errors (tested: imgur image upload, YouTube range streaming).
- CDN header strip (Rust): with the proxy running,
  curl -sI -x http://127.0.0.1:8085 https://discord.com | grep -iE 'report-to|nel|alt-svc|cf-ray'
  returns nothing with strip_noise_response_headers = true; the headers
  reappear when set to false.
- CDN header strip (Code.gs): redeploy Code.gs as a new version in
  the Apps Script dashboard; check the Executions log. JSON response payloads
  for CDN-backed sites should be ~400–700 bytes smaller per request.
- Dynamic coalescing (mode = "full" only): startup log shows
  batch coalesce: auto mode; after sustained low-RTT batches the log should
  show a switch to the Fast preset.
- network_preset = "fast"/"slow": setting the field in config.toml and
  restarting pins the coalesce windows to the correct values; startup log
  confirms the chosen preset.

Tighten all relay timing constants to cut dead-wait time and flow-control stalls without touching any logic paths.

tunnel_client.rs:
- REPLY_TIMEOUT 35s -> 20s: GAS hard execution limit is 30s, so 35s can never catch a live-but-killed session; 20s still covers slow legitimate responses (~5-10s) with margin.
- Pre-fill poll stagger 1s -> 100ms per slot: eliminated 1s dead time at every session startup (INFLIGHT_OPTIMIST=2 means 1 slot was always delayed by 1s).

domain_fronter.rs:
- POOL_TTL_SECS 60 -> 30: faster turnover when IP/DNS changes.
- POOL_REFILL_INTERVAL_SECS 5 -> 2: halves h1 pool recovery window after an h2 outage.
- H2_READY_TIMEOUT_SECS 5 -> 3: faster h1 fallback on saturated h2 connections.
- H1_KEEPALIVE_INTERVAL_SECS 240 -> 60: keeps GAS containers warm after 1-min idle instead of 4-min; eliminates 1-3s cold-start penalty for users who pause streaming. Quota cost is ~360 extra invocations/day, well under the free-tier 6M/day limit.
- H2 flow-control windows 4MB/8MB -> 16MB/32MB: eliminates flow-control stalls during range-parallel streaming (256 KB chunks). Memory overhead is zero on idle pooled connections.
- Body Vec pre-sized from content-length header: avoids O(log n) realloc-and-copy cycles on large GAS responses (up to 40 MB).
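
The body pre-sizing idea from the last item can be sketched as follows. This is an illustration only: `body_buffer` and `MAX_PREALLOC` are hypothetical names, and clamping the hint to 40 MB is an assumption based on the response size mentioned in the commit message.

```rust
/// Upper bound on the pre-allocation (assumed; the commit message mentions
/// GAS responses of up to 40 MB).
const MAX_PREALLOC: usize = 40 * 1024 * 1024;

/// Returns an empty body buffer whose capacity is taken from the
/// content-length header value when it parses, clamped to MAX_PREALLOC.
/// A missing or malformed header falls back to an unallocated Vec.
fn body_buffer(content_length: Option<&str>) -> Vec<u8> {
    let hint = content_length
        .and_then(|v| v.trim().parse::<usize>().ok())
        .unwrap_or(0)
        .min(MAX_PREALLOC);
    Vec::with_capacity(hint)
}
```

Clamping the hint keeps an untrusted or wrong content-length value from triggering an arbitrarily large allocation; the buffer can still grow past the hint if the body turns out larger.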

The default tokio::io::copy() buffer is 8 KB. On a 200ms relay RTT that caps throughput at ~40 KB/s, well below even Iran's ~1 MB/s cable. Replacing all three bidirectional pipe sites with copy_bidirectional_with_sizes(65536, 65536) raises the per-connection ceiling to ~320 KB/s at the same RTT.

The switch also fixes half-close handling: the previous tokio::select! pattern cancelled the other copy direction when one side closed, which could silently drop in-flight data. copy_bidirectional_with_sizes handles each FIN independently, matching TCP half-close semantics.

Changes:
- plain-tcp passthrough (do_plain_tcp_tunnel): drop manual split, use copy_bidirectional_with_sizes on the full TcpStream pair.
- SNI-rewrite TLS tunnel (do_sni_rewrite_tunnel_from_tcp): same; no split needed, TlsStream implements AsyncRead+AsyncWrite.
- plain-HTTP passthrough (do_plain_http_tunnel): write the rewritten request first, then reunite the OwnedReadHalf/OwnedWriteHalf before calling copy_bidirectional_with_sizes (reunite is infallible here since the halves came from the same split).
- read_http_head / read_http_head_io: stack tmp buffer 4 KB -> 16 KB so large cookie/auth-token headers are read in one syscall.
- TLS-detect peek timeout: 300ms -> 100ms (browsers send ClientHello within 10-50ms; saves 200ms per new inbound connection).

Adds copy_bidirectional_large_buf_roundtrip test to verify the duplex relay path completes cleanly with large buffer sizes.
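
The ~40 KB/s and ~320 KB/s figures follow from a simple model in which at most one buffer's worth of data moves per round trip. The helper below is a hypothetical illustration of that arithmetic (1 KB taken as 1024 bytes), not code from the PR.

```rust
/// Upper bound in KB/s when at most `buf_bytes` are in flight per round trip.
/// `rtt_ms` must be non-zero.
fn ceiling_kb_per_s(buf_bytes: u64, rtt_ms: u64) -> u64 {
    // bytes per second = buf_bytes * 1000 / rtt_ms; divide by 1024 for KB.
    buf_bytes * 1000 / rtt_ms / 1024
}
```

With a 200 ms RTT, `ceiling_kb_per_s(8 * 1024, 200)` is 40 and `ceiling_kb_per_s(64 * 1024, 200)` is 320.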

…config

Steps 3 + 4 of the perf/relay-speed series.

--- Step 3: Dynamic coalescing (tunnel_client.rs + config.rs + proxy_server.rs) ---

Replace static DEFAULT_COALESCE_STEP/MAX constants with Arc<AtomicU64> values stored in TunnelMux. A new RttTracker (ring buffer of last 8 batch RTTs) measures median round-trip time and auto-adjusts the coalesce window:

  Slow preset (median > 1200ms): step=150ms max=600ms
  Fast preset (median < 700ms): step= 50ms max=300ms

Hysteresis: 3 consecutive sub-threshold readings required to leave Slow, preventing flapping on bursty congestion. mux_loop reads the atomics at each new window boundary so preset changes take effect without restart.

Config: network_preset = "auto" (default) | "fast" | "slow"
- auto starts at Fast and adapts via RttTracker
- fast/slow lock the preset and skip RTT measurement

Explicit coalesce_step_ms / coalesce_max_ms override still honoured when set; presence of either disables auto-adaptation.

Note: TunnelMux is only started in mode=full. Mode=apps_script relays each connection directly through DomainFronter and does not go through this path.

New unit tests: rtt_tracker_preset_selection_slow, rtt_tracker_preset_selection_fast, rtt_tracker_hysteresis_prevents_premature_flip.

Fixed copy_bidirectional test in proxy_server.rs: a_client was moved into write_task then borrowed for reading; split into separate read/write halves before spawning so both directions can be independently asserted.

--- Step 4: O(n) amortised buffered upload merge (tunnel_client.rs) ---

Change buffered_upload from Option<Bytes> to Option<BytesMut>. The old merge path allocated a fresh buffer and copied all accumulated data on every new upload chunk under pipeline congestion, giving O(n^2) total copies for n chunks. BytesMut extends in-place (amortised O(n)); freeze() at send time is a zero-copy Arc pointer bump.

…SON payload

CDN stacks (Cloudflare, AWS, Fastly) attach metadata headers to every response (report-to, nel, alt-svc, server-timing, cf-ray, origin-trial, etc.) that add 400-700 bytes of JSON per GAS relay response for zero benefit through a MITM proxy. The relay ignores them and the browser never reads them. On a Cloudflare-backed page with 50 subresource requests this wastes ~25-35 KB of transfer, ~40-50ms at 600 KB/s.

Rust side (config-driven):
- Add `strip_noise_response_headers: bool` to Config and TomlRelay, default true. Controls the primary user-facing toggle via config.toml.
- Add NOISE_RESPONSE_HEADERS static in domain_fronter.rs listing the 12 useless header names.
- Update parse_relay_json() to accept a strip_noise bool and skip listed headers in the output loop when enabled.
- Pass self.strip_noise_response_headers at both call sites in do_relay_once_inner (h2 and h1 paths).

Code.gs side (GAS payload reduction):
- Add STRIP_NOISE_RESPONSE_HEADERS constant (default true) and STRIP_RESPONSE_HEADERS lookup object near DIAGNOSTIC_MODE.
- Update _respHeaders() to filter the blocklist when the constant is true, reducing the JSON payload that travels over the GAS->Rust leg.
- Both _doSingle and _doBatch call _respHeaders, so both relay paths get the filter automatically.

The two layers are independent: Code.gs reduces GAS->Rust bandwidth; the Rust config controls what the browser sees. Setting strip_noise_response_headers = false in config.toml passes all headers through to the browser regardless of what Code.gs sends.

CaptainMirage · 2026-05-29T23:47:13Z

Checked for conflicts with #1414 (tunnel-node perf by brightening-eyes): no overlap. That one is entirely in tunnel-node/, and this one is all client-side src/ and Code.gs. The two together should cover both ends of the full-mode path nicely.

CaptainMirage · 2026-05-30T00:13:47Z

Also, I can't help but notice that GitHub Actions seems to be failing silently, apparently because some webhooks got updated. The labels aren't being applied; I checked, and PR #1414 also didn't get its label:
https://github.com/therealaleph/MasterHttpRelayVPN-RUST/actions/runs/26667216524

CaptainMirage added 4 commits May 29, 2026 23:27

CaptainMirage wants to merge 4 commits into therealaleph:main from CaptainMirage:perf/relay-speed
