fix(server): decouple asyncify thread pool from COOP_TASKRUN for old-kernel compat #3517
Open
mfyuce wants to merge 3 commits into
Shard executors hardcoded IORING_SETUP_COOP_TASKRUN + TASKRUN_FLAG, which require Linux >= 5.19. On 5.15 the shard io_uring setup fails with EINVAL even though the default-flag main runtime starts fine, so the server can't boot at all.

Gate the flags behind IGGY_SHARD_RUNTIME_COOP_TASKRUN (default true = unchanged behavior); set it to false to run on 5.10..5.18 kernels at a small latency cost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QBxbPbdKXzoMdvLBeugNBX
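A minimal sketch of how an environment flag like this could be read, assuming the value is compared case-insensitively against "false" and that anything else (including an unset variable) keeps the default of true. The function names and the parsing rules are hypothetical and are not taken from the PR's diff.

```rust
use std::env;

/// Environment variable named in the commit message above.
const COOP_TASKRUN_ENV: &str = "IGGY_SHARD_RUNTIME_COOP_TASKRUN";

/// Interprets a raw flag value (hypothetical parsing rules).
/// Only an explicit "false" disables the flag; unset or any other
/// value keeps the default of `true`, i.e. unchanged behavior.
fn parse_coop_taskrun(raw: Option<&str>) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if v == "false" => false,
        _ => true,
    }
}

/// Reads the flag from the process environment.
fn coop_taskrun_enabled() -> bool {
    parse_coop_taskrun(env::var(COOP_TASKRUN_ENV).ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the default-true behavior easy to check without mutating process state.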
With COOP_TASKRUN off (old-kernel fallback), compio routes some ops (fs, JWT storage) through the asyncify thread pool; thread_pool_limit(0) then panics 'thread pool is needed but no worker thread is running' and the HTTP server task dies on shard 0.

Gate thread_pool_limit(0) behind the same flag so the default worker pool stays when COOP_TASKRUN is off.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01QBxbPbdKXzoMdvLBeugNBX
TCP, HTTP and WebSocket transports dispatch some ops through the asyncify thread pool even when COOP_TASKRUN is on, so thread_pool_limit(0) cannot be tied to the COOP_TASKRUN flag alone: enabling COOP_TASKRUN on a 6.8+ kernel still panics with "thread pool is needed" when those transports are active.

Add a keep_worker_pool parameter to create_shard_executor. The asyncify pool is only dropped when COOP_TASKRUN is true AND the caller signals no TCP/HTTP/WS transport is active. Both server and server-ng derive the flag from their loaded config; the server-ng bootstrap runtime passes true because it runs before config is available.

This lets operators set IGGY_SHARD_RUNTIME_COOP_TASKRUN=true on Linux 6.8+ even with TCP transports enabled, gaining the lower io_uring latency without the worker-pool panic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_018duZYBkbguQ2pn8RJ82PUw
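The rule described in this commit reduces to a two-input decision. The sketch below illustrates it under stated assumptions: the `Transports` struct and both function names are hypothetical stand-ins, not the types or signatures used in the PR, and the actual executor construction is left out.

```rust
/// Hypothetical view of which transports the loaded config enables.
struct Transports {
    tcp: bool,
    http: bool,
    websocket: bool,
}

/// The worker pool must stay if any transport that dispatches work
/// through the asyncify pool is enabled.
fn keep_worker_pool(t: &Transports) -> bool {
    t.tcp || t.http || t.websocket
}

/// The pool may be dropped (the thread_pool_limit(0) case) only when
/// COOP_TASKRUN is on AND the caller did not ask to keep the pool.
fn should_drop_worker_pool(coop_taskrun: bool, keep_worker_pool: bool) -> bool {
    coop_taskrun && !keep_worker_pool
}
```

Under this rule, a bootstrap runtime that starts before config is loaded would pass `true` for `keep_worker_pool`, which always preserves the pool regardless of the COOP_TASKRUN setting.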
Thanks for the PR. It is labeled Slash commands (own line, regular comment) move it around the queue:
See CONTRIBUTING.md for details.
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
Mark COOP_TASKRUN PR apache#3517 as submitted; clear TOBEDECIDED.md. Both apache#3516 and apache#3517 are now S-waiting-on-review on apache/iggy. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01QBxbPbdKXzoMdvLBeugNBX
mfyuce added a commit to mfyuce/iggy that referenced this pull request Jun 21, 2026
- AGENTS.md: 104→75 lines. Removed redundant repo structure (derivable by ls), collapsed principles to iggy-specific rules only, merged Jenkins/QW infra into Infra section, updated handover block. - TODO.md: replaced stale checked items with 4 open PRs (apache#3516 apache#3517 apache#3523 apache#3525) + QW 0.9 upgrade task. - DONE.md: added sessions 5-10 block (QW sink pipeline, collector cutover, InvalidOffset bug + fix). - quickwit_sink/src/lib.rs: cargo fmt reformatting only.
Contributor
I am not sure about it. I think we would rather not allow our server to run on kernels < 6.8 than add extra startup flags to disable that.
Summary

`IORING_SETUP_COOP_TASKRUN` requires Linux ≥ 5.19. On kernels 5.10–5.18 the shard io_uring setup fails with `EINVAL` even though the main runtime starts fine, preventing server boot entirely. These three commits fix that in order:

1. Gate COOP_TASKRUN flags behind an env var (`IGGY_SHARD_RUNTIME_COOP_TASKRUN`, default `true` = unchanged). Set it to `false` to run on 5.10–5.18 kernels at a small latency cost. (`feat(server)`)
2. Keep asyncify worker pool when COOP_TASKRUN is off. With the flag off, compio routes fs/JWT-storage ops through the asyncify thread pool. `thread_pool_limit(0)` then panics "thread pool is needed but no worker thread is running" on shard 0. Gate the `thread_pool_limit(0)` call behind the same flag. (`fix(server)`)
3. Decouple `thread_pool_limit` from COOP_TASKRUN entirely. TCP, HTTP, and WebSocket transports dispatch some ops through the asyncify pool even when `COOP_TASKRUN=true`, so tying `thread_pool_limit(0)` to the flag still panics on 6.8+ kernels with those transports active. Add a `keep_worker_pool: bool` parameter to `create_shard_executor`; the pool is only dropped when `COOP_TASKRUN=true` and no TCP/HTTP/WS transport is enabled. Both `server` and `server-ng` derive `keep_worker_pool` from their loaded config. (`fix(server)`)

After these changes, operators can set `IGGY_SHARD_RUNTIME_COOP_TASKRUN=true` on Linux 6.8+ with TCP transport enabled and get lower io_uring latency without the worker-pool panic. On ≤5.18 kernels they set it to `false` and the server boots normally.
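For operators deciding which value to set, the 5.19 threshold can be checked against a kernel release string such as the output of `uname -r`. The following is only an illustrative sketch: the PR itself relies on the manual flag rather than auto-detection, and both helper names are hypothetical.

```rust
/// Extracts (major, minor) from a release string such as
/// "5.15.0-91-generic" or "6.8-rc1". Returns None if it cannot be parsed.
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // The minor component may carry a suffix ("8-rc1"), so keep leading digits only.
    let digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor: u32 = digits.parse().ok()?;
    Some((major, minor))
}

/// True when the kernel is at least 5.19, the version the summary
/// gives as the minimum for IORING_SETUP_COOP_TASKRUN.
/// Unparseable release strings are treated as unsupported.
fn supports_coop_taskrun(release: &str) -> bool {
    matches!(parse_kernel_release(release), Some(v) if v >= (5, 19))
}
```

Treating unparseable input as unsupported is a conservative choice for this sketch: it favors the configuration that boots on older kernels over the lower-latency one.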
Files changed

- `core/server_common/src/executor.rs`: `create_shard_executor(keep_worker_pool: bool)`, flag-gated `COOP_TASKRUN`/`TASKRUN_FLAG`
- `core/server/src/main.rs`: derive `keep_worker_pool` from config
- `core/server-ng/src/main.rs`, `core/server-ng/src/bootstrap.rs`: same

Test plan

- `cargo clippy -p server -p server-ng -- -D warnings` passes
- `IGGY_SHARD_RUNTIME_COOP_TASKRUN=false` + TCP transport
- `IGGY_SHARD_RUNTIME_COOP_TASKRUN=true` + TCP transport (no worker-pool panic)
- `cargo test -p server_common` passes

🤖 Generated with Claude Code
https://claude.ai/code/session_018duZYBkbguQ2pn8RJ82PUw