Skip to content

Server-wide ThreadPool#300

Draft
Eunovo wants to merge 6 commits into
bitcoin-core:masterfrom
Eunovo:2026-threadpool
Draft

Server-wide ThreadPool#300
Eunovo wants to merge 6 commits into
bitcoin-core:masterfrom
Eunovo:2026-threadpool

Conversation

@Eunovo

@Eunovo Eunovo commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

This PR adds a ThreadPool owned by the EventLoop, enabling any server connection to dispatch work to a shared pool. The ThreadPool is tested with callbacks and recursive IPC calls.

Motivation

  • Reduces boilerplate for non-libmultiprocess clients that don't need control over where work executes on the server (similar goal to Add makePool method on ThreadMap #283).
  • Provides a straightforward way to cap the server's total thread count.

Difference from #283
Unlike the approach in #283, external clients don't need to send a pool creation request — the pool is available server-wide from the start. This lays groundwork for a future where ThreadMap access is restricted to privileged connections only.


Draft while I finish testing and gauge interest in a server-wide ThreadPool.

This test will be useful for verifying that the ThreadPool threads,
can handle other requests while waiting for callback to return.

Also use the opportunity to refactor some of the tests to remove
the try-catch blocks.
@DrahtBot

DrahtBot commented Jun 30, 2026

Copy link
Copy Markdown

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #288 (Create support branch for CI scripts, documentation, and examples by ryanofsky)
  • #283 (Add makePool method on ThreadMap by rustaceanrob)
  • #274 (Add nonunix platform support by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

Eunovo added 5 commits June 30, 2026 22:09
ProxyServer<Thread> previously bundled two concerns: the Cap'n Proto
server capability (getName, reference-counted lifetime) and core thread
management (OS thread, Waiter, m_thread_ready promise chain, post<T>()).

Extract the thread management concern into a new WorkerThread struct.
ProxyServer<Thread> becomes a thin wrapper that owns a WorkerThread and
adds the capability layer on top. This deduplication prepares for the
ThreadPool implementation where WorkerThread objects are owned directly
without Cap'n Proto capability overhead.

Key design choices:
- WorkerThread uses EventLoop& not EventLoopRef: lifetime is bounded by
  owner, which always holds the appropriate reference to keep the loop alive.
- WorkerThread::post<T>(fn, on_done) uses a template OnDone parameter
  (defaulting to nullptr_t with if constexpr guard) to avoid requiring
  copy-constructibility, allowing callers to capture move-only values.
- ProxyServer<Thread> gains a second constructor wrapping an existing
  thread (ThreadContext&) for use as a callback-thread capability.
- makeThread() simplified: thread creation now inside WorkerThread ctor.
Add ThreadPool class holding a fixed number of WorkerThreads dispatched
round-robin. Wire it into EventLoop via a new num_pool_threads constructor
parameter (default 0, no pool). When num_pool_threads > 0, m_thread_pool
is created after m_task_set so WorkerThread destructors (which join their
threads) run before the task set is destroyed.

Pool threads are named '<exe>-N (pool I)' and are not Cap'n Proto
capabilities - clients never hold Thread::Client references to them.
PassField dispatch integration follows in the next commit.
When a Context argument has no thread set, dispatch the request to the
EventLoop's thread pool (if configured) instead of failing. Also adds a
num_pool_threads parameter to TestSetup and a test verifying pool
lifecycle and direct work dispatch.
@ryanofsky

Copy link
Copy Markdown
Collaborator

Thanks for the PR! Seems like there are a lot of nice things here: New recursive async test looks worthwhile (though I think it might be possible to simplify by using passFn/std::function instead of callback/Callback). Test cleanups and WorkerThread refactor also seem helpful and make code more readable.

Things I'd change:

  • I think thread pools should be per-connection instead of per-eventloop. If there are only a limited number of threads the threads can easily get backed up or deadlocked with only a small number of blocking calls. If there is a shared thread pool then requests from one connection can block requests from a unrelated connections, which seems like it could cause unwanted interactions between clients and difficult to debug problems.

  • Relatedly I think clients should choose number of threads in their thread pools rather than leaving it to the server to decide. Clients need to be trusted anyway right now and only clients can have a good idea of how many threads they may need. I don't think the server or server administrator could do much better than blind guessing. That's not to say the server couldn't have caps on number of threads clients create, but either way clients should be explicit about their needs and expectations.

  • Similar to #283, scheduling algorithm does not seem very good because it just assigns tasks to threads whether the threads are blocked or not. A better approach would be to use a shared queue that threads pull from when they are idle as implemented in bd1f80c (branch) from #283 (comment)

  • I'm suspicious of the "getLocalServer call failed" error message and comment changing in a8e5413. The goal of that error message is not to be triggered when context_arg.getThread() capability is null. (A better way to check whether the capability is null is to call context_arg.hasThread() instead.) The goal of the error message is to be triggered when context_arg.getThread() is an invalid capability promise passed from the client to the server using capnproto's request pipelining feature. An invalid capability promise can be passed if the client calls ThreadMap.makeThread on a null ThreadMap object, and the the call is pipelined.

Overall, I think the solution for providing more automatic threading will combine different aspects of all this approach, bd1f80c, and #283. I'm planning on merging #283 first in its current form (and bumping the libmultiprocess subtree in bitcoin core) and this PR could be a good followup to that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants