Fix Apache HttpClient 5 reactor recovery by sharp-pixel · Pull Request #2034 · opensearch-project/opensearch-java

sharp-pixel · 2026-06-30T21:17:58Z

Fixes #1969

Problem

ApacheHttpClient5Transport could enter an unrecoverable state after Apache HttpClient 5 reported IOReactorShutdownException. Once the underlying I/O reactor was shut down, the transport continued using the same dead CloseableHttpAsyncClient, so every later request failed until the application recreated the transport.

One common trigger is memory pressure while buffering large or highly concurrent responses. The existing per-response buffer limit capped a single response, but it did not cap aggregate heap usage across concurrent responses, so many in-flight responses could still exhaust the heap and kill the reactor.

Solution

This change makes ApacheHttpClient5Transport recover from reactor shutdowns when it owns the client lifecycle:

- Rebuilds and starts a fresh Apache HC5 async client when IOReactorShutdownException is detected.
- Handles both synchronous execute(...) failures and callback-delivered reactor shutdown failures.
- Retries the affected request once on the rebuilt client.
- Adds rebuild backoff to avoid rebuild storms if the reactor keeps dying.
- Prevents recovery after the transport has been explicitly closed.
- Keeps externally supplied clients unchanged; recovery is disabled when the transport does not own the client lifecycle.
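
The recovery rules listed above can be illustrated with a small synchronous sketch. It is not the code from this PR: `RecoveringClient`, `ReactorDownException` (a stand-in for Apache HC5's IOReactorShutdownException so the example compiles without the library), and every method name here are hypothetical, and the real transport also has to handle failures delivered through async callbacks.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Stand-in for IOReactorShutdownException (assumption, keeps the sketch dependency-free). */
class ReactorDownException extends RuntimeException {
    ReactorDownException(String message) {
        super(message);
    }
}

/** Hypothetical holder that rebuilds a dead client and retries the failed call once. */
class RecoveringClient<C> {
    private final Supplier<C> factory;          // builds and starts a fresh client
    private final boolean ownsClient;           // false for externally supplied clients
    private final long minRebuildIntervalNanos; // backoff between rebuilds
    private C client;
    private long lastRebuildNanos;
    private boolean rebuiltBefore;
    private boolean closed;

    RecoveringClient(C initial, Supplier<C> factory, boolean ownsClient, long minRebuildIntervalNanos) {
        this.client = initial;
        this.factory = factory;
        this.ownsClient = ownsClient;
        this.minRebuildIntervalNanos = minRebuildIntervalNanos;
    }

    synchronized void close() {
        closed = true;
    }

    <R> R execute(Function<C, R> call) {
        C current = current();
        try {
            return call.apply(current);
        } catch (ReactorDownException e) {
            // Single retry on the rebuilt client; a second failure propagates to the caller.
            return call.apply(rebuild(current, e));
        }
    }

    private synchronized C current() {
        return client;
    }

    private synchronized C rebuild(C failed, ReactorDownException cause) {
        if (closed || !ownsClient) {
            throw cause; // no recovery after close() or for clients we do not own
        }
        if (client != failed) {
            return client; // another thread already replaced the dead client
        }
        long now = System.nanoTime();
        if (rebuiltBefore && now - lastRebuildNanos < minRebuildIntervalNanos) {
            throw cause; // still inside the backoff window, avoid a rebuild storm
        }
        client = factory.get();
        lastRebuildNanos = now;
        rebuiltBefore = true;
        return client;
    }
}
```

Comparing the failed instance with the current one before rebuilding is one way to keep concurrent requests that all hit the same dead client from each triggering their own rebuild.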

It also reduces the chance of reactor-killing memory failures:

- Adds an optional aggregate response-buffer memory budget.
- Adds bounded response-consumer capacity increments when a budget is active.
- Converts OutOfMemoryError during response buffering into a request-level IOException so the reactor thread is not killed.
- Fails fast without node denylisting/retry for deterministic client-side failures such as shared-budget exhaustion, oversized single responses, and OOM-caused buffering failures.
- Resets leaked response-buffer budget reservations when the client is rebuilt, including request-specific budgets.
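
To make the idea of an aggregate budget concrete, here is a minimal sketch of a shared byte counter that concurrent response consumers could reserve against before growing their buffers. The class name `SharedBufferBudget` and its methods are assumptions made for illustration; the PR's own budget class may look quite different.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical shared budget capping the total bytes buffered across in-flight responses. */
class SharedBufferBudget {
    private final long limitBytes;
    private final AtomicLong reserved = new AtomicLong();

    SharedBufferBudget(long limitBytes) {
        if (limitBytes <= 0) {
            throw new IllegalArgumentException("limitBytes must be positive");
        }
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes if they fit in the remaining budget; returns false otherwise. */
    boolean tryReserve(long bytes) {
        if (bytes < 0) {
            throw new IllegalArgumentException("bytes must not be negative");
        }
        while (true) {
            long current = reserved.get();
            if (bytes > limitBytes - current) {
                return false; // would exceed the aggregate limit
            }
            if (reserved.compareAndSet(current, current + bytes)) {
                return true;
            }
        }
    }

    /** Returns bytes to the budget once a response buffer has been released. */
    void release(long bytes) {
        reserved.updateAndGet(v -> Math.max(0L, v - bytes));
    }

    /** Drops all reservations, for example after the client has been rebuilt. */
    void reset() {
        reserved.set(0L);
    }

    long reservedBytes() {
        return reserved.get();
    }
}
```

In this sketch, a consumer that gets `false` from `tryReserve` would fail its own request with an IOException rather than allocate, which matches the fail-fast behavior described for shared-budget exhaustion.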

Testing

Ran:

./gradlew :java-client:test \
  --tests org.opensearch.client.transport.httpclient5.ApacheHttpClient5TransportRecoveryTest \
  --tests org.opensearch.client.transport.httpclient5.internal.ResponseMemoryBudgetTest \
  --tests org.opensearch.client.transport.httpclient5.internal.HeapBufferedAsyncEntityConsumerTest

./gradlew :java-client:test \
  --tests 'org.opensearch.client.transport.httpclient5.*' \
  --tests 'org.opensearch.client.transport.httpclient5.internal.*'

./gradlew :java-client:spotlessJavaCheck

git diff --check

Also verified the locally published modified client in a standalone sample app using the normal existing transport builder path against OpenSearch 3.7.0 in Docker:

- ping and info/version
- create index
- index and search a document
- concurrent indexing/search smoke test with one client instance

andrross · 2026-06-30T21:40:09Z

+     * Releases the (potentially very large) response buffer and converts a fatal {@link OutOfMemoryError} into a
+     * recoverable {@link IOException}.


I'm not sure it is safe to do this. The contract of java.lang.Error is generally that it "indicates serious problems that a reasonable application should not try to catch". I don't think you can assume the JVM is in a workable state after observing an Error; the right thing to do is to let the process crash. In this case, you also don't know whether the response buffer in question caused the OOM or whether something else in the application, unrelated to this client, consumed all the heap.

andrross · 2026-06-30T21:42:06Z

Would it make sense to break this into 2 PRs? One with the fix to allow recovery from reactor shutdowns, and another PR with the changes to reduce the chance of reactor-killing memory failures.


andrross

Looks good to me. @reta can you take a look?

@sharp-pixel The PR description should be updated since this no longer has all the original changes.

andrross · 2026-07-02T14:57:26Z

+            if (isRecoverableReactorFailure(runtimeFailure, client) == false) {
+                throw runtimeFailure;
+            }
+            // The I/O reactor has been shut down. Try to recover by rebuilding the client, then retry this request once
+            // on the fresh client.
+            retryAfterReactorFailure(nodeTuple, options, request, warningsHandler, listener, node, client, runtimeFailure, allowRecovery);
+            return;


Nitpick, but easier to follow if you invert the logic:

Suggested change

-            if (isRecoverableReactorFailure(runtimeFailure, client) == false) {
-                throw runtimeFailure;
-            }
-            // The I/O reactor has been shut down. Try to recover by rebuilding the client, then retry this request once
-            // on the fresh client.
-            retryAfterReactorFailure(nodeTuple, options, request, warningsHandler, listener, node, client, runtimeFailure, allowRecovery);
-            return;
+            if (isRecoverableReactorFailure(runtimeFailure, client)) {
+                // The I/O reactor has been shut down. Try to recover by rebuilding the client, then retry this request once
+                // on the fresh client.
+                retryAfterReactorFailure(nodeTuple, options, request, warningsHandler, listener, node, client, runtimeFailure, allowRecovery);
+            } else {
+                throw runtimeFailure;
+            }

reta · 2026-07-02T20:42:07Z

> Looks good to me. @reta can you take a look?

Yeah I will take a look in a few days, thanks @andrross and @sharp-pixel

sharp-pixel requested review from Bukhtawar, VachaShah, Xtansia, madhusudhankonda, reta, saratvemulapalli and szczepanczykd as code owners June 30, 2026 21:17

sharp-pixel force-pushed the fix/1969-httpclient5-reactor-recovery branch from 27d4ae0 to 3004731 Compare June 30, 2026 21:30

andrross reviewed Jun 30, 2026


sharp-pixel added 3 commits July 1, 2026 00:57

Fix Apache HttpClient 5 reactor recovery

3a050be

Signed-off-by: Cédric Pelvet <cedric.pelvet@gmail.com>

Address HttpClient5 recovery review findings

70e43ea

Signed-off-by: Cédric Pelvet <cedric.pelvet@gmail.com>

Avoid extra callback coordination allocation

a6d50d0

Signed-off-by: Cédric Pelvet <cedric.pelvet@gmail.com>

sharp-pixel force-pushed the fix/1969-httpclient5-reactor-recovery branch from 3004731 to a6d50d0 Compare July 1, 2026 08:38

andrross approved these changes Jul 2, 2026

sharp-pixel wants to merge 3 commits into opensearch-project:main from sharp-pixel:fix/1969-httpclient5-reactor-recovery


3 participants
