Fix Apache HttpClient 5 reactor recovery#2034
Conversation
27d4ae0 to
3004731
Compare
| * Releases the (potentially very large) response buffer and converts a fatal {@link OutOfMemoryError} into a | ||
| * recoverable {@link IOException}. |
There was a problem hiding this comment.
I'm not sure it is safe to do this. The contract of java.lang.Error generally is that it "indicates serious problems that a reasonable application should not try to catch". I don't think you can assume the JVM is in a workable state after observing an error and the right thing to do is let the process crash. In this case, you don't actually know if the response buffer in question here caused the OOM, versus something else in the application unrelated to this client that consumed all the heap.
|
Would it make sense to break this into 2 PRs? One with the fix to allow recovery from reactor shutdowns, and another PR with the changes to reduce the chance of reactor-killing memory failures. |
Signed-off-by: Cédric Pelvet <cedric.pelvet@gmail.com>
Signed-off-by: Cédric Pelvet <cedric.pelvet@gmail.com>
Signed-off-by: Cédric Pelvet <cedric.pelvet@gmail.com>
3004731 to
a6d50d0
Compare
andrross
left a comment
There was a problem hiding this comment.
Looks good to me. @reta can you take a look?
@sharp-pixel The PR description should be updated since this no longer has all the original changes.
| if (isRecoverableReactorFailure(runtimeFailure, client) == false) { | ||
| throw runtimeFailure; | ||
| } | ||
| // The I/O reactor has been shut down. Try to recover by rebuilding the client, then retry this request once | ||
| // on the fresh client. | ||
| retryAfterReactorFailure(nodeTuple, options, request, warningsHandler, listener, node, client, runtimeFailure, allowRecovery); | ||
| return; |
There was a problem hiding this comment.
Nitpick, but easier to follow if you invert the logic:
| if (isRecoverableReactorFailure(runtimeFailure, client) == false) { | |
| throw runtimeFailure; | |
| } | |
| // The I/O reactor has been shut down. Try to recover by rebuilding the client, then retry this request once | |
| // on the fresh client. | |
| retryAfterReactorFailure(nodeTuple, options, request, warningsHandler, listener, node, client, runtimeFailure, allowRecovery); | |
| return; | |
| if (isRecoverableReactorFailure(runtimeFailure, client)) { | |
| // The I/O reactor has been shut down. Try to recover by rebuilding the client, then retry this request once | |
| // on the fresh client. | |
| retryAfterReactorFailure(nodeTuple, options, request, warningsHandler, listener, node, client, runtimeFailure, allowRecovery); | |
| } else { | |
| throw runtimeFailure; | |
| } |
Yeah I will take a look in a few days, thanks @andrross and @sharp-pixel |
Fixes #1969
Problem
ApacheHttpClient5Transportcould enter an unrecoverable state after Apache HttpClient 5 reportedIOReactorShutdownException. Once the underlying I/O reactor was shut down, the transport continued using the same deadCloseableHttpAsyncClient, so every later request failed until the application recreated the transport.One common trigger is memory pressure while buffering large or highly concurrent responses. The existing per-response buffer limit capped a single response, but it did not cap aggregate heap usage across concurrent responses, so many in-flight responses could still exhaust the heap and kill the reactor.
Solution
This change makes
ApacheHttpClient5Transportrecover from reactor shutdowns when it owns the client lifecycle:IOReactorShutdownExceptionis detected.execute(...)failures and callback-delivered reactor shutdown failures.It also reduces the chance of reactor-killing memory failures:
OutOfMemoryErrorduring response buffering into a request-levelIOExceptionso the reactor thread is not killed.Testing
Ran:
Also verified the locally published modified client in a standalone sample app using the normal existing transport builder path against OpenSearch 3.7.0 in Docker: