I realize that this is late, but I recently discovered the cause of this issue. It is due to the fact that we configure the Low Level REST Client's internal connection manager to have a connection request timeout of 500 ms. If your bulk payloads take longer than that to get onto the wire, then the Apache HTTP client's I/O thread pool can be overwhelmed (it defaults to the number of cores on the machine, which I suspect was 4, or possibly 5, in this case given the Docker container).
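For reference, here is a minimal sketch of how that I/O thread count can be raised when building the low-level client; the host, port, and thread count are placeholder values I'm assuming for illustration, not anything taken from the original report:

```java
import org.apache.http.HttpHost;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.elasticsearch.client.RestClient;

public class IoThreadExample {
    public static void main(String[] args) throws Exception {
        // The Apache async I/O reactor defaults its thread count to the number
        // of available processors; raise it explicitly if that pool is the bottleneck.
        try (RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http"))
                .setHttpClientConfigCallback(httpClientBuilder ->
                    httpClientBuilder.setDefaultIOReactorConfig(
                        IOReactorConfig.custom()
                            .setIoThreadCount(8) // placeholder value
                            .build()))
                .build()) {
            // ... use restClient ...
        }
    }
}
```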
The underlying Apache HTTP client's default for this timeout is -1, which represents an unlimited amount of time that a request can sit in the connection pool before it is even attempted to be sent. As noted above, the current default for the Low Level REST Client (and thus also the High Level REST Client) is 500 milliseconds. So if a request gets stuck behind N other requests -- where N depends on the connection pool's settings -- then it can blow past that 500 ms limit and fail with a timeout before the client ever tries to send it.
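If you want to see Apache's own default for yourself, here is a tiny standalone check (assuming HttpClient 4.x is on the classpath, which is what the REST client ships with):

```java
import org.apache.http.client.config.RequestConfig;

public class TimeoutDefaults {
    public static void main(String[] args) {
        // Apache HttpClient's stock default: -1, i.e. no limit on how long a
        // request may wait for a connection to become free in the pool.
        System.out.println(RequestConfig.DEFAULT.getConnectionRequestTimeout());
        // The Low Level REST Client overrides this to 500 ms unless you change it.
    }
}
```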
To get rid of this timeout, you can tell the connection manager not to apply one when configuring the RestClient. This can be done like the other timeout configurations, by setting

requestConfigBuilder.setConnectionRequestTimeout(-1)
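Putting that together, a sketch of a RestClient built without the connection request timeout might look like this (the host and port are placeholders for your own cluster):

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

public class NoConnectionRequestTimeout {
    public static void main(String[] args) throws Exception {
        // Disable the 500 ms connection request timeout so queued (e.g. bulk)
        // requests wait for a pooled connection instead of failing early.
        RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"))
            .setRequestConfigCallback(requestConfigBuilder ->
                requestConfigBuilder.setConnectionRequestTimeout(-1));

        try (RestClient restClient = builder.build()) {
            // ... send requests, or hand the builder to the High Level REST Client ...
        }
    }
}
```

Depending on your client version, the High Level REST Client is constructed from this same builder (or the RestClient it produces), so the setting carries over to both.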
The implication of this change is that a request may wait an unpredictable amount of time before it actually gets fired from your process. That is probably fine for a bulk request, but it may be less desirable (and less predictable) for other types of requests, unless your application is comfortable with that asynchronous behavior.
Hope that helps someone,
Chris