Elasticsearch Transport Client bottleneck with concurrent calls

We ran a performance test in my local environment with one node, one index and one shard. The number of TPS was around 60. We started low, at around 20 to 25 TPS, and the 99th percentile was around 100 ms, which is what we expected. As soon as we got close to 50 TPS, our requests started queuing, timing out (at the HTTP level) and failing. When we looked at the report, the 50th percentile was around 20 seconds.

I created a GitHub issue for this: https://github.com/elastic/elasticsearch/issues/21349#issuecomment-258581935 and was routed here.

The problem is not the client threads being parked; the problem is that the transport client is the bottleneck and cannot handle many concurrent requests. The VisualVM profiler shows that the threads spend almost all of their time waiting on the BaseSync.get() call.

Here's the Gatling report.

The calls are going to one index and one shard (no replica). Everything is local as well.
As you can see, the initial calls returned almost instantaneously, but over time requests started queuing up, and eventually, when you monitor through a profiler, you can see all threads waiting (blocked?) on Sync.get(). I hope this sheds some light. And if anyone can point me to any performance tests that were done using the transport client, that'd be great as well.

Has anyone else seen this issue or done any further performance tests? I would expect the transport client to be async in all possible ways, but it seems to be blocking on calls.

All help appreciated!

What does your code look like?
Are you calling get() or actionGet()?

Instead of execute()?

Calling get() does wait for the response. You can use a listener instead.
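
To illustrate the difference between waiting on get() and using a listener, here is a minimal sketch that uses only the JDK. It does not touch Elasticsearch: fakeSearch, searchBlocking and searchWithListener are hypothetical stand-ins for an async client call. In the real client the equivalent would be passing an ActionListener to execute(...) rather than calling get() or actionGet() on the returned future, but check that against the client version you are running.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ListenerVsGet {
    // Hypothetical "network" thread that completes requests after a delay.
    static final ScheduledExecutorService IO =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "fake-io");
            t.setDaemon(true);
            return t;
        });

    // Hypothetical stand-in for an async search call (not an Elasticsearch API).
    static CompletableFuture<String> fakeSearch(String query, long delayMillis) {
        CompletableFuture<String> future = new CompletableFuture<>();
        IO.schedule(() -> future.complete("hits for " + query),
                    delayMillis, TimeUnit.MILLISECONDS);
        return future;
    }

    // Blocking style: the calling thread is parked until the response arrives.
    static String searchBlocking(String query) {
        return fakeSearch(query, 50).join();
    }

    // Listener style: returns immediately; the callback runs once the
    // response (or failure) is available.
    static void searchWithListener(String query,
                                   Consumer<String> onResponse,
                                   Consumer<Throwable> onFailure) {
        fakeSearch(query, 50).whenComplete((response, error) -> {
            if (error != null) {
                onFailure.accept(error);
            } else {
                onResponse.accept(response);
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(searchBlocking("a"));

        CompletableFuture<Void> done = new CompletableFuture<>();
        searchWithListener("b",
            response -> {
                System.out.println(Thread.currentThread().getName() + ": " + response);
                done.complete(null);
            },
            done::completeExceptionally);
        System.out.println("caller is free to issue more requests");
        done.join(); // only so the demo does not exit before the callback fires
    }
}
```

With the blocking style, every in-flight request holds a caller thread for its full duration, so a fixed-size pool can fill up once requests slow down. With the listener style, the caller thread is released as soon as the request is submitted.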

Actually I am using Observables. Here's the code sample I use (it's very close to the actual code, but not the actual code):

private void multiSearch(MultiSearchRequestBuilder builder) {
        logger.debug("Executing multisearch query");
        long start = System.currentTimeMillis();
        Observable.defer(() -> Observable.from(builder.execute()))
            .map(multiSearchResponse -> {
                long end = System.currentTimeMillis();
                System.out.println("Total time took for query: " + Thread.currentThread().getName() + " " + (end - start));
                List<SearchResult> searchResults = new ArrayList<>();

                for (MultiSearchResponse.Item item : multiSearchResponse.getResponses()) {
                    SearchResult result = getSearchResultsFromSearchResponse(item.getResponse());
                    if (result.getHits().size() > 0) {
                        // body elided in the original post; presumably collects the result
                        searchResults.add(result);
                    }
                }
                return searchResults;
            })
            .subscribe(new Subscriber<List<SearchResult>>() {
                @Override
                public void onCompleted() {
                    // when the stream is completed
                }

                @Override
                public void onError(Throwable throwable) {
                    // error handling elided in the original post
                }

                @Override
                public void onNext(List<SearchResult> searchResults) {
                    try {
                        // result handling elided in the original post
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            });
}

I ran the test again, this time for an hour. You can see that the 50th percentile is great and then suddenly the 75th percentile shoots up. Somewhere, when the load starts building, the transport client blocks, because everything else in the code is pretty much reactive, based on the listenableFuture (which is converted to an Observable).
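
One thing worth ruling out when a future is converted into a reactive type: the conversion itself can be the place where a thread waits. As far as I know, RxJava 1's Observable.from(Future) calls the future's get() when subscribed, which would also show up in a profiler as threads waiting in get(); treat that as an assumption to verify against your RxJava version. Below is a minimal JDK-only sketch of the alternative, bridging a callback-style API into a CompletableFuture without blocking. The Listener interface and toFuture helper are hypothetical names for illustration, not Elasticsearch or RxJava APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class ListenerBridge {
    /** Hypothetical callback interface, shaped like a typical response listener. */
    public interface Listener<T> {
        void onResponse(T response);
        void onFailure(Exception e);
    }

    /**
     * Adapts a callback-style async call into a CompletableFuture.
     * No thread waits here: the future is completed from whichever
     * thread eventually invokes the listener.
     */
    public static <T> CompletableFuture<T> toFuture(Consumer<Listener<T>> asyncCall) {
        CompletableFuture<T> future = new CompletableFuture<>();
        asyncCall.accept(new Listener<T>() {
            @Override
            public void onResponse(T response) {
                future.complete(response);
            }

            @Override
            public void onFailure(Exception e) {
                future.completeExceptionally(e);
            }
        });
        return future;
    }

    public static void main(String[] args) {
        // Simulated async call that answers from another thread.
        CompletableFuture<Integer> future = ListenerBridge.<Integer>toFuture(
            listener -> new Thread(() -> listener.onResponse(42)).start());

        future.thenApply(n -> n + 1)
              .thenAccept(n -> System.out.println("got " + n))
              .join(); // only so the demo waits for the output
    }
}
```

The same shape applies to an Observable: create it from the listener callbacks (onNext/onCompleted from onResponse, onError from onFailure) instead of wrapping the returned future.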

So I tested and tested over the weekend, and it turns out the problem is not the client but the highlighting! Let me start a different discussion on it and get more input.