We ran a performance test in my local environment with one node, one index and one shard. Throughput topped out at around 60 TPS. We started low, at around 20 to 25 TPS, and the 99th percentile latency was around 100 ms, which is what we expected. As soon as we got close to 50 TPS, our requests started queuing, timing out (at the HTTP level) and failing. When we looked at the report, the 50th percentile was around 20 seconds.
I created a GitHub issue for this (https://github.com/elastic/elasticsearch/issues/21349#issuecomment-258581935) and was routed here.
The problem is not that the client threads are being parked; the problem is that the transport client is the bottleneck and cannot handle many concurrent requests. The VisualVM profiler shows that almost all the time, the threads are waiting on the
Here's the Gatling report.
The calls are going to one index and one shard (no replica). Everything is local as well.
As you can see, the initial calls returned almost instantaneously, but over time requests started queuing up. When you monitor through a profiler, you can see that all threads are waiting (blocked?) on Sync.get(). I hope this sheds some light. And if anyone can point me to any performance tests that were done using the transport client, that would be great as well.
Has anyone else seen this issue or done further performance tests? I would expect the transport client to be asynchronous wherever possible, but it seems to be blocking on calls.
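To illustrate what I mean by blocking vs. async, here's a minimal sketch that uses only java.util.concurrent and no Elasticsearch code at all. `simulatedSearch` is a hypothetical stand-in for a remote call that completes after a fixed latency; the request counts, thread count and latency are made-up numbers, not values from our test. If each caller thread blocks on `get()` (as our threads do on Sync.get()), at most `threads` requests are in flight at once. Registering a callback instead lets every request be in flight simultaneously. If I understand the client API correctly, the transport client equivalent would be passing an `ActionListener` to `execute(...)` instead of calling `get()`/`actionGet()`, but I haven't verified that this changes anything in our setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BlockingVsAsync {

    // Hypothetical stand-in for a remote search: completes after latencyMs on a timer
    // thread, without occupying the caller's thread while the "request" is outstanding.
    static CompletableFuture<Integer> simulatedSearch(ScheduledExecutorService timer, int id, long latencyMs) {
        CompletableFuture<Integer> result = new CompletableFuture<>();
        timer.schedule(() -> result.complete(id), latencyMs, TimeUnit.MILLISECONDS);
        return result;
    }

    // Blocking style: each worker thread parks on get(), so only `threads` requests
    // can be outstanding at a time and the rest wait in the executor's queue.
    static long runBlocking(int requests, int threads, long latencyMs) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        List<Future<Integer>> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            final int id = i;
            results.add(workers.submit(() -> simulatedSearch(timer, id, latencyMs).get()));
        }
        for (Future<Integer> r : results) {
            r.get(); // wait for every queued request to finish
        }
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        workers.shutdown();
        timer.shutdown();
        return elapsedMs;
    }

    // Callback style: the caller registers a completion handler and moves on,
    // so all requests are outstanding at the same time.
    static long runAsync(int requests, long latencyMs) throws Exception {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch done = new CountDownLatch(requests);
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            simulatedSearch(timer, i, latencyMs).thenAccept(id -> done.countDown());
        }
        done.await(); // only this harness waits, once, for all responses
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        timer.shutdown();
        return elapsedMs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("blocking, 2 threads: " + runBlocking(20, 2, 100) + " ms");
        System.out.println("async callbacks:     " + runAsync(20, 100) + " ms");
    }
}
```

With 20 requests, 2 blocking threads and 100 ms latency, the blocking run needs about 10 sequential rounds (roughly a second), while the callback run finishes in roughly one latency period. That's the kind of queuing I think we're seeing once the request rate outpaces the threads stuck in Sync.get().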
All help appreciated!