We did some further analysis of the issue mentioned by @Tosh above, and are providing some more details below.
What we are seeing is a bottleneck when the number of concurrent MultiSearch requests sent over the transport client (Java) increases: as the multi search request count grows, response times start to slow down.
For example, a single multi search API request of ours contains an average of 12 queries. On its own, such a request takes around 1 second to complete, but as we increase the number of concurrent multi search requests, response times jump to 3 seconds, then 5 seconds, and we have even seen them go up to 20 seconds.
Each of these queries fans out to around 290 shards (across multiple indices) in the cluster.
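For reference, each of our multi search requests is shaped roughly like the sketch below (the index names and queries here are placeholders, not our real ones); around 12 such header/body pairs go into one request:

```
GET /_msearch
{ "index": "index-a-*" }
{ "query": { "bool": { "filter": [ { "term": { "status": "active" } } ] } }, "size": 10 }
{ "index": "index-b-*" }
{ "query": { "match": { "title": "example" } }, "size": 10 }
```

Since each query fans out to roughly 290 shards, one multi search request translates to roughly 12 × 290 ≈ 3,500 shard-level requests on the cluster.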
We also see that the search thread pool is not being fully utilized and never reaches its maximum size, which is 25 in our case. We are running a 6-data-node cluster on ES version 6.2.2; each data node has 16 GB RAM and 16 CPUs.
Checking load and CPU on the servers shows they are never under heavy load at any time of day, which is consistent with the thread pool being under-utilized.
Are there any settings or properties introduced after ES 2.3 that we should be setting to better utilize the cluster and avoid this bottleneck?
We saw a couple of parameters on the search and multi search APIs, max_concurrent_shard_requests and max_concurrent_searches, that were introduced after 2.3.
The default value for max_concurrent_shard_requests in our case is 30, as we have 6 nodes and 5 shards per index. We tried increasing this number to 200 as well, but did not see much improvement in the bottleneck. We used JMeter to generate load via the Java client.
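For reference, this is roughly how we applied the two parameters while testing; the values shown are illustrative, not our production settings:

```
# Cap how many of the searches inside one multi search run concurrently
GET /_msearch?max_concurrent_searches=8
{ "index": "index-a-*" }
{ "query": { "match_all": {} } }

# Cap concurrent shard-level requests for a single search
GET /index-a-*/_search?max_concurrent_shard_requests=200
{ "query": { "match_all": {} } }
```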
Any suggestions would be helpful.