My scenario is as follows: I want to run a very large number of queries and fully utilize my cluster (40 data nodes x 16 CPUs each).
I am batching my queries (400 per batch) and sending them via _msearch, but it seems I'm being throttled. The cluster CPUs are hardly utilized, and I simply cannot get below ~10 seconds for 400 queries, no matter how I tune the max_concurrent_searches and max_concurrent_shard_requests parameters. The took value of each query simply increases as I increase the concurrency, but the total time remains the same.
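For concreteness, here is a rough sketch of how one such batch could be assembled. It only builds the newline-delimited _msearch body and the request URL and sends nothing. The host, the index name "my-index", the match queries, and the two parameter values are placeholders for illustration, not my real setup, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_msearch_body(index, queries):
    """Build an _msearch body: one header line plus one search line per query."""
    lines = []
    for query in queries:
        lines.append(json.dumps({"index": index}))   # header line (placeholder index)
        lines.append(json.dumps({"query": query}))   # search body line
    # The body is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


def build_msearch_url(host, max_concurrent_searches, max_concurrent_shard_requests):
    """Attach the two concurrency settings as URL query parameters."""
    params = urlencode({
        "max_concurrent_searches": max_concurrent_searches,
        "max_concurrent_shard_requests": max_concurrent_shard_requests,
    })
    return f"{host}/_msearch?{params}"


# Placeholder batch of 400 queries; the field and terms are made up.
queries = [{"match": {"title": f"term {i}"}} for i in range(400)]
body = build_msearch_body("my-index", queries)
url = build_msearch_url("http://localhost:9200", 64, 16)
```

The body would then be POSTed to that URL with the Content-Type header set to application/x-ndjson, and the two numbers are the knobs I have been varying.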
Any idea what I'm doing wrong here? I am using ES 6.5.3.
Thanks.