Elasticsearch sizing and queue capacity

The issue here is how Elasticsearch executes queries like the one you are running. A Dashboard behaves differently from Discover, but that is a separate question.

For queries like this, ES has to issue a query to each shard involved. If you have a new index per day and 6 months of data, that is 180 days and 180 indices. If each index has 5 shards, that is 900 total shards. If you are using a dashboard that has many visualizations on it, you could have (number of visualizations) * 900 = total queries to execute.
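To make the fan-out arithmetic concrete, here is a minimal sketch. The function name and parameters are made up for illustration, and it assumes one index per day and that every visualization touches every shard in the time range.

```python
def total_shard_queries(days, shards_per_index, visualizations):
    """Rough count of shard-level queries for one dashboard load.

    Assumes one index per day and that each visualization
    queries every shard in the selected time range.
    """
    indices = days                      # one index per day
    shards = indices * shards_per_index
    return shards * visualizations


# 6 months of daily indices, 5 shards each
print(total_shard_queries(180, 5, 1))  # 900
print(total_shard_queries(180, 5, 6))  # 5400
```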

How many can you execute? ES 2.x uses, I believe, the formula 1.5 * CPUs * queue size. Say you have one node with 4 CPUs; then you can handle (1.5 * 4 * queue size) queries at once. I think the default queue size is 500, so that would be 3000 total. But if you had 6 visualizations each needing 900 shards, that would be 5400 queries, which will overflow your search queue.
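The same comparison as a small sketch. It simply encodes the formula and the default queue size of 500 as recalled above, both of which are assumptions to verify against your ES version; `search_capacity` is a hypothetical helper, not an Elasticsearch API.

```python
def search_capacity(cpus, queue_size=500, factor=1.5):
    """Estimated shard queries one node can accept, using the
    1.5 * CPUs * queue size rule of thumb (an assumption,
    check the thread pool docs for your ES version)."""
    return int(factor * cpus * queue_size)


capacity = search_capacity(4)   # one node, 4 CPUs
demand = 6 * 900                # 6 visualizations, 900 shards each
print(capacity, demand, demand > capacity)  # 3000 5400 True
```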

Obviously, if you increase your queue size, you won't overflow your queue. The next question that comes to mind is how long the searches take. If search performance is good, it may not hurt to have a larger thread pool search queue. Kibana will only wait 30 seconds by default, but that can be configured.

Since Elasticsearch is horizontally scalable, the most obvious alternative suggestion is to add another node. The cluster will distribute your shards evenly and this will give you 2x the CPUs, 2x the search queue size and 2x the search performance. With replication, having a second node can result in data redundancy and fault tolerance as well.
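As a rough sketch of the scaling argument, the following estimates how many identical nodes it would take to absorb a given number of shard queries. It assumes capacity grows linearly with node count and that shards are spread evenly, which real clusters only approximate; `nodes_needed` is a hypothetical helper.

```python
import math


def nodes_needed(demand, per_node_capacity):
    """Smallest number of identical nodes whose combined capacity
    covers the demand, assuming capacity scales linearly."""
    return math.ceil(demand / per_node_capacity)


# 5400 shard queries against nodes that can each take about 3000
print(nodes_needed(5400, 3000))  # 2
```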

I hope this helps.