Understanding scaling for a read-heavy cluster


I'm trying to understand how to scale an Elasticsearch cluster that is extremely slow right now.
Can anyone help?

It's approx. 124,000,000 documents totalling about 60GB.
I index the data once per month. In terms of usage, there are about 10-15 requests per second using a boolean query.

I've tuned my mappings: I only index fields that need to be searchable, I don't use keyword types on full-text-only fields, etc.

Hardware-wise, it's running across two nodes, each with 8 GB RAM and 2 vCPUs, on an Elastic Cloud I/O optimized cluster.

So here's the thing: queries run much quicker on 1 shard (1 primary and 1 replica) than they do on, say, 3 or 5 shards, even though that 1 shard is 60 GB. Can anyone tell me why?

Also, I've noticed that CPU usage is through the roof while memory usage is relatively low.
Is that a result of the boolean queries?

Any help would be greatly appreciated!


Size your shards | Elasticsearch Guide [7.13] | Elastic is a good starting point for the sharding topic — let us know if you have more questions after reading it.

Also with the Profile API | Elasticsearch Guide [7.13] | Elastic you could see where you are spending your time (has a UI in Kibana's Dev Tools). Is it fast enough with a single shard or do we need to dig deeper?
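
If it helps, the Profile API is just a flag on the search body. Here's a minimal sketch (the index and field names are hypothetical, and I'm building the body as a Python dict rather than hitting a real cluster):

```python
import json

def with_profile(query: dict) -> dict:
    """Return a search request body with the Profile API enabled.

    Sending this body to POST /my-index/_search (index name is a
    placeholder) makes Elasticsearch include per-shard timing
    breakdowns in the response under the "profile" key.
    """
    return {"profile": True, "query": query}

# Hypothetical boolean query, similar in shape to what the original
# poster described.
body = with_profile({"bool": {"must": [{"match": {"title": "scaling"}}]}})
print(json.dumps(body, indent=2))
```

The profiled response shows how long each query component took on each shard, which is exactly what you'd want to compare between the 1-shard and 3/5-shard setups.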

@xeraa , thanks for the reply!

I think this might be key:

Searches run on a single thread per shard
Most searches hit multiple shards. Each shard runs the search on a single CPU thread. While a shard can run multiple concurrent searches, searches across a large number of shards can deplete a node’s search thread pool. This can result in low throughput and slow search speeds.
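
To make the arithmetic concrete: if I'm reading the thread pool docs right, the search pool on each node defaults to `int((allocated_processors * 3) / 2) + 1` threads. A quick sketch of what that means for these machine sizes:

```python
def default_search_threads(allocated_processors: int) -> int:
    # Documented default size of the Elasticsearch "search" thread pool:
    # int((allocated_processors * 3) / 2) + 1
    return (allocated_processors * 3) // 2 + 1

# 2 vCPU node -> 4 search threads; 4 vCPU node -> 7 search threads.
for vcpus in (2, 4):
    print(f"{vcpus} vCPUs -> {default_search_threads(vcpus)} search threads")
```

So with 2 vCPUs there are only 4 search threads per node, and since each shard consumes one thread per search, 10-15 requests/second across several shards could plausibly queue up the way described.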

I *think* what's happening is that searches are piling up before earlier search requests complete. Without load, on the Elastic Cloud I/O optimized machines (2 vCPUs), a multisearch of 5-7 items was taking 2,000-3,000 ms. On the compute optimized machines (4 vCPUs) I can get that down to 200-300 ms.
Does that sound right? Or wildly wrong?

(I have also done a few things to help with timings, like setting index: false for non-essential fields and stripping out ~30 million records.)

I am going to load test again and see what that brings.


Not sure this is really "a large number of shards", but a single shard might just be a better setup for your hardware, documents, queries, and shard size/count. You're also saving the coordination overhead (combining the subresults of the individual shards).
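
To illustrate what that coordination overhead is, here's a toy sketch (not Elasticsearch code) of the merge step a coordinating node performs: each shard returns its local top-k hits, and the coordinator has to fan out, wait for the slowest shard, and combine everything into a global top-k. More shards means more subresults to merge per search.

```python
import heapq

def merge_top_k(shard_results, k):
    """Merge per-shard (score, doc_id) lists into a global top-k.

    Toy model of the coordinating node's reduce phase: every extra
    shard adds another subresult list to this merge.
    """
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nlargest(k, all_hits)

shard_a = [(0.9, "doc1"), (0.4, "doc7")]
shard_b = [(0.8, "doc3"), (0.2, "doc9")]
print(merge_top_k([shard_a, shard_b], 2))  # [(0.9, 'doc1'), (0.8, 'doc3')]
```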

And if you're CPU-bound, more CPU definitely sounds like a good approach. If you're not actively writing to the index, a force merge might also help. After that it might be possible to find a more performant way to write your query, but that will depend entirely on your current queries.
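
One common rewrite that sometimes helps with CPU: move clauses that don't need relevance scoring (term, range, exists, etc.) out of `must` and into `filter` context, since filter clauses skip scoring and are cacheable. A hypothetical sketch of that transformation on a query body, as a Python dict (whether it helps depends entirely on your actual queries):

```python
# Clause types that typically don't need relevance scoring. This list
# is an assumption for illustration, not an exhaustive rule.
NON_SCORING = ("term", "terms", "range", "exists")

def move_filters(bool_query: dict) -> dict:
    """Move non-scoring clauses from "must" into "filter" context."""
    keep = []
    filters = list(bool_query.get("filter", []))
    for clause in bool_query.get("must", []):
        if any(kind in clause for kind in NON_SCORING):
            filters.append(clause)  # cacheable, no scoring overhead
        else:
            keep.append(clause)     # still contributes to _score
    out = dict(bool_query)
    out["must"] = keep
    out["filter"] = filters
    return out

# Hypothetical query: the term clause doesn't need scoring.
q = {"must": [{"match": {"title": "scaling"}}, {"term": {"status": "active"}}]}
print(move_filters(q))
```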

Load testing is the best way to know for sure.