Understanding scaling for a read-heavy cluster


I'm trying to understand how to scale an Elasticsearch cluster which is extremely slow right now!
Can anyone help?

It's approx. 124,000,000 documents totalling about 60GB.
I index the data once per month. In terms of usage, there are about 10-15 requests per second using a boolean query.
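To give a flavour of the workload, the queries are roughly this shape (a minimal sketch with made-up field names, using the official Python client; the real clauses differ):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Hypothetical bool query; actual fields and clauses are different.
resp = es.search(
    index="my-index",  # placeholder index name
    body={
        "query": {
            "bool": {
                "must": [{"match": {"title": "some search terms"}}],
                "filter": [{"term": {"status": "published"}}],
            }
        }
    },
)
print(resp["hits"]["total"])
```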

I've tuned my mappings: I only index the fields that need searching, I don't use keyword types for full-text-only fields, etc.
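For illustration, the mappings follow this pattern (index and field names here are placeholders, not the real ones):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Hypothetical mapping: full-text fields are plain "text" with no
# "keyword" sub-field, since they're never sorted or aggregated on.
es.indices.create(
    index="my-index",  # placeholder index name
    body={
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "description": {"type": "text"},
            }
        }
    },
)
```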

Hardware-wise, it's running across two nodes, each with 8GB RAM and 2 vCPUs, on an Elastic Cloud I/O-optimized cluster.

So here's the thing: queries run much quicker on 1 shard (1 primary and 1 replica) than they do on, say, 3 or 5 shards, even though that 1 shard is 60GB. Can anyone tell me why?
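(For reference, here's how the shard layout can be confirmed with the cat shards API; the index name is a placeholder:)

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Lists every shard with its size and the node it lives on.
print(es.cat.shards(index="my-index", v=True, h="index,shard,prirep,node,store"))
```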

Also, I've noticed that CPU usage is through the roof while memory usage is relatively low.
Is this a result of the boolean queries?

Any help would be greatly appreciated!


Size your shards | Elasticsearch Guide [7.13] | Elastic is a good starting point for the sharding topic — let us know if you have more questions after reading it.

Also, with the Profile API | Elasticsearch Guide [7.13] | Elastic you could see where you're spending your time (it has a UI in Kibana's Dev Tools). Is it fast enough with a single shard, or do we need to dig deeper?
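For example, profiling is just a flag on the search body; a minimal sketch with the Python client and a placeholder index/query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Same search as usual, with "profile": true added; the response then
# carries a per-shard breakdown of where query time went.
resp = es.search(
    index="my-index",  # placeholder index name
    body={
        "profile": True,
        "query": {"bool": {"must": [{"match": {"title": "some search terms"}}]}},
    },
)
print(resp["profile"]["shards"][0]["searches"][0]["query"])
```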

@xeraa, thanks for the reply!

I think this might be key:

> Searches run on a single thread per shard
> Most searches hit multiple shards. Each shard runs the search on a single CPU thread. While a shard can run multiple concurrent searches, searches across a large number of shards can deplete a node's search thread pool. This can result in low throughput and slow search speeds.

I *think* what's happening is that searches are piling up faster than earlier requests complete. Without load, on Elastic Cloud I/O-optimized machines (2 vCPUs) a multisearch of 5-7 items was taking 2000-3000 milliseconds. On the compute-optimized machines (4 vCPUs) I can get that down to 200-300ms.
Does that sound right? Or wildly wrong?
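One way I can sanity-check the piling-up theory is to watch the search thread pool while under load (a sketch, again assuming the official Python client):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# If searches queue up faster than the vCPUs can drain them, "queue"
# climbs and "rejected" starts counting up.
print(es.cat.thread_pool(
    thread_pool_patterns="search",
    v=True,
    h="node_name,name,active,queue,rejected",
))
```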

(I have also done a few things to help with timings, like setting index: false for non-essential fields and stripping out ~30 million records.)
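For completeness, the index: false change looks like this (the field name below is hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Hypothetical field that's kept in _source but never searched,
# so it doesn't need to be indexed at all.
es.indices.put_mapping(
    index="my-index",  # placeholder index name
    body={"properties": {"internal_notes": {"type": "keyword", "index": False}}},
)
```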

I am going to load test again and see what that brings.


Not sure this is really "a large number of shards", but a single shard might just be a better setup for your hardware, documents, queries, and shard size/count. You're also saving the coordination overhead (combining the sub-results of the individual shards).

And if you're CPU-bound, more CPU definitely sounds like a good approach. If you're not actively writing to the index, a force merge might also help. After that it might be possible to find a more performant way to write your query, but that will depend entirely on your current queries.
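The force merge is a one-liner with the Python client (index name is a placeholder; only do this on an index you're no longer writing to):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Merge each shard down to a single segment: faster to search
# afterwards, but expensive to run, so only for read-only indices.
es.indices.forcemerge(index="my-index", max_num_segments=1)
```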

Load testing is the best way to know for sure.
