Trying to optimize Elasticsearch cluster


(John T) #1

As our data has grown, we have been having performance issues. I have been trying to determine the correct shard/node sizing, and I think we are way underpowered. I have found some information on determining the correct allocation of resources, but the examples are so much bigger than our setup that I wanted to get some verification.

We currently have a 7-node cluster, each machine running 8 cores and 56 GB RAM, with 28 GB allocated to the heap (small enough that compressed ordinary object pointers are still in use, which I verified). Our main index is 425 GB (161,133,495 docs), and we are running with only 5 shards. This is our first problem, as each shard is around 85 GB (the suggested max is 30 GB). We currently also have only one replica.
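For reference, a quick back-of-the-envelope check of the current per-shard size, using only the numbers from this post:

```python
# Current cluster figures from the post above.
index_size_gb = 425
primary_shards = 5
replicas = 1

# Each primary shard holds an equal slice of the index.
per_shard_gb = index_size_gb / primary_shards          # 85.0 GB per shard

# Replicas double (at replica count 1) the on-disk footprint.
total_on_disk_gb = index_size_gb * (1 + replicas)      # 850 GB total
```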

I am looking at this article: Optimizing Elasticsearch: How Many Shards per Index?

Based on the "Large and Growing Dataset", I came up with this:

Index size: 500 GB
Max shard size: 30 GB
Shard count (index / max): 16.67
Primary node count (= shard count): 16.67
Replica count: 2
Replica node count (primary * replicas): 33.33
Total nodes: 50
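In other words, the spreadsheet above assumes one primary shard per node, which is what drives the count so high. A minimal sketch of that arithmetic:

```python
# Reproducing the estimate above: 30 GB shard target, replica
# count 2, and the assumption of one shard per node.
index_size_gb = 500
max_shard_gb = 30
replicas = 2

shard_count = index_size_gb / max_shard_gb   # 16.67 primary shards
primary_nodes = shard_count                  # one primary shard per node
replica_nodes = primary_nodes * replicas     # 33.33 replica-holding nodes
total_nodes = primary_nodes + replica_nodes  # 50.0 nodes
```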

This is telling me that even without capacity planning, I should be using 50 nodes. Am I reading that correctly?

Thanks,
~john


(Mark Walkom) #2

We recommend no more than 50 GB per shard.
There's no need to have one shard per node either, ES is not an in-memory system.


(Jörg Prante) #3

If you have 7 nodes, why 5 and not 7 shards?

You should distribute your data uniformly across the data nodes.

50 nodes seem a bit high to start with.

Here is my estimation:

Index size: 500 GB
Shard target size: 50 GB
Primary shard count: 500 GB / 50 GB = 10, one per data node
Shards per node: 1 primary + 2 replica shards (replica level 2) = 3
Estimated node count: 10

With a shard target size of 70 GB, 7 data nodes fit the same estimation (500 GB / 70 GB ≈ 7).
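The estimation above boils down to: node count equals primary shard count (one primary per node), and replicas add shards per node, not nodes. A sketch:

```python
import math

# One primary shard per data node, so the node count equals the
# primary shard count; replica shards increase shards per node.
index_size_gb = 500
shard_target_gb = 50
replica_level = 2

primary_shards = math.ceil(index_size_gb / shard_target_gb)  # 10
nodes = primary_shards                                       # 10 data nodes
shards_per_node = 1 + replica_level                          # 3 shards each
```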

Now, you could check on a test system whether there is a performance and resource consumption difference between 10 nodes each holding 3 shards of 50 GB and 7 nodes each holding 3 shards of 70 GB. This will mostly depend on your queries (aggregations, filters).

If there is a difference and 7 nodes can't handle the workload, go for slightly more nodes (9, 11, 13, ...), and do not forget to align the shard count with the number of nodes.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.