As our data has grown, we are running into performance issues. I have been trying to determine the correct shard/node sizing, and I think we are way underpowered. I have found some information on determining the correct allocation of resources, but the recommendations are so much bigger than what we are running that I wanted to get some verification.
We currently have a 7-node cluster; each machine has 8 cores and 56 GB of RAM, with 28 GB allocated to the heap (verified that compressed ordinary object pointers are still in use). Our main index is 425 GB (161,133,495 docs), and we are running with only 5 primary shards. This is our first problem, as each shard is around 85 GB (the suggested max is 30 GB). We currently also have only one replica.
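For reference, here is roughly how I checked those numbers, as a minimal sketch against the nodes info and cat shards APIs (`localhost:9200` and `main_index` are placeholders for our actual endpoint and index name):

```python
import json
import urllib.request

ES = "http://localhost:9200"  # placeholder for our actual endpoint

def get_json(path: str) -> dict:
    with urllib.request.urlopen(ES + path) as resp:
        return json.loads(resp.read())

# Nodes info API: each node's jvm section reports whether
# compressed ordinary object pointers are in use.
nodes = get_json("/_nodes/jvm")
for node_id, info in nodes["nodes"].items():
    print(info["name"], info["jvm"]["using_compressed_ordinary_object_pointers"])

# Cat shards API: per-shard store size for the index
# ("main_index" is a placeholder for our real index name).
shards = get_json("/_cat/shards/main_index?format=json&h=index,shard,prirep,store")
for s in shards:
    print(s["index"], s["shard"], s["prirep"], s["store"])
```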
I am looking at this article: Optimizing Elasticsearch: How Many Shards per Index?
Based on the "Large and Growing Dataset" section, I came up with the following:
| Metric | Value |
|---|---|
| Index size (GB) | 500.00 |
| Max shard size (GB) | 30.00 |
| Shard count (index / max) | 16.67 |
| Primary node count (= shard count) | 16.67 |
| Replica count | 2 |
| Replica node count (primary × replicas) | 33.33 |
| Total nodes | 50.00 |
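To sanity-check the arithmetic, here is a minimal sketch of the same calculation (my own script, not from the article; it assumes the article's one-shard-per-node rule and shows both the fractional figures from the table and the rounded-up whole-node counts):

```python
import math

def plan_nodes(index_size_gb: float, max_shard_gb: float, replicas: int) -> None:
    """Node-count estimate assuming one shard per node."""
    shards = index_size_gb / max_shard_gb     # 500 / 30 = 16.67
    primary_nodes = shards                    # one primary shard per node
    replica_nodes = primary_nodes * replicas  # 16.67 * 2 = 33.33
    total = primary_nodes + replica_nodes     # 50.00
    print(f"shards={shards:.2f} primaries={primary_nodes:.2f} "
          f"replica_nodes={replica_nodes:.2f} total={total:.2f}")
    # Real shard/node counts must be whole numbers, so round up:
    print(f"rounded total nodes: {math.ceil(shards) * (1 + replicas)}")  # 17 * 3 = 51

plan_nodes(index_size_gb=500, max_shard_gb=30, replicas=2)
```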
This is telling me that even without capacity planning, I should be using 50 nodes. Am I reading that correctly?