I want to set up a new on-premises Elasticsearch cluster with a minimum of 3 master nodes and more than 6 data nodes. We ingest around 100 GB of data per day, and we want to keep 30 days of data in the hot phase, data older than 30 days in the warm phase, and data older than 180 days in the cold phase. Can you please help me answer the configuration questions below?
What should the total size of the cluster be?
What should the total number of master nodes be?
What RAM, disk, and CPU cores should each master node have?
What should the total number of hot nodes be?
What RAM, disk, and CPU cores should each hot node have?
What should the total number of warm nodes be?
What RAM, disk, and CPU cores should each warm node have?
What should the total number of cold nodes be?
What RAM, disk, and CPU cores should each cold node have?
What should the JVM heap size be?
What should the primary shard and replica count be for each index?
If there is a formula to calculate the above, please share it.
Yangesh, there is no silver-bullet formula for this.
You need 100 GB x 31 days x 2 copies (1 primary + 1 replica) = about 6.2 TB of space for the hot phase, plus some headroom; I would say no less than 10 TB.
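A quick sketch of that arithmetic (the 31-day window and single replica come from the advice above; the headroom factor is my own assumption):

```python
# Hot-tier sizing: daily ingest x retention x copies, plus headroom.
daily_ingest_gb = 100
hot_days = 31
copies = 2            # 1 primary + 1 replica
headroom = 1.6        # assumption: slack for merges, disk watermarks and growth

raw_gb = daily_ingest_gb * hot_days * copies   # 6,200 GB ~= 6.2 TB
provisioned_gb = raw_gb * headroom             # ~10 TB, in line with the advice above
print(f"raw: {raw_gb / 1000:.1f} TB, provisioned: {provisioned_gb / 1000:.1f} TB")
```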
Three master-eligible nodes is the minimum for avoiding a split-brain situation in the cluster.
Six data nodes seems like a good count. Give the Elasticsearch process on each data node about 30 GB of JVM heap if you have the memory (which is cheap nowadays).
I would keep six or fewer primary shards per index, and each shard no smaller than about 20 GB.
(In practice this means using an ILM policy that rolls the index over when each primary shard reaches roughly 20 GB, rather than waiting on overall index size; see the sketch below.)
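As a rough illustration of that rollover-by-shard-size advice, here is a minimal ILM policy sketch, assuming the elasticsearch-py 8.x client; the policy name, connection details and exact thresholds are placeholders rather than recommendations from this thread:

```python
# Minimal ILM sketch (elasticsearch-py 8.x assumed; names and values are placeholders).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # adjust URL / auth for your cluster

policy = {
    "phases": {
        "hot": {
            "actions": {
                # Roll the index over when any primary shard reaches ~20 GB,
                # with a 30-day backstop so slow indices still rotate.
                "rollover": {"max_primary_shard_size": "20gb", "max_age": "30d"}
            }
        },
        "warm": {
            "min_age": "30d",
            "actions": {
                # Shrink happens here (ILM allows shrink in hot/warm, not cold),
                # so the index arrives in the cold tier with fewer shards.
                # The target count must be a factor of the original (e.g. 6 -> 2).
                "shrink": {"number_of_shards": 2},
                "forcemerge": {"max_num_segments": 1},
            },
        },
        "cold": {
            "min_age": "180d",
            # With data tiers configured, ILM's implicit migrate action moves
            # the index onto cold nodes; no explicit actions are required here.
            "actions": {},
        },
    }
}

es.ilm.put_lifecycle(name="logs-100gb-daily", policy=policy)
```

Attach the policy to your index template via the `index.lifecycle.name` setting so newly rolled-over indices pick it up.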
You can calculate the cold node count from this. But remember that Elasticsearch limits the number of shards per node (cluster.max_shards_per_node defaults to 1,000); if you go above that you will have problems.
There is a shrink option in Elasticsearch: as an index ages out of the hot phase you can shrink its shard count, as long as the target count is a factor of the original (e.g. an index with 6 primary shards can be shrunk to 3, 2, or 1).
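If you prefer to shrink manually instead of through the ILM shrink action, a sketch with the same 8.x Python client (index and node names are placeholders):

```python
# Manual shrink sketch (elasticsearch-py 8.x assumed; names are placeholders).
from elasticsearch import Elasticsearch

es = Elasticsearch("https://localhost:9200")  # adjust URL / auth for your cluster

source, target = "logs-000042", "logs-000042-shrunk"

# 1. Block writes and relocate a copy of every shard onto a single node.
es.indices.put_settings(
    index=source,
    settings={
        "index.blocks.write": True,
        "index.routing.allocation.require._name": "warm-node-1",
    },
)

# 2. Shrink: the target shard count must be a factor of the source count
#    (6 -> 3, 2 or 1; 6 -> 4 is not allowed).
es.indices.shrink(
    index=source,
    target=target,
    settings={
        "index.number_of_shards": 2,
        "index.routing.allocation.require._name": None,  # clear the allocation pin
    },
)
```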
Can you please advise how we should calculate the shard size per index?
And how many CPU cores should we go for on the data nodes?
Also, if my cluster ingests 100 GB of data daily, that is about 3 TB per month and around 18 TB over 6 months, so I'm just wondering how you calculated "about 6.2 TB of space, plus some, I will say no less than 10 TB". And how many hot nodes and warm nodes should we go for?
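For reference, the 6.2 TB figure above only covers the 31-day hot window with one replica; a quick per-tier breakdown under those same assumptions (the cold-tier cutoff below is a placeholder, since the thread does not state a final retention period):

```python
# Per-tier storage arithmetic (assumptions: 1 replica everywhere, no compression,
# and cold data kept until day 365 -- the thread does not state a final retention).
daily_gb, copies = 100, 2

hot_gb  = daily_gb * 31 * copies           # 6,200 GB ~= 6.2 TB, the figure quoted above
warm_gb = daily_gb * (180 - 30) * copies   # data 30-180 days old
cold_gb = daily_gb * (365 - 180) * copies  # data 180-365 days old (assumed cutoff)

for tier, gb in (("hot", hot_gb), ("warm", warm_gb), ("cold", cold_gb)):
    print(f"{tier}: ~{gb / 1000:.1f} TB")
```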