I have the following dilemma. I've been provided with 3 hosts on which to run a virtualized Elasticsearch cluster. The problem is that if I want to run 3 masters and 6 data nodes, each of the Elasticsearch nodes ends up with just 2 vCPUs and 16GB of RAM. Alternatively, I can run 3 masters and 3 data nodes, which gives each node more vCPUs and more RAM but leaves me with fewer nodes in total.
Which is better in this case? More but less powerful nodes, or fewer but more powerful nodes?
Are 2 vCPUs enough for an Elasticsearch node?
Will 8GB of RAM for the masters and 16GB for the data nodes be enough?
I know I have to test the setup in order to know for sure, but any rough estimates would still help.
I will ingest between 300 and 500GB a day.
The question is whether to put 1 master and 2 data nodes on each host, or 1 master and 1 data node. As mentioned, that is 300-500GB a day with little or no ingest-time transformation. I plan for the data nodes to do the ingesting.
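For a rough sense of scale, the ingest figures can be turned into a per-node disk estimate. This is only a back-of-the-envelope sketch: the 30-day retention and the assumption that the on-disk index is about the same size as the raw data are placeholders of mine, not measurements, and the function name is purely illustrative.

```python
def per_node_storage_gb(daily_ingest_gb, replicas, retention_days, data_nodes):
    """Rough disk needed per data node.

    Assumes the on-disk index is about the same size as the raw ingest,
    which in practice depends heavily on mappings and compression.
    """
    total_gb = daily_ingest_gb * (1 + replicas) * retention_days
    return total_gb / data_nodes


RETENTION_DAYS = 30  # hypothetical retention period, not stated in the thread

for nodes in (3, 6):
    low = per_node_storage_gb(300, 1, RETENTION_DAYS, nodes)
    high = per_node_storage_gb(500, 1, RETENTION_DAYS, nodes)
    print(f"{nodes} data nodes: {low:.0f}-{high:.0f} GB per node")
```

Since both layouts live on the same three hosts, the total disk per host comes out the same either way; only the split per VM changes.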
Okay, then I will consider running with 3 masters (2 cores, 16GB) and 3 data nodes (4 cores, 32GB). Does it make sense, then, to change the defaults of 5 shards/1 replica? How many shards would be best? 3 shards and 1 replica?
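To make the shard question concrete, here is a small sketch that builds the index settings body for each option and counts how many shard copies would land on each of the 3 data nodes. `number_of_shards` and `number_of_replicas` are the real index settings; the helper names are hypothetical, and the even spread across nodes is an assumption. It compares the two layouts and is not a recommendation.

```python
import json


def index_settings(primary_shards, replicas):
    """Settings body that could be sent when creating an index or template."""
    return {
        "settings": {
            "index": {
                "number_of_shards": primary_shards,
                "number_of_replicas": replicas,
            }
        }
    }


def shard_copies_per_node(primary_shards, replicas, data_nodes):
    """Average shard copies (primaries plus replicas) per data node, per index."""
    return primary_shards * (1 + replicas) / data_nodes


DATA_NODES = 3

for primaries in (5, 3):
    body = index_settings(primaries, 1)
    per_node = shard_copies_per_node(primaries, 1, DATA_NODES)
    print(json.dumps(body))
    print(f"  {primaries} primaries, 1 replica: {per_node:.2f} shard copies per node")
```

With 3 primaries and 1 replica the 6 shard copies divide evenly over 3 data nodes, while the default of 5 primaries gives 10 copies that cannot be spread evenly.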
Also, I assume it is all right for the master and data nodes to have different heap sizes.
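As a sketch of what different heap sizes could look like, the snippet below applies the commonly cited rule of thumb of giving the JVM heap about half of the node's RAM while staying under roughly 32GB. The 31GB cap and the helper names are assumptions for illustration; the resulting `-Xms`/`-Xmx` values are the standard JVM heap flags.

```python
def suggested_heap_gb(ram_gb, cap_gb=31):
    """Half of RAM, capped below ~32 GB (the cap value is an assumption)."""
    return min(ram_gb // 2, cap_gb)


def jvm_heap_flags(ram_gb):
    """Matching minimum and maximum heap flags for a node with ram_gb of RAM."""
    heap = suggested_heap_gb(ram_gb)
    return f"-Xms{heap}g -Xmx{heap}g"


# Node sizes taken from the thread: 8 or 16 GB masters, 16 or 32 GB data nodes.
for role, ram in (("master", 8), ("master", 16), ("data", 16), ("data", 32)):
    print(f"{role:6s} {ram:2d} GB RAM -> {jvm_heap_flags(ram)}")
```

The remaining RAM is left to the operating system's filesystem cache, which matters more on data nodes than on dedicated masters.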
One last question: if I can, should I make the master and data nodes equal in terms of RAM and CPU, or should I give the data nodes more resources?