I have a question about whether a two-node cluster is a sufficient setup to avoid data loss if one node fails completely. In other words, can a two-node cluster be set up so that each node contains the complete data of the cluster?
The documentation I found is not very clear on this point. I'm not talking about high availability, just the prevention of data loss.
To make this easier to answer: is it correct that if I set the "number_of_replicas" setting to 1 for each index (which is the default), then the data will be mirrored on each node?
I'm asking because I have a requirement to add a second node to our existing single-node cluster, for better performance as well as a backup, using only two servers.
Yes, the replicas will be allocated so you have that level of redundancy.
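For reference, here is a minimal sketch of how that setting could be applied to an existing index through the index settings endpoint. The host `http://localhost:9200` and the index name `my-index` are placeholders I made up, and the helper `build_replica_request` is hypothetical; the sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_replica_request(base_url: str, index: str, replicas: int = 1) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets number_of_replicas on one index.

    base_url and index are placeholders; adjust them to your own cluster.
    """
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name, for illustration only.
req = build_replica_request("http://localhost:9200", "my-index")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually apply the setting against a running cluster:
# urllib.request.urlopen(req)
```

With one replica per shard and two data nodes, a replica is never placed on the same node as its primary, so each node ends up holding a full copy.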
No, a two-node cluster is not sufficient: you have no majority if you lose a node, or if a network partition or something else causes a split brain. In that case you risk data loss. If a client is talking to both nodes, sends documentA to node0, and then tries to update documentA via node1, the two nodes are immediately out of alignment and you may lose that update.
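The majority argument can be checked with a little arithmetic. This is a simplified sketch that assumes a plain "more than half of the nodes" voting rule; the function names are made up for illustration, and it does not model how any particular Elasticsearch version actually elects a master.

```python
def majority(total_nodes: int) -> int:
    """Smallest number of nodes that is strictly more than half of the cluster."""
    return total_nodes // 2 + 1


def has_majority(total_nodes: int, reachable_nodes: int) -> bool:
    """True if the nodes that can still see each other form a majority."""
    return reachable_nodes >= majority(total_nodes)


# Compare a two-node and a three-node cluster after losing one node.
for total in (2, 3):
    survivors = total - 1
    print(
        f"{total} nodes: majority={majority(total)}, "
        f"after losing one node: has_majority={has_majority(total, survivors)}"
    )
```

Under this rule a two-node cluster needs both nodes to form a majority, so neither side of a partition can safely decide on its own, while a three-node cluster still has a majority after losing one node.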