One. If you lose more than one node then you might lose both the primary and the sole replica of a shard.
You need each index to have 4 replicas, so that the cluster holds five copies of every shard. You also need at least 9 master-eligible nodes, so that the 4 nodes you lose are fewer than half of the total.
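As a sketch, the replica count is a per-index setting; assuming an index named `my-index`, it could be raised like this:

```
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 4
  }
}
```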
It might be cheaper to invest in more reliable infrastructure. It's normally sufficient to plan to deal with a single node going down, and if you're paranoid then you might try to deal with two. With reasonable infrastructure the probability of losing four nodes all at once should be infinitesimal compared with, say, a bug in your client software that accidentally deletes everything, against which no amount of redundancy can protect.
Elasticsearch, like many other distributed systems, elects a master node using a majority-based voting system and so can only operate while a majority of the master-eligible nodes are available. If you want to tolerate the loss of 4 nodes then the survivors must still form a majority: the smallest total that works is 9, since a majority of 9 is 5 and 9 − 4 = 5.
No, that's not right. minimum_master_nodes is the minimum number of master-eligible nodes that are needed for the cluster to operate. If you set it to 5 then you need 5 master-eligible nodes to be available at all times. If you only have 4 surviving nodes then that's not enough.
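For reference, on a cluster of 9 master-eligible nodes the setting in each node's `elasticsearch.yml` would be:

```yaml
# quorum of 9 master-eligible nodes: floor(9/2) + 1 = 5
discovery.zen.minimum_master_nodes: 5
```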
You didn't mention these "sites" earlier, so the answers I gave were about losing an arbitrary four nodes. But if your 8 nodes are split into two "sites" of four nodes then I suspect that really you are concerned with losing one or other site, not any random set of four nodes. This is much easier to deal with. Forced shard allocation awareness will split the shard copies evenly across the sites, so you can get away with a single replica of each shard and still be safe if one or other site is unavailable.
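A sketch of the relevant settings, assuming the sites are tagged with a node attribute called `site` (the attribute name and values here are illustrative):

```yaml
# on each node: tag it with its site, e.g. site1 or site2
node.attr.site: site1

# on every node: spread shard copies across sites, and *force* awareness so
# that Elasticsearch never allocates all copies into one site when the
# other site is unavailable
cluster.routing.allocation.awareness.attributes: site
cluster.routing.allocation.awareness.force.site.values: site1,site2
```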
However, if your cluster is split across just two sites then there is no way to make it truly resilient to the loss of either site. I mean that it is theoretically impossible, not that this is a limitation in Elasticsearch. The way people normally do this is to install a single master-eligible node in each site, and add a third "tiebreaker" site that just contains a single master-eligible node, giving three master-eligible nodes in total.
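For instance, the tiebreaker node might be configured as master-eligible but holding no data, so it can be a small machine (the `site3` attribute value is illustrative):

```yaml
# tiebreaker node: can vote in master elections, holds no shards
node.master: true
node.data: false
node.attr.site: site3

# with 3 master-eligible nodes in total, a majority is 2
discovery.zen.minimum_master_nodes: 2
```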
You will lose your intersite link at some point. Networks are not reliable. If you have minimum_master_nodes set to 4 then both sides will be able to elect a master and form independent clusters. If you manage to get them to join back up again then you will see data loss and may find some indices unrecoverably corrupted. I do not see how this is a risk you want to accept.
As I said, this is not a limitation within Elasticsearch, it's a theoretical impossibility. Fault tolerance requires at least three independent failure domains.