Using a two-node cluster for data loss prevention

Hello...

I have a question about whether a two-node cluster is sufficient to avoid data loss if one node fails completely. In other words, can a two-node cluster be set up so that each node contains the complete data of the cluster?

The documentation I found is not very clear on this point. I'm not talking about high availability, just the prevention of data loss.

To make this easier to answer: is it correct that if I set the "number_of_replicas" setting to 1 for each index (which is the default), then the data will be mirrored on each node?

I'm asking because I have a requirement to add a second node to our existing single-node cluster, both for better performance and as a backup, using only two servers.

Thanks,
Osama

No.

Yes, the replicas will be allocated so you have that level of redundancy.
No, because with two nodes you have no majority if you lose a node, or if there's a network partition or something else that might cause a split brain. Then you risk data loss: if a client is talking to both nodes, sends documentA to node0, and then tries to update documentA via node1, the two nodes are immediately out of alignment and you may lose that update.
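
To put numbers on the "no majority" point, here is a minimal sketch of the arithmetic. It assumes every node is master-eligible and that a strict majority of them is needed to agree; the function names are just for illustration and are not part of any Elasticsearch API.

```python
def majority(master_eligible: int) -> int:
    """Smallest number of votes that is more than half of the nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many nodes can be lost while a majority is still reachable."""
    return master_eligible - majority(master_eligible)


# Compare one-, two- and three-node clusters.
for n in (1, 2, 3):
    print(f"{n} node(s): majority={majority(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Under these assumptions two nodes need both votes for a majority, so they tolerate no more failures than one node does, while three nodes can lose one.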

So I guess this means that a two-node cluster carries a risk of data loss even without a node failure.

In light of this, I'm thinking of recommending either a three-node cluster or a single-node cluster with snapshots for backup.

Does this sound reasonable?
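
For the snapshot option, this is roughly what a request to the create snapshot API could look like. It is only a sketch: the host, the repository name `my_backup_repo` and the snapshot name `snapshot_1` are made-up placeholders, the repository would have to be registered beforehand, and the request is only built here, not sent.

```python
from urllib.request import Request


def build_snapshot_request(base_url: str, repository: str, snapshot: str) -> Request:
    """Build (but do not send) a PUT request for the create snapshot API."""
    url = f"{base_url}/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    return Request(url, method="PUT")


# Placeholder values; replace them with your own host and names.
req = build_snapshot_request("http://localhost:9200", "my_backup_repo", "snapshot_1")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it to a running cluster.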

Yep 🙂

FWIW the reference manual has a section on exactly this topic:

Because it’s not resilient to failures, we do not recommend deploying a two-node cluster in production.