Disaster Planning

Hi everyone!

First off, let me apologize in advance for what is probably a dumb question; I am fairly new to Elasticsearch.

This all assumes that the master nodes are dedicated and running.

I am trying to do disaster recovery planning and I am not seeing (or not completely understanding how Elasticsearch works) how to calculate for data node failures. For example, if I had 4 data nodes and each index has 5 shards and 1 replica, how many data nodes can I lose at once? Let's say that 2 of the 4 nodes are in a single rack and the rack goes down. What if a single node fails, the cluster has time to re-replicate and bring up all the replica shards, and then another node fails?

I was trying to find a formula or something that would say: if you have X nodes with Y shards and Z replicas, this is what will happen. Or is the data still available as long as one node is up and happens to have a single shard of that index?

Any info or help or links would be greatly appreciated.

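If you have 1 replica configured, you can lose one node at a time without losing access to the data, because a primary shard and its replica are never allocated to the same node. Once a node is lost, a new replica will get allocated on one of the remaining nodes. Once all shards are allocated again, you can again afford to lose a single node. Losing two nodes at the same time is not guaranteed to be safe: if both copies of some shard happened to live on those two nodes, that shard becomes unavailable.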
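To make the rule above concrete, here is a minimal Python sketch that checks which node failures an index can survive. It does not talk to Elasticsearch at all; the round-robin layout, the node names, and the function names are all assumptions made for illustration, and the real allocator may place shards differently. The only property it relies on is that copies of the same shard sit on different nodes.

```python
from itertools import combinations


def round_robin_placement(nodes, num_shards, num_replicas):
    """Hypothetical layout: copy c of shard s goes to node (s + c) mod N.

    This is NOT how Elasticsearch decides allocation; it is just one layout
    that keeps all copies of a shard on distinct nodes.
    """
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("cannot place more copies of a shard than there are nodes")
    return {
        s: {nodes[(s + c) % len(nodes)] for c in range(copies)}
        for s in range(num_shards)
    }


def index_available(placement, failed_nodes):
    """The index stays fully available if every shard keeps at least one copy."""
    failed = set(failed_nodes)
    return all(holders - failed for holders in placement.values())


def worst_case_tolerated(placement, nodes):
    """Largest k such that ANY k simultaneous node failures are survivable."""
    for k in range(1, len(nodes) + 1):
        if any(not index_available(placement, combo)
               for combo in combinations(nodes, k)):
            return k - 1
    return len(nodes)


nodes = ["n1", "n2", "n3", "n4"]
placement = round_robin_placement(nodes, num_shards=5, num_replicas=1)

print(worst_case_tolerated(placement, nodes))       # 1
print(index_available(placement, ["n1", "n2"]))     # False: shard 0 lived on n1 and n2
print(index_available(placement, ["n1", "n3"]))     # True: every shard keeps a copy
```

With 4 nodes, 5 shards, and 1 replica, the guaranteed number of simultaneous failures is 1. Some pairs of nodes can fail together without data loss, but that depends on where the shards happen to be, so it should not be planned around.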

If you have multiple racks or availability zones, you can use shard allocation awareness to make sure that the primary and replica of the same shard are not placed within the same rack or availability zone. This way you can lose a full rack without losing access to your data.
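As a rough sketch of what enabling allocation awareness involves, the Python snippet below builds the request body for the cluster settings API (`PUT _cluster/settings`) and includes a small offline check of whether a given shard layout would survive the loss of a rack. The attribute name `rack_id`, the node and rack names, and both function names are assumptions for illustration; each node would also need a matching `node.attr.rack_id` value in its `elasticsearch.yml`.

```python
import json

# Hypothetical attribute name. Every data node would need a line such as
# `node.attr.rack_id: rack_a` in elasticsearch.yml for this to take effect.
AWARENESS_ATTRIBUTE = "rack_id"


def awareness_settings(attribute=AWARENESS_ATTRIBUTE):
    """Body for PUT _cluster/settings that turns on allocation awareness."""
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": attribute
        }
    }


def survives_rack_loss(placement, node_to_rack):
    """True if every shard has copies in at least two different racks."""
    return all(
        len({node_to_rack[node] for node in holders}) >= 2
        for holders in placement.values()
    )


# Illustrative topology: two racks with two nodes each.
node_to_rack = {"n1": "rack_a", "n2": "rack_a", "n3": "rack_b", "n4": "rack_b"}

same_rack_layout = {0: {"n1", "n2"}}   # both copies of shard 0 in rack_a
cross_rack_layout = {0: {"n1", "n3"}}  # copies split across rack_a and rack_b

print(json.dumps(awareness_settings(), indent=2))
print(survives_rack_loss(same_rack_layout, node_to_rack))   # False
print(survives_rack_loss(cross_rack_layout, node_to_rack))  # True
```

With awareness enabled and nodes spread over two racks, the cluster aims for the second kind of layout, which is what lets a whole rack go down while every shard still has a live copy.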
