I am more of a developer than an ops person, so I am wondering how to deploy an HA cluster on AWS EC2 instances. The thing is, I want this cluster to survive a zone failure...

Here's what I have planned so far:

  • Elasticsearch 7.4.0
  • Across 2 availability zones (say, A and B)
  • Each node will be properly configured with an attribute that indicates to which zone it belongs
  • Cluster and indices will be aware of that, so it can allocate replicas to the other zone
  • I was thinking of having 3 nodes in each zone, so if one zone completely fails, the election between the three remaining nodes is successful
  • Each node will act as both a master-eligible node and a data node

The questions I have now:

  • Do we still need at least 3 nodes to avoid split brain? (I haven't kept up to date with the changes in 7.x that I think solved that)
  • In this scenario all 6 nodes must be running all the time. Is there a more cost-effective way to ensure HA across availability zones?
  • How should I set up the network configuration for the discovery phase? Install and use the EC2 plugin?
  • And what about the clients? How should they be configured to access this cluster in an HA fashion? Should I list all the nodes' IPs in the "hosts" attribute, so that in case of a failure the client tries the next one?
  • Or should I have a load balancer in front of the cluster?

I'd appreciate any info you can give me. If there's already a post, an Elastic blog post or an article giving guidelines around this scenario, please send it my way.

As I said, I'm more used to writing queries than to deploying clusters...

Thank you!

Try HAProxy with Layer 7 load balancing; you can check here.

In order to make your cluster highly available in case of an AZ failure, you need to deploy the cluster across 3 AZs so that a strict majority of master-eligible nodes will survive the crash.

Hello @Christian_Dahlqvist

I am curious about the need for 3 AZs. When it comes to protecting against the failure of one AZ, my reasoning is:

  • A minimal production cluster needs 3 master-eligible nodes to have a successful voting process
  • In the event of an AZ failure, I'd need another 3 master-eligible nodes in the remaining zone in order for a new master to take over
  • Beyond that, I would need these 3 nodes (which will also be data nodes) to hold the necessary primary and replica shards to ensure green state

So the way I see it, I would need 6 nodes, 3 in each zone, so that if I completely lose a zone, the other one has everything it needs to take over. When both zones are healthy I would have a 6-node cluster, which would probably help performance, but seems totally unnecessary and expensive for my use case...

Am I missing something?
Do you guys know some article that gives guidelines for deploying a cluster in a multi-zone architecture?

By the way, thanks for having interest and taking the time to answer my question.

All nodes across both AZs form a single cluster. In any failure scenario you need a strict majority of master-eligible nodes to still be available in order to have a healthy and functioning cluster. If you have 6 master-eligible nodes, you need a minimum of 4 (the majority of 6) to be available, which does not allow a full AZ of 3 nodes to go down. If, however, you have at least one master-eligible node in a third AZ, you get around this, as the majority of 7 is still 4.
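
To make the arithmetic above concrete, here is a minimal Python sketch of the strict-majority rule. It is only an illustration of the counting argument: the function names and zone labels are made up for this example, and it does not model how Elasticsearch actually manages its voting configuration.

```python
def majority(n):
    """Smallest number of votes that is a strict majority of n."""
    return n // 2 + 1


def survives_zone_loss(masters_per_zone):
    """For each zone, report whether a strict majority of master-eligible
    nodes remains if that whole zone goes down.

    masters_per_zone is a hypothetical mapping of zone name to the number
    of master-eligible nodes placed in that zone.
    """
    total = sum(masters_per_zone.values())
    needed = majority(total)
    return {
        zone: (total - count) >= needed
        for zone, count in masters_per_zone.items()
    }


# 3 + 3 nodes in two zones: losing either zone leaves 3 of 6, short of 4.
two_zones = {"A": 3, "B": 3}

# Adding a single master-eligible node in a third zone: majority of 7 is 4.
three_zones = {"A": 3, "B": 3, "C": 1}

print(survives_zone_loss(two_zones))
print(survives_zone_loss(three_zones))
```

With the two-zone layout every entry comes out `False`; with the third zone added, every entry is `True`, which is the point of the 3 AZ recommendation.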

