I am planning a test deployment of my Elasticsearch cluster in the cloud and would like some best practices. My main requirement is high availability of all three ELK services as well as the index data.
I have 10 nodes to distribute across 2 data centers in the same region, with replication factor 3:
3 - master
2 - logstash
2 - kibana
3 - data nodes
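A split like this is usually made explicit with shard allocation awareness, so Elasticsearch keeps each shard's copies in different zones. A minimal sketch, where the attribute name `zone` and the values `az-1`/`az-2` are placeholders for your own naming:

```yaml
# elasticsearch.yml on each data node: tag the node with its zone
node.attr.zone: az-1          # use az-2 on nodes in the other data center

# on every node: spread primaries and replicas across the 'zone' attribute
cluster.routing.allocation.awareness.attributes: zone
# forced awareness: never allocate all copies into one surviving zone
cluster.routing.allocation.awareness.force.zone.values: az-1,az-2
```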
But my worry is that I have to keep two masters in DC-1 (AZ-1) and the third master in DC-2 (AZ-2). If AZ-1 goes down then my cluster will be down, because the third master on its own cannot form a quorum and elect itself master.
So how do I deal with this situation? I am restricted to only two DCs and don't have a third data center or AZ.
You can't, sorry, you need at least three availability zones. This isn't a restriction that Elasticsearch imposes: it's not even theoretically possible to build a fully resilient system across two zones.
The best you can do is to split almost all of your cluster across two "main" zones and add another dedicated master-eligible node in a small independent third zone. This extra node may be a voting-only node.
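For reference, on Elasticsearch 7.9+ a voting-only master-eligible node is declared via `node.roles`; on 7.3–7.8 the equivalent is `node.master: true` together with `node.voting_only: true`. A sketch for the small third-zone tiebreaker node (assuming it carries no data or ingest duties):

```yaml
# elasticsearch.yml on the tiebreaker node in the third zone (7.9+ syntax)
node.roles: [ master, voting_only ]
```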
Thank you @DavidTurner for your response, and yes, that is a good option. If we configure the third master as a voting-only master-eligible node, does such a node take on all the responsibilities of a regular master node except becoming master itself: creating and deleting indices, allocating shards, internal node communication, etc.?
I'd also like to clarify a few questions: when data is written into the cluster, does each write go through the master? Does the master node keep track of the metadata and keep the other nodes in sync via write acknowledgements? If so, does a voting-only master node also carry those responsibilities?
Any sense of the latency requirements for the third-zone voting-only master, i.e. can we place it on another cloud provider or in another region? If so, it'll be way, way slower than the local masters, say 100ms vs. 1ms, but will that matter in normal operation? When a zone dies, of course votes have to happen, etc., but presumably that will work, just a little more slowly. Does that make sense as a solution for us two-zone-only folks (we have this situation in Beijing, for example)?
However all master-eligible nodes, including voting-only nodes, require reasonably fast persistent storage and a reliable and low-latency network connection to the rest of the cluster, since they are on the critical path for publishing cluster state updates.
We can't really give a concrete figure for how much latency is acceptable since it only really has a performance impact. IMO 100ms is a lot, much more than we had in mind when working on this. Reliability is also very important for the connections between nodes in a cluster, and this is the usual reason for problems in a cluster split across sites.
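There are a couple of timeout settings in this area that can, cautiously, be raised if the inter-site link is slow, though that treats the symptom rather than the cause. A sketch (the raised values are illustrative, not recommendations; defaults are as I understand the 7.x docs):

```yaml
# elasticsearch.yml: how long the master waits for nodes to acknowledge
# a cluster state update before giving up (default 30s)
cluster.publish.timeout: 60s
# how long the master waits for each follower check to complete before
# treating it as failed (default 10s)
cluster.fault_detection.follower_check.timeout: 30s
```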
So the challenge remains for folks with only two 'near' zones: what to do? Maybe that means finding a site 25ms away, but of course that's highly variable (I'm not sure of AWS's published or measured inter-region latencies; they might not be too bad).
Or a process to manage split brain, so that with an hour of downtime a failover can be managed: manually confirm the primary AZ is down, restart the secondary cluster configured to require fewer master nodes, and ideally it promotes all its replicas to primaries and you're good. Recovery back to both zones would then mean another full restart, and likely challenges resyncing the original zone's primaries, which are now far behind and may have lost state; it's not clear how all of that works.
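For what it's worth, the closest thing to a documented process I'm aware of is the `elasticsearch-node` command-line tool, which can forcibly re-bootstrap cluster state on a surviving node. It is explicitly a last resort: it discards cluster-state history and can silently lose acknowledged writes, so treat this as a sketch of the mechanism, not a recommendation:

```
# On one surviving master-eligible node, with Elasticsearch stopped.
# Last resort only: discards cluster-state history and may silently
# lose acknowledged writes.
bin/elasticsearch-node unsafe-bootstrap

# On the other surviving nodes (also stopped), detach them from the
# old cluster so they can join the newly bootstrapped one:
bin/elasticsearch-node detach-cluster
```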
Is there any documented process? I know I’ve asked before.
Or is this back to CCR as the solution? That seems messier to set up, search against, etc.
Note that this has changed for Elasticsearch 7.x, so things that worked before may no longer be suitable. If you lose quorum you can not necessarily easily reconfigure the cluster and spin up and add new master nodes.
This didn't work in earlier versions either, although it sometimes looked like it did, having silently lost data. The major change in 7.x in this area is that we now avoid cases that might cause this silent data loss.
No, not really, because there really isn't any way to achieve this without occasionally silently losing an immeasurable amount of data. As in, not even theoretically, this isn't a constraint that Elasticsearch imposes.
Given the inevitable choice between safety and availability each individual cluster chooses safety, with the intention that you can combine clusters together to improve availability at the expense of safety. It's much easier to work that way round, although it's certainly not easy either way.
Thanks, though my challenge is that this is the reality for many folks: we have an AWS deployment with two zones; there simply are no more zones to use, none within 50-100ms in other regions. The same goes for folks doing this in their own data centers, private OpenStack, and so on. So I understand it's not good, there can be data loss, and it's a tradeoff, but I am troubled that cases like this are simply told "don't do it at all, run one zone", I guess.
The world is full of challenges and tradeoffs like this. I guess it means I have to find time to test it and document some solutions or failures, if only for my own interest and learning; even if it all fails, I can at least explain what I did, why, and the results. And be smarter, maybe.
Seems CCR (Platinum license!) can help, but $$, and even then it's not clear that fully bidirectional replication works seamlessly for the application, as indices have to change modes and switch leaders, etc.
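For reference, the basic CCR wiring is just a remote-cluster connection plus a follower index. A hypothetical sketch in the Kibana Dev Tools console format, run on the follower cluster; the cluster alias `dc2`, the seed host, and the index names are placeholders:

```console
PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "dc2": { "seeds": [ "dc2-node1:9300" ] }
      }
    }
  }
}

PUT /logs-follower/_ccr/follow
{
  "remote_cluster": "dc2",
  "leader_index": "logs"
}
```

Failing over still means pausing the follower, converting it to a regular index, and redirecting writes, which is the mode-change friction mentioned above.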