We have a requirement from our operations team that all solutions have to be geo-redundant with automatic failover. We have been forbidden from using Elasticsearch because of this (even though it is the best technical solution). What do other people do for geo-redundancy? Because of the automatic failover requirement, what we really need is a cluster that does two-way replication over a WAN.
Let's say you have one cluster spanning two zones, and you can afford 1 master and 2 data nodes per zone. If one master goes down, the other can keep the cluster running only if a majority of master-eligible nodes is still reachable. With just two master-eligible nodes, losing either one breaks that majority, which is why 3 masters are recommended. With proper sharding and replica settings, primaries and replicas can be distributed across the two zones so that if one zone goes down, every shard still has a copy in the surviving zone and no data is lost.
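A minimal sketch of the quorum arithmetic behind the three-master recommendation, assuming master election needs a strict majority of master-eligible nodes (what `discovery.zen.minimum_master_nodes` set to N/2 + 1 enforced in older versions; newer versions manage the voting configuration automatically). The function names are hypothetical.

```python
def majority(master_eligible: int) -> int:
    """Votes needed to elect a master: a strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1


def survives(master_eligible: int, lost: int) -> bool:
    """True if the remaining master-eligible nodes can still form a majority."""
    return master_eligible - lost >= majority(master_eligible)


for n in (2, 3):
    print(f"{n} masters: need {majority(n)}, survives losing one: {survives(n, 1)}")
```

The same arithmetic also shows a catch with only two zones: if two of the three masters sit in the zone that fails, `survives(3, 2)` is False. That is why a tie-breaking master-eligible node in a third location is often suggested for two-zone layouts.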
Check out this document for proper sharding across different zones.
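In Elasticsearch, spreading shard copies across zones is the job of shard allocation awareness (pointing `cluster.routing.allocation.awareness.attributes` at a node attribute such as `zone`). The sketch below is a hypothetical toy model, not the real allocator: the node names and the round-robin placement are assumptions. It only checks the property that matters here, which is that every shard keeps a copy when either zone is lost.

```python
from __future__ import annotations

from itertools import cycle

# Hypothetical layout from the post: two zones, two data nodes each.
NODES = {
    "zone1": ["data-1a", "data-1b"],
    "zone2": ["data-2a", "data-2b"],
}


def place(num_shards: int, nodes: dict[str, list[str]]) -> dict[int, list[tuple[str, str]]]:
    """Toy placement: put each shard's primary and replica in different zones."""
    zones = list(nodes)
    node_iters = {z: cycle(ns) for z, ns in nodes.items()}  # round-robin within a zone
    placement = {}
    for shard in range(num_shards):
        primary_zone = zones[shard % len(zones)]
        replica_zone = zones[(shard + 1) % len(zones)]
        placement[shard] = [
            (primary_zone, next(node_iters[primary_zone])),
            (replica_zone, next(node_iters[replica_zone])),
        ]
    return placement


def all_shards_available(placement, failed_zone: str) -> bool:
    """True if every shard still has at least one copy outside the failed zone."""
    return all(
        any(zone != failed_zone for zone, _ in copies) for copies in placement.values()
    )


layout = place(num_shards=4, nodes=NODES)
for shard, copies in layout.items():
    print(shard, copies)
print(all(all_shards_available(layout, z) for z in NODES))  # True
```

Note that this covers data availability only. Whether the cluster keeps accepting writes also depends on the master quorum described above.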
That still sounds like it is in the same data center. What about nodes in different data centers geographically far apart?
When you put the two zones in two different countries, what holds you back is the communication link (its latency, bandwidth and reliability). That's not an ES issue; it's a network issue. From what I've seen, ES will replicate shards properly to the data nodes in the cluster, and the cluster can have data nodes in one or more locations as long as they can talk to each other.
Check out a few options here - https://www.elastic.co/blog/scaling_elasticsearch_across_data_centers_with_kafka
Another solution is to use a queue such as Kafka, with MirrorMaker replicating your data between DCs. The ES cluster in each DC can then consume the same data from Kafka, which gives you redundancy.
This is what we do, as we have many DCs around the world. Each ES cluster is self-contained.
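Here is a toy model of the Kafka pattern. It assumes MirrorMaker has already copied the topic into each DC, so both clusters read the same ordered stream of changes. `Event` and `DcCluster` are hypothetical stand-ins: a list plays the topic and a dict plays the index. The point is that each self-contained cluster tracks its own offset, so a DC that falls behind converges once it catches up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    doc_id: str
    body: Optional[dict]  # None means "delete this document"


@dataclass
class DcCluster:
    """Hypothetical stand-in for a self-contained ES cluster in one data center."""
    name: str
    offset: int = 0                               # next log position to consume
    index: dict = field(default_factory=dict)     # doc_id -> document

    def consume(self, log: list, max_events: Optional[int] = None) -> None:
        """Apply events from this DC's own offset, optionally only a limited batch."""
        end = len(log) if max_events is None else min(len(log), self.offset + max_events)
        for event in log[self.offset:end]:
            if event.body is None:
                self.index.pop(event.doc_id, None)
            else:
                self.index[event.doc_id] = event.body
        self.offset = end


log = [
    Event("1", {"title": "hello"}),
    Event("2", {"title": "world"}),
    Event("1", {"title": "hello again"}),
    Event("2", None),
]
east, west = DcCluster("dc-east"), DcCluster("dc-west")
east.consume(log)
west.consume(log, max_events=2)   # west lags behind, e.g. a slow WAN link
print(east.index == west.index)   # False while west is behind
west.consume(log)                 # west resumes from its own offset
print(east.index == west.index)   # True
```

Because each cluster only reads from the queue, losing one DC does not affect the others. The lagging DC simply resumes from its stored offset.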
For now, there isn't anything built in to handle replication across data centers. Some folks form a single cluster across data centers, but that is fraught with problems over a high-latency, unreliable WAN link. The best solution available now is to build something that replicates all of your changes to both data centers using your favorite tools.
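One way to build such a thing is an application-side dual writer. This is a hypothetical sketch, not an Elastic feature. `DualWriter` and the fake senders are assumptions, and each sender stands in for an indexing call against one DC. Changes that fail for one DC are kept in a per-DC queue and replayed in order later.

```python
from collections import deque


class DualWriter:
    """Hypothetical replicator: every change is sent to every DC.

    Writes that fail for a DC stay queued (in order) and are retried on the
    next flush, so an unreachable DC catches up once it comes back.
    """

    def __init__(self, targets):
        self.targets = targets                              # name -> send(doc_id, body)
        self.pending = {name: deque() for name in targets}  # per-DC retry queues

    def write(self, doc_id, body):
        for name in self.targets:
            self.pending[name].append((doc_id, body))
        self.flush()

    def flush(self):
        for name, send in self.targets.items():
            queue = self.pending[name]
            while queue:
                doc_id, body = queue[0]
                try:
                    send(doc_id, body)
                except ConnectionError:
                    break  # preserve order: stop at the first failure, retry later
                queue.popleft()


# Fake DCs for illustration: dc2 starts out unreachable.
stores = {"dc1": {}, "dc2": {}}
down = {"dc2"}


def make_sender(name):
    def send(doc_id, body):
        if name in down:
            raise ConnectionError(f"{name} unreachable")
        stores[name][doc_id] = body
    return send


writer = DualWriter({name: make_sender(name) for name in stores})
writer.write("a", {"v": 1})
writer.write("b", {"v": 2})
print(len(writer.pending["dc2"]))      # 2
down.clear()                           # dc2 comes back
writer.flush()
print(stores["dc1"] == stores["dc2"])  # True
```

The retry queues here live in memory, so a crash would lose pending writes. A real setup would want a durable queue, which is essentially what the Kafka approach provides.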