Hi folks, we are looking to deploy our stack to 2 different AWS regions for high availability in case one region goes dark for any period of time. I have gotten a cluster up and replicating across the 2 regions over a site-to-site VPN tunnel, with a master in one region communicating with data nodes residing in each region.
I initially thought we could have the stack in each region point to the ES nodes in that same region, but I am now discovering that ES does some internal load balancing. That would mean the stack in one region could end up communicating with ES nodes in the other region, which could add unwanted latency.
Is this a concern? I am not sure how the internal load balancing works (latency-based? least connections?). If it is a concern, what options do I have for replicating between 2 separate clusters? And yes, I have read that this is not advised. We are not really worried about replication lag, as updates to our indices will be minimal.
Currently Elasticsearch does not support cross-region/cross-datacenter replication, because replication is done at the shard level and was not designed for high-latency networks. This means that a cluster spanning multiple AWS regions will suffer from availability issues (a cluster with a master node in one region and data nodes in another won't last long).
The recommendation is to maintain a separate cluster per AWS region and replicate the data before indexing.
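One way to read "replicate the data before indexing" is to fan each write out at ingestion time, so every regional cluster receives the same documents and each stack only ever queries its local cluster. Below is a minimal Python sketch of that idea, assuming a hypothetical `RegionalCluster` stand-in (an in-memory dict here; in practice it would be a client configured with only that region's nodes) and a simple per-region backlog for writes that fail while a region is unreachable. The class names and region labels are illustrative, not an Elasticsearch API.

```python
from collections import deque
from dataclasses import dataclass, field


# Hypothetical stand-in for one regional cluster. A real version would wrap
# a client that only knows about nodes in its own region.
@dataclass
class RegionalCluster:
    region: str
    docs: dict = field(default_factory=dict)
    available: bool = True

    def index(self, doc_id, doc):
        if not self.available:
            raise ConnectionError(f"{self.region} unreachable")
        self.docs[doc_id] = doc


@dataclass
class FanOutIndexer:
    clusters: list
    # Per-region backlog of writes that could not be delivered yet.
    pending: dict = field(default_factory=dict)

    def __post_init__(self):
        for c in self.clusters:
            self.pending.setdefault(c.region, deque())

    def index(self, doc_id, doc):
        """Send the same document to every regional cluster."""
        for c in self.clusters:
            queue = self.pending[c.region]
            if queue:
                # Preserve per-region write order: never jump ahead of the backlog.
                queue.append((doc_id, doc))
                continue
            try:
                c.index(doc_id, doc)
            except ConnectionError:
                queue.append((doc_id, doc))

    def replay(self):
        """Retry queued writes in order; stop at the first failure per region."""
        for c in self.clusters:
            queue = self.pending[c.region]
            while queue:
                doc_id, doc = queue[0]
                try:
                    c.index(doc_id, doc)
                except ConnectionError:
                    break  # region still down; try again later
                queue.popleft()
```

Usage: create `RegionalCluster("us-east-1")` and `RegionalCluster("us-west-2")`, pass both to `FanOutIndexer`, and call `index()` for every update; call `replay()` periodically. In a real deployment the backlog would more likely live in a durable queue than in process memory, so writes survive an application restart.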
Clustering across regions over a site-to-site VPN has been working quite well for us, actually. My question was about how the internal load balancing works, and whether traffic from the stack in one region will try to reach the ES nodes across the VPN in the other region.
And if your recommendation is to maintain separate clusters and replicate data, then how would you suggest we accomplish this?
Load balancing works in a round-robin way. Elasticsearch always assumes that the network and hosts are homogeneous, so it will always try to distribute the load evenly.
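To make the round-robin point concrete, here is a simplified Python model (not Elasticsearch's actual routing code) of a client that cycles strictly through every node it knows about. The node names and region labels are hypothetical. It shows that if the stack in one region is handed nodes from both regions, about half of its requests go across the VPN, while a node list limited to local nodes keeps the first hop inside the region.

```python
from itertools import cycle

# Hypothetical node list: two data nodes per region (names are illustrative).
NODES = [
    ("es-east-1", "us-east-1"),
    ("es-east-2", "us-east-1"),
    ("es-west-1", "us-west-2"),
    ("es-west-2", "us-west-2"),
]


def route(requests, nodes):
    """Assign each request to the next node in strict round-robin order."""
    rr = cycle(nodes)
    return [next(rr) for _ in range(requests)]


def cross_region_share(local_region, requests, nodes):
    """Fraction of requests from local_region whose first hop is a remote node."""
    assigned = route(requests, nodes)
    remote = sum(1 for _, region in assigned if region != local_region)
    return remote / requests


if __name__ == "__main__":
    # All four nodes in the rotation: half of the requests cross the VPN.
    print(cross_region_share("us-east-1", 1000, NODES))  # 0.5
    # Only local nodes in the rotation: no first hop leaves the region.
    local = [n for n in NODES if n[1] == "us-east-1"]
    print(cross_region_share("us-east-1", 1000, local))  # 0.0
```

This only models the first hop. Because the nodes stay connected to each other across regions, the node that receives a request can still forward shard-level work to shard copies in the other region.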
Regarding node access: the cluster is fully connected, so every node is connected to every other node all the time. There are many types of communication between them, for instance, pinging the master.
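A quick way to see what "fully connected" implies for a two-region cluster: with a nodes in one region and b in the other, a*b of the node pairs sit across the VPN no matter how the application routes its requests. The Python sketch below just counts node pairs for a hypothetical layout; it counts pairs, not TCP sockets (a real cluster may open several connections per pair).

```python
from itertools import combinations


def link_counts(nodes_by_region):
    """Count node pairs in a full mesh, split into (same-region, cross-region)."""
    nodes = [
        (name, region)
        for region, names in nodes_by_region.items()
        for name in names
    ]
    local = cross = 0
    for (_, r1), (_, r2) in combinations(nodes, 2):
        if r1 == r2:
            local += 1
        else:
            cross += 1
    return local, cross


if __name__ == "__main__":
    # Hypothetical layout: master + 2 data nodes in one region, 2 data nodes in the other.
    layout = {"us-east-1": ["master", "data-1", "data-2"], "us-west-2": ["data-3", "data-4"]}
    print(link_counts(layout))  # (4, 6)
```

In this example 6 of the 10 node pairs, including every pair involving the master, depend on the VPN tunnel.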
Your cluster is working for now, until it doesn't. I've seen it many times with our customers. I understand that cross-DC replication is not yet supported and that this is frustrating. Meanwhile, you can explore a few options: https://www.elastic.co/blog/clustering_across_multiple_data_centers