Internal load balancing and cross-region replication

Steve_Best · August 11, 2017, 4:04pm

Hi folks, we are looking to deploy our stack to 2 different AWS regions for high availability in case one region goes dark for any period of time. I have gotten a cluster up and replicating across the 2 regions over a site-to-site VPN tunnel, with a master in one region communicating with data nodes residing in each region.

I had initially thought that the stack in each region could simply point to the ES nodes in the same region. I am now discovering that ES does some internal load balancing, which means a stack in one region could end up communicating with ES nodes in the other region and pick up unwanted latency.

Is this a concern? I am not exactly sure how the internal load balancing works (latency based? least connections?). If it is a concern, what options do I have for replicating between 2 separate clusters? And yes, I have read that this is not advised. We are not really worried about replication lag, as updates to our indices will be minimal.

Thanks!

thiago · August 12, 2017, 1:21am

Currently Elasticsearch does not support cross-region/datacenter replication, because the replication that is done at shard level was not designed for high-latency networks. This means that a cluster spanning multiple AWS regions will suffer from availability issues (a cluster with a master node in one region and data nodes in another won't last very long).

The recommendation is to maintain a separate cluster per AWS region and replicate the data before indexing.
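
One way to read "replicate data before indexing" is that the application sends each write to both regional clusters itself, rather than relying on Elasticsearch to ship shards across the VPN. Below is a minimal Python sketch under that assumption. The region names and the `send` callables are hypothetical placeholders; in practice each one would wrap an index request to that region's cluster.

```python
from typing import Callable, Dict

# Hypothetical type: a function that indexes one document into one cluster.
Sender = Callable[[dict], None]


def index_everywhere(doc: dict, senders: Dict[str, Sender]) -> Dict[str, Exception]:
    """Send doc to every regional cluster; return failures keyed by region."""
    failures: Dict[str, Exception] = {}
    for region, send in senders.items():
        try:
            send(doc)
        except Exception as exc:
            # Keep going so that one dark region does not block the other.
            failures[region] = exc
    return failures


# Stand-ins for real clusters: plain lists that collect documents.
east, west = [], []
senders = {"region-a": east.append, "region-b": west.append}
failures = index_everywhere({"id": 1, "title": "hello"}, senders)
```

Any region that ends up in `failures` would need a retry or a queue behind it, otherwise the two clusters drift apart.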

Native cross datacenter replication is on the roadmap, though. See https://www.elastic.co/blog/elasticsearch-sequence-ids-6-0

Steve_Best · August 14, 2017, 1:52pm

Clustering across regions over a site-to-site VPN has been working quite well for us, actually. My question was regarding how the internal load balancing works, and whether or not traffic from the stack in one region will attempt to access the ES nodes across the VPN in the other region.

And if your recommendation is to maintain separate clusters and replicate data, then how would you suggest we accomplish this?

Thanks!

thiago · August 15, 2017, 11:41pm

Load balancing works in a round-robin way. Elasticsearch always assumes that the network and hosts are homogeneous, so it will always try to distribute the load evenly.
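
To make the consequence concrete, here is a small Python sketch of round-robin selection. It is an illustration only, not Elasticsearch's actual code, and the node and region names are made up. With copies spread evenly over two regions, a selector that ignores location sends half of the requests across the VPN.

```python
from itertools import cycle

# Hypothetical nodes holding a copy of the same data, two per region.
copies = [
    "region-a/node-1",
    "region-a/node-2",
    "region-b/node-3",
    "region-b/node-4",
]


def round_robin(targets):
    """Return a function that yields targets in turn, ignoring latency."""
    ring = cycle(targets)
    return lambda: next(ring)


pick = round_robin(copies)
picks = [pick() for _ in range(8)]

# Share of requests from a client in region-a that land in region-b.
remote_share = sum(p.startswith("region-b/") for p in picks) / len(picks)
```

Here `remote_share` comes out at 0.5: the selector treats every node as equally close, which is the homogeneity assumption described above.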

Regarding node access: the cluster is fully connected, so every node is connected to every other node all the time. There are many types of communication between them, for instance pinging the master.

Your cluster is working for now, until it isn't. I've seen it many times with our customers. I understand that cross-DC replication is not yet supported and that this is frustrating. Meanwhile, you can explore a few options: https://www.elastic.co/blog/clustering_across_multiple_data_centers

