I am trying to allocate shard replicas to nodes in a different region than the one where their primaries live. I'm looking for a way to specify, in the index template, that replicas must live in a different region. I'm able to specify that primaries live in the same region as they were produced (using Shard Allocation Awareness), but I haven't found a way to use Allocation Awareness to require that replicas live in the other region.
I have the following hypothetical cluster setup on ES 7.2:

- 10 data nodes
  - 5 data nodes on the West Coast (DC1), tagged `node.attr.region: DC1`
  - 5 data nodes on the East Coast (DC2), tagged `node.attr.region: DC2`
- 4 Logstash instances
  - 2 instances on the West Coast (DC1), 2 instances on the East Coast (DC2)
  - DC1 instances write data under the index pattern `data-DC1-%{YYYY-MM-DD}`
  - DC2 instances write data under the index pattern `data-DC2-%{YYYY-MM-DD}`
- 3 master nodes, with the cluster configured for awareness via `cluster.routing.allocation.awareness.attributes: region`
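In elasticsearch.yml terms, the relevant settings from the setup above look like this:

```
# On each DC1 data node (DC2 nodes use "DC2" instead)
node.attr.region: region_value_here  # i.e. DC1

# Allocation awareness keyed on the region attribute (set on the
# master-eligible nodes, or dynamically via the cluster settings API)
cluster.routing.allocation.awareness.attributes: region
```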
I then have an index template for each index pattern:
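A sketch of the DC1 template (legacy 7.x template syntax; the shard and replica counts are illustrative, and the DC2 template mirrors it with "DC2"). The `require` filter is what pins the shards to the producing region:

```
PUT _template/data-dc1
{
  "index_patterns": ["data-DC1-*"],
  "settings": {
    "index.number_of_shards": 5,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require.region": "DC1"
  }
}
```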
But how can I specify that the replicas be allocated to the opposite datacenter/region? This is necessary because, if one datacenter goes offline, I still need a full copy of the data.
I've attached a graphical representation of what I'm trying to achieve:
Primaries and replicas are pretty much interchangeable. A replica can become a primary at any point, so the allocator treats them the same. Shard allocation awareness will aim to spread the shards across the two regions so that there's one copy in each, but there is no guarantee that this will always be true.
Also, with only 2 regions you cannot expect to be resilient to either region going offline because for resilience you need 2 of the 3 master nodes to remain available. You'll need to put one of the master nodes in an independent third location (maybe as a voting-only node).
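Note that voting-only master-eligible nodes only arrived in 7.3, so this would need an upgrade from 7.2. On 7.3+ the tie-breaker's elasticsearch.yml would be something like:

```
# Tie-breaker in a third location: master-eligible and participates in
# elections, but can never be elected master itself and holds no data
node.master: true
node.voting_only: true
node.data: false
node.ingest: false
```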
However, cross-region deployments are very much not recommended. Elasticsearch expects all the node-to-node connections within a cluster to be equivalent and does not optimise for some of those links being higher latency or less reliable. You should use cross-cluster replication for your setup.
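For reference, CCR works by running two independent clusters and having indices in one cluster follow indices in the other. A minimal sketch, run on the DC2 cluster and pointing at the DC1 cluster (the cluster alias, seed address, and index names are illustrative; CCR requires a Platinum license):

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.remote.dc1.seeds": ["dc1-node-1:9300"]
  }
}

PUT /data-DC1-2019-08-01-copy/_ccr/follow
{
  "remote_cluster": "dc1",
  "leader_index": "data-DC1-2019-08-01"
}
```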
Thanks for your response. Unfortunately, a Platinum-level feature such as CCR is not an option at this time. Additionally, it's my understanding that with CCR I'd need to effectively double the number of data nodes to hold the same amount of data. I'm likely going to need to scale out past 10 data nodes in the near future, so doubling whatever capacity I find necessary is also not ideal.
With regards to resilience, my current plan for the three (or five) master nodes is to deploy either all of them, or more likely just the tie-breaker node, as cloud instances. I'll look into a voting-only node and will likely use that functionality to keep the tie-breaker lightweight.
Can you please elaborate on the downsides of cross-region deployment? I'd like to explore mitigations for those downsides as much as possible.
A bit more about my infrastructure:

- Redundant 100 Gb/s fiber circuits between DCs, with 99.999% uptime
- 12 TB of SSD, 256 GB of RAM, and 32 CPU cores per node
If it's determined that the best strategy for my use case is the one outlined prior, it appears I could write a script that uses the Cluster Reroute API for my replica shards. I imagine the data flow would go something like:

1. Check whether the replica is being created in the region opposing the primary (5 of the 9 remaining nodes are in the other region, so there's a ~56% chance it is)
2. If the replica is not being created in the opposing region, cancel the allocation
3. Force the replica to be created on a specific node in the opposing region
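A sketch of those two reroute commands (the index, shard, and node names are hypothetical):

```
POST /_cluster/reroute
{
  "commands": [
    {
      "cancel": {
        "index": "data-DC1-2019-08-01",
        "shard": 0,
        "node": "dc1-data-3",
        "allow_primary": false
      }
    },
    {
      "allocate_replica": {
        "index": "data-DC1-2019-08-01",
        "shard": 0,
        "node": "dc2-data-1"
      }
    }
  ]
}
```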
I ended up not being able to allocate replicas to the opposing region, as you'd hinted. I went with Shard Allocation Awareness instead - I'll update this thread if that doesn't work for my needs, as you indicated it may not. Thanks for your recommendation.