We have 2 separate clusters in 2 different data centers, and they are currently active-active.
With Elasticsearch 6.0, synchronization will be in place.
So if data ingestion is enabled for only one Elasticsearch cluster/data center, with syncing to the other data center enabled (from A to B), can I run from one data center and stop maintaining two distinct parallel clusters? At the time of a switchover, the sync process would be enabled from B to A (after first making sure the A to B sync has completed).
Is this the right approach? Has anyone tried this, and what were the caveats?
They are only 2 different buildings in the same city (on-premise data centers of an enterprise, with a good enough link/bandwidth between the two). I was thinking of using ES features to keep the 2 clusters in sync.
OK, so you are going to set up a single cluster that spans the 2 data centres and use shard allocation filtering to ensure shards are distributed correctly?
In this scenario all indexing requests will go to both data centres, as both should hold a copy of every shard, so it may have an impact on performance and lead to increased traffic between the data centres. Queries will also be executed across both data centres. Depending on your data and query volumes, I would recommend benchmarking this to make sure the connection between the data centres is fast enough and has sufficient bandwidth.
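To make the "copy of every shard in each data centre" idea concrete, here is a minimal sketch of the cluster settings body one might use. It relies on shard allocation awareness (a feature closely related to the allocation filtering mentioned above, swapped in here because it is the one that spreads shard copies across locations). The attribute name "datacenter" and the values "dc_a" and "dc_b" are hypothetical; each node would need a matching node.attr.datacenter entry in its elasticsearch.yml. The script only builds and prints the JSON body for PUT _cluster/settings; it does not contact a cluster.

```python
import json


def awareness_settings(attribute, values):
    """Build a cluster-settings body enabling forced allocation awareness.

    `attribute` is a custom node attribute (hypothetical name chosen by you),
    `values` are the expected values of that attribute, one per data centre.
    """
    return {
        "persistent": {
            # Spread copies of each shard across distinct attribute values.
            "cluster.routing.allocation.awareness.attributes": attribute,
            # Forced awareness: never place all copies in a single data centre,
            # even if the other one is temporarily unavailable.
            f"cluster.routing.allocation.awareness.force.{attribute}.values":
                ",".join(values),
        }
    }


# Hypothetical attribute name and data centre labels.
body = awareness_settings("datacenter", ["dc_a", "dc_b"])
print(json.dumps(body, indent=2))
```

With at least one replica per index, settings like these would keep one copy of each shard in each building, which is exactly why every indexing request ends up crossing the link between them.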
Given that you have 2 data centres, it is also very difficult, if not impossible, to make this highly available with respect to a full data centre failing. The reason for this is that it is impossible to evenly distribute an odd number of master-eligible nodes across the 2 data centres, so one data centre always holds the majority, and losing it leaves the cluster unable to elect a master.
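The master-eligible node argument can be checked with a little arithmetic. The sketch below assumes a simple majority quorum (the value discovery.zen.minimum_master_nodes is normally set to in 6.x, i.e. N/2 + 1) and tries every way of splitting the master-eligible nodes between two data centres. The function names are made up for illustration.

```python
def survives_loss(masters_a, masters_b):
    """Report whether a master can still be elected after losing each site.

    Assumes a majority quorum over all master-eligible nodes.
    """
    total = masters_a + masters_b
    quorum = total // 2 + 1
    return {
        "lose_a": masters_b >= quorum,  # only B's nodes remain
        "lose_b": masters_a >= quorum,  # only A's nodes remain
    }


def any_split_survives_either_loss(total):
    """True if some split of `total` nodes over 2 sites tolerates losing either."""
    for masters_a in range(total + 1):
        result = survives_loss(masters_a, total - masters_a)
        if result["lose_a"] and result["lose_b"]:
            return True
    return False


for total in (3, 4, 5):
    print(total, any_split_survives_either_loss(total))
```

No split works for any node count: both sides would each need more than half of the nodes, which is impossible. This is why a third location, even a small one holding only a tie-breaker master-eligible node, is usually needed for this kind of setup.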
This is therefore something we generally do not recommend nor support. If you are looking for a DR setup, the architecture you currently have in place is in my opinion better.
I am not sure I understand what you mean by switchover. Could you please explain?
Thanks Christian. My understanding is that even if ES 6.0 has a sync feature, it should not be used unless the other data center is for DR. So we will need to continue maintaining 2 separate ES clusters ingesting the same data separately. Thanks.
I am not sure what sync feature you are referring to. Sequence IDs were introduced in Elasticsearch 6.0, and they are the foundation for building cross-datacenter replication, but such a feature is not yet available.