Hello,
How do we set up a highly available (HA) Elasticsearch cluster in a geo-cluster environment? We have two datacenter locations, and in each datacenter we have one hardware virtualization server.
I know that the best setup uses an odd number of Elasticsearch nodes, so should we run two ES nodes in DC 1 and one in DC 2, or is there a better approach?
Not really, at least not within a single cluster. Each cluster relies on the node-to-node network connections being (a) reliable, (b) low-latency and (c) high-bandwidth, and none of these are really true of connections between different regions. Multi-region deployments are supported out-of-the-box using federation: cross-cluster search and/or cross-cluster replication.
I mean, technically you can split a cluster across geographically separated regions, and it'll do the right thing even if the connection is unreliable and/or slow, but it won't necessarily perform very well.
If you only have two failure domains (e.g. regions) then it is not possible to achieve high availability. At least, you cannot build a system that can tolerate the loss of either domain. This isn't a limitation of Elasticsearch, it's a fundamental property of distributed systems: with two domains one or other of them will always be critical to the health of your cluster. The recommended approach is to have at least three failure domains. Fortunately with Elasticsearch you only need a single, small, dedicated master node in the third domain and you can leave all the heavy machinery in the other two.
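To make the arithmetic concrete, here is a rough sketch of the majority rule behind that statement. It is a simplified model, not Elasticsearch's actual election code: it only assumes that a master can be elected while a strict majority of the master-eligible (voting) nodes is reachable. The layout dictionaries and function names are made up for illustration.

```python
def has_quorum(layout, failed_domains):
    """Return True if the surviving voting nodes form a strict majority.

    layout maps a failure-domain name to its number of master-eligible nodes
    (hypothetical names, simplified model of majority-based election).
    """
    total = sum(layout.values())
    surviving = sum(n for domain, n in layout.items() if domain not in failed_domains)
    return 2 * surviving > total


def tolerates_any_single_domain_loss(layout):
    """Return True if the cluster keeps a majority whichever one domain fails."""
    return all(has_quorum(layout, {domain}) for domain in layout)


# The layout from the question: two nodes in DC 1, one in DC 2.
two_dc = {"dc1": 2, "dc2": 1}

# Two datacenters plus one small dedicated master node in a third domain.
three_domains = {"dc1": 1, "dc2": 1, "tiebreaker": 1}

for name, layout in [("two_dc", two_dc), ("three_domains", three_domains)]:
    print(name, tolerates_any_single_domain_loss(layout))
```

In this model the 2+1 layout survives losing DC 2 but not DC 1, so DC 1 is the critical domain. With a third domain holding a single voting node, any one domain can fail and the remaining two nodes still form a majority.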