Basic HA/Failover Setup

Hi All, I am looking for a basic configuration for HA/Failover. I have 2 clusters.
The first cluster consists of separate coordinating and master/data nodes running as 2 processes on one machine, while in the 2nd cluster the master/data roles run as a single process. The configurations are below:

Coordinating Node on Cluster 1 on Machine 1

cluster.name: remote
node.name: Remote-Master
node.master: false     # all three roles disabled, so this is a coordinating-only node
node.data: false
node.ingest: false
path.data: D:\ELK-Runnable\Elasticsearch\data
path.logs: D:\ELK-Runnable\Elasticsearch\logs
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.zen.minimum_master_nodes: 1     # quorum of 1; this cluster has only one master-eligible node
discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301", "192.168.116.138:9300"]

Data Node on Cluster 1 on Machine 1

cluster.name: remote
node.name: Remote-Data
node.master: true     # the only master-eligible node in this cluster
node.data: true
path.data: D:\ELK-Runnable\Elasticsearch_Data\data
path.logs: D:\ELK-Runnable\Elasticsearch_Data\logs
network.host: 0.0.0.0
http.port: 9201
transport.port: 9301
discovery.zen.ping.unicast.hosts: ["0.0.0.0:9300", "0.0.0.0:9301","192.168.116.138:9300"]

Master and Data on Cluster 2 on Machine 2

cluster.name: backup
path.data: C:\Program Files\Elasticsearch\data
path.logs: C:\Program Files\Elasticsearch\logs
network.host: 0.0.0.0
http.port: 9200
transport.port: 9300
discovery.zen.ping.unicast.hosts: ["192.168.119.1:9300", "192.168.119.1:9301", "0.0.0.0:9300"]

I uploaded 1 index and it was done successfully. I have enabled data replication and the data is successfully copied to the 2nd cluster. Now, I shut down the master/data node in cluster 1, hoping that the 2nd cluster would take over the failover. I have added the 2nd cluster's node to the zen discovery section of the .yml file. However, it does not work. Any suggestions would be helpful. Thanks
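
For context, the data replication mentioned above is cross-cluster replication (CCR), as confirmed further down in the thread. A minimal sketch of how the follower side could be set up on cluster 2, assuming a CCR-capable version (6.5+) with a suitable license, a leader index named test-index, and a remote-cluster alias leader (both placeholder names):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.remote.leader.seeds": ["192.168.119.1:9300", "192.168.119.1:9301"]
  }
}

PUT /test-index/_ccr/follow
{
  "remote_cluster": "leader",
  "leader_index": "test-index"
}

The first call registers cluster 1 as a named remote on cluster 2; the second creates a follower index on cluster 2 that replicates the leader index from cluster 1.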

A cluster cannot tolerate the loss of half or more of its master-eligible nodes. Therefore you need at least three master-eligible nodes in order to have a fault-tolerant cluster.
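
For the zen-discovery versions shown in the configs above, the quorum also has to be set explicitly. A minimal sketch, assuming three master-eligible nodes:

discovery.zen.minimum_master_nodes: 2   # majority of 3 master-eligible nodes: (3 / 2) + 1 = 2

Note that the cluster 1 config above sets discovery.zen.minimum_master_nodes: 1 and has only one master-eligible node (Remote-Data), so stopping that node leaves the cluster with no master at all.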

Thanks David. I am still a newbie at this, so please correct my understanding here.
I have the following:

• A1: coordinating & master roles in 1 process on Machine A in Cluster 1
• A2: data & master roles in 1 process on Machine A in Cluster 1

• B1: master & data roles in 1 process on Machine B in Cluster 2. This one has been configured with CCR.

Now, I stop A2, hence A1 should link to B1. However, this does not happen.
So if we need to have 3 master-eligible nodes, this would mean the following:

  • Potentially having 2 master nodes and a data node in Cluster 1, for a total of 3 nodes
  • Next, stopping the data node to mimic a failover scenario
  • Cluster 1 automatically falling back on the node in Cluster 2

Is this assertion correct?

No, that's not what should happen. There's no safe way to move a node to a different cluster without risking data loss, so Elasticsearch won't do that. If you stop A2 then you have stopped half of the master-eligible nodes in cluster A, and the only safe way to proceed is to start A2 again.

Cool. Thanks David. I will figure something out...

Why not just set up a single cluster with 3 nodes that all hold data and are master-eligible? Such a cluster can handle one node failing and is highly available.
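
A minimal sketch of what that could look like, with essentially the same elasticsearch.yml on all three nodes (host names and the cluster name are placeholders):

cluster.name: ha-cluster
node.name: node-1                                # node-2 / node-3 on the other machines
node.master: true
node.data: true
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["host-1:9300", "host-2:9300", "host-3:9300"]
discovery.zen.minimum_master_nodes: 2            # majority of 3 master-eligible nodes

With this layout any single node can fail and the remaining two still form a majority, so the cluster keeps a master and stays available.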
