How does the master election process take place when multiple master-eligible nodes are present?


(Nikhil Kapoor ) #1

Hi Everyone,

Elasticsearch version: 5.4.3

I want to know how the master election process takes place when multiple nodes are present.

Suppose there are 5 Elasticsearch nodes (all master-eligible) in a cluster:

  1. What algorithm decides which node is elected as the active master among all the nodes?
  2. If the active master node goes down, which node will be elected as master among the remaining four nodes?

Is the master election done randomly, meaning that any of the remaining 4 nodes can become master when the active master node goes down?

  1. If master election is not random, then what is the algorithm for master election?

  2. Also, in a cluster of 5 nodes, is there any setting that can prioritize a node (e.g. node2) to be elected as the master node among the remaining nodes if the active master node goes down?

I am following the link below:
[https://www.elastic.co/guide/en/elasticsearch/reference/5.4/modules-discovery-zen.html]


(David Turner) #2

For all practical purposes the way the winning master node is chosen in an election is not defined, and any node can win an election.

No. Can you explain why you would like to do this?


(Nikhil Kapoor ) #3

Thanks @DavidTurner for the response.

No. Can you explain why you would like to do this?

I thought it would avoid having to search for a new master in a cluster of multiple nodes if the active master node goes down. The cluster would already know that, if the active master node goes down, a specific node (e.g. node2) would be elected.

I also wanted to know a few more things.
In a cluster of two nodes:
Node1 Roles: Master & Data (Active master node)
Node2 Roles: Master & Data

When node1 (the active master node) is taken down, node2 is elected as the master node. I have a few questions about the logs:

Below are the logs of node2:
[DEBUG][o.e.c.s.ClusterService ] [node2] processing [zen-disco-node-failed({node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300}), reason(transport disconnected)[{node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300} transport disconnected]]: execute

[DEBUG][o.e.c.s.ClusterService ] [node2] cluster state updated, version [45], source [zen-disco-node-failed({node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300}), reason(transport disconnected)[{node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300} transport disconnected]]

[INFO ][o.e.c.s.ClusterService ] [node2] removed {{node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300},}, reason: zen-disco-node-failed({node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300}), reason(transport disconnected)[{node1}{TRq-P0CKS5K0oZWFLIaBIA}{58Q6yXhDTa-Np5uiUa7t8A}{10.0.4.67}{10.0.4.67:9300} transport disconnected]

[DEBUG][o.e.c.s.ClusterService ] [node2] publishing cluster state version [45]

[DEBUG][o.e.c.s.ClusterService ] [node2] applying cluster state version 45

[DEBUG][o.e.c.s.ClusterService ] [node2] set local cluster state to version 45

[DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node2] failed to execute on node [TRq-P0CKS5K0oZWFLIaBIA]
org.elasticsearch.transport.NodeNotConnectedException: [node1][10.0.4.67:9300] Node not connected
at org.elasticsearch.transport.TcpTransport.getConnection(TcpTransport.java:655) ~[elasticsearch-5.4.3.jar:5.4.3]

[INFO ][o.e.c.s.ClusterService ] [node2] removed {{node1}

From where does node2 remove node1? Is it removing node1 from the zen discovery module of node2?

[DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node2] failed to execute on node

Does the above log mean that node2 fails to update the cluster state on node1?


(Christian Dahlqvist) #4

If Node2 is successfully elected master when Node1 fails, you have unfortunately incorrectly configured your cluster. The minimum_master_nodes setting should be set according to these guidelines, which means both nodes are required to be present in order to elect a master if you only have two master-eligible nodes in your cluster. If you want to be able to handle one node going down without affecting the ability to perform writes, you need to have a minimum of 3 master-eligible nodes.
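To make the guideline concrete: the rule documented for this setting in 5.x is a quorum of the master-eligible nodes, (master_eligible_nodes / 2) + 1 with integer division. Below is a minimal Python sketch of that arithmetic. The function names are hypothetical and only illustrate the rule; they are not part of Elasticsearch.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a quorum is still reachable."""
    return master_eligible - minimum_master_nodes(master_eligible)


# Columns: master-eligible nodes, quorum, failures tolerated
for n in (2, 3, 5):
    print(n, minimum_master_nodes(n), tolerated_failures(n))
```

With two master-eligible nodes the quorum is 2, so no failure can be tolerated. With three the quorum is still 2, which leaves room for one failure, and with five the quorum is 3, which leaves room for two.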


(David Turner) #5

I'm not sure I understand this, but I think your concern is that master-election is simpler if all the nodes know which node they should elect if the current master fails? Unfortunately this doesn't really help because the node they all want to elect might also have failed.

As @Christian_Dahlqvist says, this doesn't happen in a correctly-configured two-node cluster. You need at least three master-eligible nodes, with minimum_master_nodes set correctly, to support this kind of failover.

The "cluster state" is the name of the data structure that tracks things like the membership of the cluster, and is managed by the elected master: the elected master calculates updates and broadcasts them to all the other nodes. So this log line means that node2, as master, is removing node1 from the cluster state and will then go on to broadcast this update to all the other nodes in the cluster.

No, the reporting component for that log line is o.e.a.a.c.n.s.TransportNodesStatsAction so this means that node2 asked node1 for its node statistics and this request fails because node1 is disconnected.
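If it helps when reading these logs, here is a small Python sketch that splits a line into level, reporting component, node name and message. It assumes lines shaped exactly like the ones pasted in this thread (no timestamp prefix); real log files usually begin with a timestamp, so the pattern would need adjusting. The names `LOG_LINE` and `parse_log_line` are made up for this example.

```python
import re

# Matches lines like: [LEVEL][component ] [node] message
LOG_LINE = re.compile(
    r"^\[(?P<level>[A-Z]+)\s*\]"        # e.g. [DEBUG] or [INFO ]
    r"\[(?P<component>[^\]]+?)\s*\]"    # abbreviated logger name
    r"\s*\[(?P<node>[^\]]+)\]"          # name of the node that wrote the line
    r"\s*(?P<message>.*)$"              # rest of the line
)


def parse_log_line(line: str) -> dict:
    """Split one log line into its fields, or raise ValueError if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised log line: " + line)
    return match.groupdict()


line = ("[DEBUG][o.e.a.a.c.n.s.TransportNodesStatsAction] [node2] "
        "failed to execute on node [TRq-P0CKS5K0oZWFLIaBIA]")
fields = parse_log_line(line)
print(fields["component"])  # the component that reported the failure
print(fields["node"])       # the node that logged it
```

Grouping lines by the component field makes it easier to tell cluster state updates (ClusterService) apart from unrelated requests such as node statistics.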


(Nikhil Kapoor ) #6

Hi @DavidTurner @Christian_Dahlqvist,

I really appreciate the help from both of you. Please provide me some more clarification on this topic.

I understood your point on split-brain.

I tried to reproduce the scenario:
Node1 Roles: Master & Data (Active master node)
Node2 Roles: Master & Data

I disconnected node1 from the network and observed that, with this network failure, node2 elects itself as the new master node, which results in two clusters (split-brain).

However, when node1 is reconnected to the network, node2 rejoins the cluster. I executed the command below to check the current master node:
[root@node1 ~]# curl -X GET "node1:9200/_cat/master?v"
id host ip node
TRq-P0CKS5K0oZWFLIaBIA 10.0.4.67 10.0.4.67 node1

I observed that node1 is now the active master node.

What does this mean now?

Also, @Christian_Dahlqvist

If you want to be able to handle one node going down without affecting the ability to perform writes, you need to have a minimum of 3 master-eligible nodes.

What type of write operations are you talking about in the case of two master-eligible nodes? Can you give an example of this?

Regards
Nikhil Kapoor


(Christian Dahlqvist) #7

If you end up with a split-brain scenario and the two nodes cannot communicate, replica shards on both sides will be promoted to primary. Any inserts or updates you perform while the cluster is partitioned may be lost when it comes back together, as one of the primary shards will replace the other one. You can test this by creating a split-brain scenario: disconnect the network, index a number of documents into one of the indices on each side of the partition, and then see how many you still have once the partition has been healed.
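A rough way to tally the result of such a test is sketched below in Python. It assumes you recorded the document IDs you indexed on each side and then collected the IDs still present after the partition healed; how you gather those IDs is up to you. The function name and the sample IDs are invented for illustration and are not measured results.

```python
def lost_after_heal(indexed_side_a, indexed_side_b, surviving):
    """Return the IDs written during the partition that are missing once it heals."""
    written = set(indexed_side_a) | set(indexed_side_b)
    return written - set(surviving)


# Hypothetical IDs, standing in for what you indexed on each side of the partition
side_a = {"a1", "a2", "a3"}
side_b = {"b1", "b2"}

# Hypothetical outcome in which side A's primary shard replaced side B's
after_heal = {"a1", "a2", "a3"}

lost = lost_after_heal(side_a, side_b, after_heal)
print(sorted(lost))
```

In this made-up outcome every document written on side B is reported as lost, which is the kind of write loss described above.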

