Why elect a leader?

(Oli Lalonde) #1

Hi everyone,

I've been reading ES's code and was wondering about an architectural decision. If I understand correctly, an ES cluster first elects a leader which is then in charge of managing the cluster state. I am guessing new nodes notify the master when they join the cluster and the master checks on their health/state periodically. Nodes probably update their own internal cluster state by querying the master.

Anyways, my main questions is: why elect a leader at all? Why not go with a symmetric design where each node is responsible for keeping track of the cluster state?


(Mark Harwood) #2

Thought this was a despairing political statement at first :slight_smile:

When you create an index, delete an index, change the fields defined on an index, blacklist an unresponsive node etc these are all examples of decisions that have to be universally accepted in the cluster and having autonomous decision making over concerns like this can lead to "split brain" scenarios where one half of the cluster believes one thing and another determines something different. A single nominated master is required to dictate the status of the cluster (but with provisions for their replacement to avoid SPOF).

(Jörg Prante) #3

Maybe you have found this blog post?

Note, as a special design in Elasticsearch, nodes that are configured as leader-eligible carry an in-memory copy of the cluster state. So, the nodes do not query the leader all the time, but the leader broadcasts deltas when cluster state changes. Each leader-eligible node can be seen as an independent node, independently from other nodes regarding cluster state knowledge, and can quickly take the leader role in case of failure.

In the beginning, Elasticsearch was not designed for partial leader responsibility (e.g. a group of leader nodes that can subdivide indices management) but this limitation seems to be subject to change, motivated by cross data center usage scenarios. Regarding search only, the challenges are a lot easier. There was once a "tribe node" approach to be able to use more than one cluster by a special client with limited API methods, which is now deprecated, and being replaced by cross-cluster search, and probably cross-cluster indexing in the future.

(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.