How to determine number of master nodes

Is there a way to find out how many master nodes I need for a given number of data nodes? For example, if I have 100 data nodes, how many master nodes do I need? What about 12 data nodes, or 18?

The answer is usually 3, and it does not scale with the data node count. Only one of the master nodes is active at any point; the others are there for resilience. The only reason to go above 3 is if you need to withstand more than one master node failing at a time. If you e.g. deploy a cluster across 5 distinct availability zones, you may want 5 dedicated master nodes so you can lose 2 of them without affecting the cluster (a strict majority of master-eligible nodes always needs to be available for the cluster to function properly).
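As a rough illustration (assuming a cluster reachable at localhost:9200), you can see that only one master is elected at a time, and which nodes are master-eligible, with the _cat APIs:

```sh
# Show which node is currently the elected master (only one is active at a time)
curl -s 'localhost:9200/_cat/master?v'

# List all nodes with their roles; master-eligible nodes show "m" in node.role,
# and the elected master is marked with "*" in the master column
curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,master'
```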


Thanks.
I have a simple question.
We know that one of the master node's functions is to create and delete indices.
Suppose I have 160 data nodes.
Will 3 master nodes be enough for 160 data nodes?
Thank you for your help.

I mean, is there a relationship between the number of data nodes and the number of master nodes?

No, there is not.

Thank you very much

I have one last question.
How many ingest nodes and client nodes do I need for n data nodes?
Does the number of data nodes relate to the number of client nodes and ingest nodes?

That depends on the data and workload. It is something you will need to test and find out.


Are there reference sources for that?

Have a look at this blog post. The number of data nodes depends on the daily ingest volume as well as the retention period. I would expect the number of ingest nodes required to be proportional to the ingested volume per day and also to depend on the complexity of your pipeline(s). This can vary a lot, so you need to test to see how it applies to your use case.
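As a starting point for that testing, a minimal sketch (the pipeline name my-pipeline and the sample document are placeholders, and localhost:9200 is assumed) of exercising a pipeline with the ingest simulate API before running a full load test:

```sh
# Run a sample document through a hypothetical pipeline "my-pipeline";
# verbose=true shows the result of each processor, which helps gauge
# how heavy the pipeline is before sizing ingest nodes for real traffic
curl -s -X POST 'localhost:9200/_ingest/pipeline/my-pipeline/_simulate?verbose=true' \
  -H 'Content-Type: application/json' \
  -d '{"docs": [{"_source": {"message": "sample log line"}}]}'
```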


For future reference, this is covered here in the reference manual:

However, it is good practice to limit the number of master-eligible nodes in the cluster to three. Master nodes do not scale like other node types since the cluster always elects just one of them as the master of the cluster. If there are too many master-eligible nodes then master elections may take a longer time to complete.

Note also this sizing guidance:

Aim for 3000 indices or fewer per GB of heap memory on each master node
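As a rough way to check the cluster against that guidance (assuming it is reachable at localhost:9200), something like:

```sh
# Count the indices in the cluster
curl -s 'localhost:9200/_cat/indices?h=index' | wc -l

# Show the configured heap size per node; master-eligible nodes show "m" in node.role
curl -s 'localhost:9200/_cat/nodes?v&h=name,node.role,heap.max'
```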


This is not quite true, at least not without significant extra complexity in the orchestration system that surrounds Elasticsearch. By default, with 5 masters there are some sequences of events that require 4 of those masters to be active. See these docs:

NOTE: If cluster.auto_shrink_voting_configuration is set to true (which is the default and recommended value) and there are at least three master-eligible nodes in the cluster, Elasticsearch remains capable of processing cluster state updates as long as all but one of its master-eligible nodes are healthy.

There are situations in which Elasticsearch might tolerate the loss of multiple nodes, but this is not guaranteed under all sequences of failures. If the cluster.auto_shrink_voting_configuration setting is false, you must remove departed nodes from the voting configuration manually. Use the voting exclusions API to achieve the desired level of resilience.

TLDR we recommend three master-eligible nodes in all circumstances.
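For reference, a hedged sketch (assuming a recent 7.x/8.x cluster at localhost:9200 and a placeholder node name node-5) of inspecting the voting configuration and using the voting exclusions API mentioned above:

```sh
# Inspect the current voting configuration, i.e. which master-eligible nodes
# count towards the quorum
curl -s 'localhost:9200/_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config'

# Manually exclude a departed master-eligible node ("node-5" is a placeholder)
# from the voting configuration
curl -s -X POST 'localhost:9200/_cluster/voting_config_exclusions?node_names=node-5'
```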


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.