We are trying to host an Elasticsearch cluster with 3 master nodes and 6 data nodes. To form the cluster, we are setting discovery.seed_hosts and cluster.initial_master_nodes in the elasticsearch.yml file.
In discovery.seed_hosts, we currently list the hostnames of all the nodes (master as well as data).
In cluster.initial_master_nodes, we currently list the node.name values of all the master nodes.
The question is: do we really need to include the data nodes in discovery.seed_hosts, or is it okay to specify only the master nodes?
You should only refer to the master nodes in discovery.seed_hosts. From the docs:
... you must use the discovery.seed_hosts setting to provide a list of other nodes in the cluster that are master-eligible [...] This setting should normally contain the addresses of all the master-eligible nodes in the cluster.
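As an illustration, here is a minimal elasticsearch.yml fragment along those lines. It is only a sketch: the three hostnames are hypothetical placeholders for the master-eligible nodes, and the same list would go on every node, data nodes included.

```yaml
# elasticsearch.yml (any node, master-eligible or data)
# Hostnames below are placeholders, not real addresses.
# Only the master-eligible nodes are listed; data nodes are left out.
discovery.seed_hosts:
  - master-1.example.com
  - master-2.example.com
  - master-3.example.com
```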
@DavidTurner, that is a bit confusing then. There seems to be very little difference between discovery.seed_hosts and cluster.initial_master_nodes.
For production mode:
seed_hosts should contain the hostnames or IPs of all master-eligible nodes, and
initial_master_nodes should contain the node.name values of all master-eligible nodes.
Is that a correct understanding?
And if yes, then why can't one be derived from the other automatically by ES since the mapping is already known to ES?
Thinking aloud, property names like cluster.master_hostnames and cluster.master_names might be clearer.
There's a superficial similarity but they're really very different settings.
discovery.seed_hosts is about discovery, i.e. finding the master nodes, so belongs in the discovery.* settings namespace. It must be set on every node whether master-eligible or not because every node must perform discovery. It should be kept up to date as the cluster evolves, but it tolerates mistakes (particularly extra nodes) and need not be precisely synchronised across all nodes. It can involve an external service (e.g. DNS) which may not give wholly consistent answers, and is just one of a number of pluggable mechanisms for discovering the master nodes in a cluster.
cluster.initial_master_nodes is about cluster bootstrapping, i.e. the first election in the cluster, so belongs in the cluster.* settings namespace. It need not be set on master-ineligible nodes because these nodes do not take part in the first election. It absolutely must not be adjusted as the cluster evolves and can be removed once the first election has taken place. It does not tolerate mistakes and must be precisely synchronised across all nodes on which it is set. It must not involve external services like DNS for consistency reasons, and it cannot be supplied by a plugin.
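To make the contrast concrete, here is a rough sketch of how the two settings might be laid out per role before the cluster has formed for the first time. The node names and hostnames are hypothetical, and node.roles is an assumption about the version in use: it applies to 7.9 and later, while earlier 7.x releases use node.master and node.data instead. The two YAML documents below stand for two separate elasticsearch.yml files.

```yaml
# elasticsearch.yml on a master-eligible node (hypothetical name: master-1)
node.name: master-1
node.roles: [ master ]            # assumes 7.9+; older 7.x uses node.master / node.data
# Discovery: addresses of the master-eligible nodes, set on every node.
discovery.seed_hosts: [ "master-1.example.com", "master-2.example.com", "master-3.example.com" ]
# Bootstrapping: node.name values, identical on every master-eligible node.
cluster.initial_master_nodes: [ "master-1", "master-2", "master-3" ]
---
# elasticsearch.yml on a data node (hypothetical name: data-1)
node.name: data-1
node.roles: [ data ]
# Same discovery list as above; this node still has to find the masters.
discovery.seed_hosts: [ "master-1.example.com", "master-2.example.com", "master-3.example.com" ]
# cluster.initial_master_nodes is omitted: this node takes no part in the first election.
```

Once the first election has taken place, the cluster.initial_master_nodes line can be removed from the master-eligible nodes, while discovery.seed_hosts stays and is kept up to date.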
The mapping between master names and addresses is not "already known" to ES and cannot be discovered automatically. A node can ask another node for its name only once it knows that node's address, and addresses do not uniquely and consistently identify nodes.