Node management and values

I'm new to Elasticsearch, and I've tried to look up FAQs and such with no luck so far, so.... here goes.

I have inherited administration of a 3+10 cluster: 3 master nodes and 10 data nodes.
I need to add some new, higher-disk-capacity nodes to the cluster, and then eventually cycle through all the old nodes, upgrading the disks on each of them.
It's a production cluster, so.... yeah :-/

I notice that every single node currently has an elasticsearch.yml file. They are practically identical across all nodes. Each defines

  expected_nodes: 10

So.. when I want to add nodes, do I need to go change the file on every single node, and make sure it has the correct value?

Secondly, every single yml config file has

    minimum_master_nodes: 2
             - host01
             - host02

Since this is unicast, not multicast.. does this mean that, again, I have to go to every single host and change the file to identify new hosts that I want to add to the cluster?

I'm confused, because all the web searches I do about "how to add a node to elasticsearch cluster" just say to create the config on the new node, start the new node, and it just AutoMagically happens. There's no mention of having to go edit files on any of the existing nodes.

But if so, what are the values in the elasticsearch.yml file for???

You don't have to, no. The gateway.expected_nodes setting is described here and can safely be set to a number lower than the number of nodes in the cluster. It is optional, only has an effect on the master-eligible nodes, and only when completely restarting the cluster. I think I would leave it at 10 for now and then adjust it later if needed.

Really you should only be listing the master-eligible nodes in that hosts list. That way you don't need to adjust it when you add a master-ineligible node like a data node.

So.. in theory, I should be able to add the new node, listing just the masters.
But .. it isn't working.
And it's giving me a bogus "no route to host" error, when tcpdump and telnet both say that traffic IS actually flowing from the new node to the current master node.

Log file entries:

NFO ][node ] [oc2esdata11-oc2] starting ...
[2019-09-09 16:11:38,835][INFO ][transport ] [oc2esdata11-oc2] publish_address {}, bound_addresses {}
[2019-09-09 16:11:38,843][INFO ][discovery ] [oc2esdata11-oc2] elasticsearch_logstash/eKYDs3CIRQOaniiJyLVGUg
[2019-09-09 16:11:41,901][INFO ][discovery.zen ] [oc2esdata11-oc2] failed to send join request to master [{oc2esnode02-oc2}{SLDhR7foQwuCpwp5NsecUg}{}{}{rack=vm, data=false, master=true}], reason [RemoteTransportException[[oc2esnode02-oc2][][internal:discovery/zen/join]]; nested: ConnectTransportException[[oc2esdata11-oc2][] connect_timeout[30s]]; nested: NotSerializableExceptionWrapper[no_route_to_host_exception: No route to host]; ]

telnet test:

[root@oc2esdata11 oc2]# telnet 9300
Connected to
Escape character is '^]'.

PS: I googled a "fix" for this, but the new node does have this defined already:
host: enp4s0f0:ipv4

and that IS the actual active network interface name.
enp4s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

Oh. I thought you only needed port 9300 open from the new node to the master, but apparently the master also needs a connection back to the new node on 9300, even if the new node is NOT a master.
So I guess I'll have to get the firewalling tweaked for that!

Yes that's right, this exception is the master trying (and failing) to connect back to the new node, which means that the new node got through to the master ok.
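
Since the telnet test above only proves one direction, it may help to run the same check from both sides. Here is a minimal sketch in Python, assuming the transport port is 9300 as in this thread; the function name can_connect is made up for illustration and is not part of any Elasticsearch tooling. It only tests whether a plain TCP connection can be opened, nothing more.

```python
import socket


def can_connect(host: str, port: int = 9300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper for illustration only. It checks reachability of the
    port, not whether an Elasticsearch node is actually answering on it.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures and
        # "No route to host" (all are subclasses of OSError).
        return False
```

Run it on the new data node against each master, and then on each master against the new data node. If the first direction returns True and the second returns False, that would match the situation in the log above, where the join request arrives but the master cannot connect back.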

clearer error messages would have been nice :-/