We upgraded our Elasticsearch cluster from version 7.13.3 to version 7.16.1 using a rolling upgrade. We upgraded the nodes one at a time, non-master nodes first and master nodes last.
The cluster was upgraded, but we faced one issue during the upgrade. Each node that was upgraded to the newer version started, but it always showed the following warning in its logs:
{"type": "server", "timestamp": "2021-12-21T08:08:30,171Z", "level": "INFO", "component": "o.e.x.m.e.l.LocalExporter", "cluster.name": "testing_cluster", "node.name": "node-name-1", "message": "waiting for elected master node [{node-master-1}{sfS9I5_fS-e_x32U-rUoVQ}{aYSb0200TRakieoww-UzQ}{node-master-1.domain.com}{10.66.12.211:9300}{lmr}{ml.machine_memory=8201854976, rack=r1, ml.max_open_jobs=512, xpack.installed=true, ml.max_jvm_size=6442450944, transform.node=false}] to setup local exporter [default_local] (does it have x-pack installed?)", "cluster.uuid": "aksjkj83943ldkdsdssdj3322221", "node.id": "ASDQWE11233ZXC1233222" }
Because of this, none of the upgraded nodes were able to join the cluster while the master nodes were still on the older version, since the master nodes have to be upgraded last, as the Elasticsearch rolling upgrade documentation suggests. So at one point all the nodes except the master nodes were upgraded and running, but they were not able to join the cluster.
Once the master nodes were upgraded at the end, the issue was resolved and all the other non-master nodes were able to join the cluster.
So do we need to upgrade the master nodes first (which the rolling upgrade documentation does not suggest), or is there something else we need to consider so that a newly upgraded node always joins the cluster after its upgrade, before the master nodes are upgraded?
The log message you shared indicates the opposite of this. It tells us that the node knows that node-master-1 is the elected master of the cluster which means it must have successfully joined. It's only an INFO message and just says that monitoring isn't quite configured right.
What is making you think that nodes are unable to join the cluster?
@DavidTurner Thanks for the quick reply. We checked in Kibana and the upgraded nodes were not showing there, so we guessed that the nodes were not connecting to the master nodes.
Regarding monitoring, we checked the cluster settings, and the following are the values:
To confirm which nodes are in the cluster, check `GET _cat/nodes`. If a node doesn't join the cluster then it will describe the problem in its logs, reporting something like `master not discovered yet` every 10s, and likely other useful detail too.
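As a rough illustration of that check, here is a minimal Python sketch that reads the JSON form of `_cat/nodes` and reports which expected nodes are absent, which node is the elected master, and which version each joined node runs. The base URL, the absence of authentication, the helper names, and `node-name-2` are assumptions made for the example; only `node-name-1` and `node-master-1` come from the log above, and the sample rows are invented for illustration rather than taken from the real cluster.

```python
import json
from urllib.request import urlopen


def fetch_cat_nodes(base_url="http://localhost:9200"):
    """Fetch GET _cat/nodes as JSON rows.

    Assumes an unsecured cluster reachable at base_url (hypothetical);
    a secured cluster would also need credentials and TLS settings.
    """
    url = base_url + "/_cat/nodes?format=json&h=name,version,master,node.role"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def summarize_membership(rows, expected_names):
    """Compare the nodes reported by _cat/nodes with the nodes we expect."""
    joined = {row["name"]: row for row in rows}
    return {
        # Expected nodes that the cluster does not list at all.
        "missing": sorted(set(expected_names) - set(joined)),
        # _cat/nodes marks the elected master with "*" in the master column.
        "elected_master": [n for n, r in joined.items() if r.get("master") == "*"],
        # Useful mid-upgrade: shows which nodes already run the new version.
        "versions": {n: r.get("version") for n, r in joined.items()},
    }


# Invented sample rows in the shape _cat/nodes?format=json returns.
sample_rows = [
    {"name": "node-name-1", "version": "7.16.1", "master": "-", "node.role": "di"},
    {"name": "node-master-1", "version": "7.13.3", "master": "*", "node.role": "lmr"},
]
summary = summarize_membership(
    sample_rows, ["node-name-1", "node-name-2", "node-master-1"]
)
print(summary)
```

Against a live cluster, `summarize_membership(fetch_cat_nodes(), expected_names)` would do the same with real data. Any name that appears under `missing` is a node whose own logs are worth reading for the discovery messages mentioned above.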
If the node is listed in `GET _cat/nodes` then the problem is not that it can't join the cluster. Not all Kibana UIs will show all nodes; it depends on where the screen you were looking at gets its data. I can possibly help with questions about nodes not joining the cluster, but I'm not the right person to ask about other UI issues or configuring monitoring correctly. It would be best to open a separate topic about that.