Node couldn't rejoin cluster because "version not supported"

I've got an Elasticsearch cluster of four nodes. Three can hold data and are master eligible. The fourth cannot hold data, isn't master eligible, and has Kibana on it. Two of the master-eligible nodes became unhappy, resulting in a brief period when there was no cluster. The cluster sorted itself out and went back to green, but the Kibana node (agdud) got left out. In the Elasticsearch log on agdud there are lots of instances of this:

[2021-02-24T14:42:57,977][INFO ][o.e.c.c.JoinHelper       ] [agdud] failed to join {agdub}{gFWyyiOTSl-OVNky9YQmZw}{fuXPwsw-RIy7Nv6BEBIcMw}{10.70.13.42}{10.70.13.42:9300}{cdhilmrstw}{ml.machine_memory=1927577600, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=1073741824, transform.node=true} with JoinRequest{sourceNode={agdud}{cVGPTdpeRneDaBqu5aCDbw}{VP8A1k8LTziq2-tuR_I3zQ}{10.70.13.78}{10.70.13.78:9300}{ilr}{ml.machine_memory=6087639040, xpack.installed=true, transform.node=false, ml.max_open_jobs=20}, minimumTerm=41, optionalJoin=Optional[Join{term=41, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={agdud}{cVGPTdpeRneDaBqu5aCDbw}{VP8A1k8LTziq2-tuR_I3zQ}{10.70.13.78}{10.70.13.78:9300}{ilr}{ml.machine_memory=6087639040, xpack.installed=true, transform.node=false, ml.max_open_jobs=20}, targetNode={agdub}{gFWyyiOTSl-OVNky9YQmZw}{fuXPwsw-RIy7Nv6BEBIcMw}{10.70.13.42}{10.70.13.42:9300}{cdhilmrstw}{ml.machine_memory=1927577600, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=1073741824, transform.node=true}}]}
org.elasticsearch.transport.RemoteTransportException: [agdub][10.70.13.42:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: index [ilm-history-3-000004/vpxF3Wm2QeePoBbqNtHHCA] version not supported: 7.11.0 the node version is: 7.10.2

agdub is the master node. The index mentioned is always ilm-history-3-000004.
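
For reference, here is a minimal sketch of how the version skew can be spotted from the cluster side. It assumes an unsecured cluster reachable on localhost:9200 (adjust host and auth for your setup), and note that _cat/nodes only lists nodes currently in the cluster, so a node that has failed to join (agdud here) still has to be checked on its own host:

```python
import json
import urllib.request

# _cat/nodes reports each node's name, version, compact role string and whether
# it is the elected master. Nodes that failed to join will not appear here.
CAT_NODES = ("http://localhost:9200/_cat/nodes"
             "?format=json&h=name,version,node.role,master")

with urllib.request.urlopen(CAT_NODES) as resp:
    nodes = json.load(resp)

for node in nodes:
    # "master" is "*" for the elected master and "-" otherwise;
    # "node.role" is the compact role string (e.g. "cdhilmrstw", or "ilr").
    print(f"{node['name']:10} version={node['version']:8} "
          f"roles={node['node.role']:12} master={node['master']}")

versions = {node["version"] for node in nodes}
if len(versions) > 1:
    print(f"WARNING: mixed versions in the cluster: {sorted(versions)}")
```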

All the data nodes were running Elasticsearch 7.11.0, but agdud was still running 7.10.2. Updating it to 7.11.0 solved the problem. (It would have been updated to 7.11.0 automatically about 12 hours later.) The ilm-history-3-000004 index was created after all the data nodes had been updated to 7.11.0.
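
Here is a minimal sketch (same assumption of an unsecured cluster on localhost:9200) of how to confirm when the index was created and which version created it. The index.version.created setting holds an internal version id, e.g. 7110099 for an index created by a 7.11.0 node:

```python
import json
import urllib.request
from datetime import datetime, timezone

INDEX = "ilm-history-3-000004"
URL = f"http://localhost:9200/{INDEX}/_settings"

with urllib.request.urlopen(URL) as resp:
    index_settings = json.load(resp)[INDEX]["settings"]["index"]

# creation_date is epoch milliseconds; version.created is an internal id
# (e.g. "7110099" means the index was created by a 7.11.0 node).
created_at = datetime.fromtimestamp(
    int(index_settings["creation_date"]) / 1000, tz=timezone.utc)

print(f"index:           {INDEX}")
print(f"created at:      {created_at.isoformat()}")
print(f"version.created: {index_settings['version']['created']}")
```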

Can someone explain why agdud wasn't allowed to join the cluster while running 7.10.2? It seems to be because the ilm-history-3-000004 index was created when the master node and all the data nodes were running 7.11.0, yet agdud doesn't hold data, and it apparently wasn't a problem that it was running a slightly older version at the time the index was created.

Yes, the only legitimate reason for having a mix of versions in your cluster is that you are in the middle of a rolling upgrade, and the rolling upgrade docs answer your questions:

Running multiple versions of Elasticsearch in the same cluster beyond the duration of an upgrade is not supported …

and in the IMPORTANT bit at the bottom:

In the unlikely case of a network malfunction during the upgrade process that isolates all remaining old nodes from the cluster, you must take the old nodes offline and upgrade them to enable them to join the cluster.
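
Once the old node has been taken offline and upgraded, something like this sketch (assuming an unsecured four-node cluster on localhost:9200) can confirm it has rejoined and the cluster is back to green:

```python
import json
import urllib.error
import urllib.request

# Block for up to 60s until the cluster reports green with all four nodes present.
HEALTH = ("http://localhost:9200/_cluster/health"
          "?wait_for_status=green&wait_for_nodes=4&timeout=60s")

try:
    with urllib.request.urlopen(HEALTH) as resp:
        health = json.load(resp)
except urllib.error.HTTPError as err:
    # If the wait times out, some versions answer with an HTTP error status;
    # the body still contains the current health, so read it instead of bailing.
    health = json.load(err)

print(f"status:    {health['status']}")
print(f"nodes:     {health['number_of_nodes']}")
print(f"timed_out: {health['timed_out']}")  # True means the wait expired first
```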

Really useful, thanks. I hadn't thought of it as a rolling upgrade scenario, but that's effectively what was happening.
