I've got an Elasticsearch cluster of four nodes. Three can hold data and are master-eligible. The fourth cannot hold data, isn't master-eligible, and has Kibana on it. Two of the master-eligible nodes became unhappy, resulting in a brief period with no cluster. The cluster sorted itself out and went back to green, but the Kibana node (agdud) got left out. The Elasticsearch log on agdud has lots of instances of this:
[2021-02-24T14:42:57,977][INFO ][o.e.c.c.JoinHelper ] [agdud] failed to join {agdub}{gFWyyiOTSl-OVNky9YQmZw}{fuXPwsw-RIy7Nv6BEBIcMw}{10.70.13.42}{10.70.13.42:9300}{cdhilmrstw}{ml.machine_memory=1927577600, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=1073741824, transform.node=true} with JoinRequest{sourceNode={agdud}{cVGPTdpeRneDaBqu5aCDbw}{VP8A1k8LTziq2-tuR_I3zQ}{10.70.13.78}{10.70.13.78:9300}{ilr}{ml.machine_memory=6087639040, xpack.installed=true, transform.node=false, ml.max_open_jobs=20}, minimumTerm=41, optionalJoin=Optional[Join{term=41, lastAcceptedTerm=0, lastAcceptedVersion=0, sourceNode={agdud}{cVGPTdpeRneDaBqu5aCDbw}{VP8A1k8LTziq2-tuR_I3zQ}{10.70.13.78}{10.70.13.78:9300}{ilr}{ml.machine_memory=6087639040, xpack.installed=true, transform.node=false, ml.max_open_jobs=20}, targetNode={agdub}{gFWyyiOTSl-OVNky9YQmZw}{fuXPwsw-RIy7Nv6BEBIcMw}{10.70.13.42}{10.70.13.42:9300}{cdhilmrstw}{ml.machine_memory=1927577600, ml.max_open_jobs=20, xpack.installed=true, ml.max_jvm_size=1073741824, transform.node=true}}]}
org.elasticsearch.transport.RemoteTransportException: [agdub][10.70.13.42:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: index [ilm-history-3-000004/vpxF3Wm2QeePoBbqNtHHCA] version not supported: 7.11.0 the node version is: 7.10.2
agdub is the master node. The index mentioned is always ilm-history-3-000004.
All the data nodes were running Elasticsearch 7.11.0, but agdud was still running 7.10.2. Updating it to 7.11.0 solved the problem. (It would automatically have been updated to 7.11.0 about 12 hours later anyway.) The ilm-history-3-000004 index was created after all the data nodes had been updated to 7.11.0.
Can someone explain why agdud wasn't allowed to join the cluster while running 7.10.2? It seems to be because the ilm-history-3-000004 index was created while the master and all the data nodes were running 7.11.0. But agdud doesn't hold data, and its slightly older version apparently wasn't a problem at the moment the index was created, only when it later tried to rejoin.
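From the "version not supported: 7.11.0 the node version is: 7.10.2" message, the rejection looks like a compatibility check run at join time: every index in the cluster state must have been created on a version no newer than the joining node's. Here is a minimal sketch of what I assume that check does (the function names and version parsing are mine for illustration, not Elasticsearch's actual code):

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn a version string like '7.11.0' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def can_join(node_version: str, index_created_versions: list[str]) -> bool:
    """Sketch of the join-time check suggested by the log message:
    a joining node must understand every index in the cluster state,
    so an index created on a version newer than the node's blocks the
    join. This is an assumption, not Elasticsearch's implementation."""
    node = parse_version(node_version)
    return all(parse_version(v) <= node for v in index_created_versions)

# The situation from the log: agdud on 7.10.2, and ilm-history-3-000004
# created while the cluster was on 7.11.0.
print(can_join("7.10.2", ["7.11.0"]))  # join rejected
print(can_join("7.11.0", ["7.11.0"]))  # join allowed after upgrading agdud
```

If that's roughly right, it would explain the asymmetry: agdud was already a cluster member when the index was created, so no join check ran at that point; the check only fired when agdud had to rejoin after the cluster briefly fell apart.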