Background: We had a 7-node cluster running 6.4.2 that we want to upgrade to 6.7.0. New 6.7.0 nodes could not join this cluster, while nodes running 6.5.4 and 6.6.2 joined without problems, so we upgraded the cluster to 6.6.2 first.
So now we have a 6.6.2 cluster, and we are still unable to add 6.7.0 nodes to it. Adding more 6.6.2 nodes works fine.
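For anyone reproducing this, one way to confirm which version each node is running is to group the output of `GET _cat/nodes?h=name,version` by version. The following is only a minimal sketch: it assumes the response text has already been fetched, and the helper name `group_by_version` and the sample text are made up for illustration, not real output from our cluster.

```python
from collections import defaultdict


def group_by_version(cat_nodes_text):
    """Group node names by version.

    Expects the plain-text body of `GET _cat/nodes?h=name,version`,
    i.e. one "<name> <version>" pair per line.
    """
    groups = defaultdict(list)
    for line in cat_nodes_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        name, version = parts
        groups[version].append(name)
    return dict(groups)


# Hypothetical sample, shaped like the _cat/nodes output (not real data).
sample = """\
node-a 6.6.2
node-b 6.6.2
node-c 6.7.0
"""

for version, names in sorted(group_by_version(sample).items()):
    print(version, names)
```

More than one key in the result means the cluster is currently running mixed versions.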
Logs from a 6.7.0 node trying to join the existing cluster:
[2019-04-01T18:06:14,585][INFO ][o.e.x.s.a.s.FileRolesStore] [elasticsearch4-1] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2019-04-01T18:06:15,362][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [elasticsearch4-1] [controller/71] [Main.cc@109] controller (64 bit): Version 6.7.0 (Build d74ae2ac01b10d) Copyright (c) 2019 Elasticsearch BV
[2019-04-01T18:06:16,372][INFO ][o.e.d.DiscoveryModule ] [elasticsearch4-1] using discovery type [zen] and host providers [settings]
[2019-04-01T18:06:17,386][INFO ][o.e.n.Node ] [elasticsearch4-1] initialized
[2019-04-01T18:06:17,386][INFO ][o.e.n.Node ] [elasticsearch4-1] starting ...
[2019-04-01T18:06:17,546][INFO ][o.e.t.TransportService ] [elasticsearch4-1] publish_address {10.244.14.5:9300}, bound_addresses {0.0.0.0:9300}
[2019-04-01T18:06:17,562][INFO ][o.e.b.BootstrapChecks ] [elasticsearch4-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-04-01T18:06:47,613][WARN ][o.e.n.Node ] [elasticsearch4-1] timed out while waiting for initial discovery state - timeout: 30s
[2019-04-01T18:06:47,626][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch4-1] publish_address {10.244.14.5:9200}, bound_addresses {0.0.0.0:9200}
[2019-04-01T18:06:47,627][INFO ][o.e.n.Node ] [elasticsearch4-1] started
[2019-04-01T18:07:18,040][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch4-1] detected_master {elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, added {{elasticsearch3-3}{HZAa-5edRU2W9M5vqO0n5Q}{WhzpFvNbSsmqCffJAqbYhw}{10.244.8.25}{10.244.8.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-1}{K9zC0fdZQAGGs4gPgGc1pw}{fEyZlApVSVKTkyv0aci98Q}{10.244.11.30}{10.244.11.30:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-5}{gV8YEzhHTYm-GlOSHuwp4Q}{ETzm1qu7Qgiae377DY9eLA}{10.244.9.24}{10.244.9.24:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-2}{5vTRWLrvTSW6Dptzy4hr4g}{LQa4EDL3QbmNMHPRsFG-8w}{10.244.10.24}{10.244.10.24:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-0}{eHjpveEeTqiOTd7KwqppuA}{NwY-XamHTDa1eq2vQegAzQ}{10.244.5.25}{10.244.5.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-4}{yn6WcsxaSQOQfbIu_98MYg}{KVQtdkmDSEiC0SuZOyOeLA}{10.244.1.35}{10.244.1.35:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [79304]])
[2019-04-01T18:07:23,803][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [elasticsearch4-1] Failed to clear cache for realms []
[2019-04-01T18:07:23,805][INFO ][o.e.x.s.a.TokenService ] [elasticsearch4-1] refresh keys
[2019-04-01T18:07:23,993][INFO ][o.e.x.s.a.TokenService ] [elasticsearch4-1] refreshed keys
[2019-04-01T18:07:24,521][INFO ][o.e.l.LicenseService ] [elasticsearch4-1] license [62fcc1be-002c-4c43-8c21-912ca5be6986] mode [basic] - valid
[2019-04-01T18:07:34,639][INFO ][o.e.d.z.ZenDiscovery ] [elasticsearch4-1] master_left [{elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2019-04-01T18:07:34,640][WARN ][o.e.d.z.ZenDiscovery ] [elasticsearch4-1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{elasticsearch3-3}{HZAa-5edRU2W9M5vqO0n5Q}{WhzpFvNbSsmqCffJAqbYhw}{10.244.8.25}{10.244.8.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{elasticsearch4-1}{paj8Zq0XQPyJ7E1asBNKhw}{9vi3p_E1QPqDNIpLoy_M6A}{10.244.14.5}{10.244.14.5:9300}{ml.machine_memory=16820711424, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local
{elasticsearch3-2}{5vTRWLrvTSW6Dptzy4hr4g}{LQa4EDL3QbmNMHPRsFG-8w}{10.244.10.24}{10.244.10.24:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{elasticsearch3-0}{eHjpveEeTqiOTd7KwqppuA}{NwY-XamHTDa1eq2vQegAzQ}{10.244.5.25}{10.244.5.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{elasticsearch3-4}{yn6WcsxaSQOQfbIu_98MYg}{KVQtdkmDSEiC0SuZOyOeLA}{10.244.1.35}{10.244.1.35:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
This cycle of joining and leaving keeps repeating.
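To quantify the cycle, the node's log can be scanned for `detected_master` and `master_left` messages and the time between each pair measured. This is a rough sketch, assuming the 6.x log line layout shown above (`[timestamp][LEVEL][logger] [node] message`); the function names `master_events` and `join_durations` are my own, not part of Elasticsearch.

```python
import re
from datetime import datetime

# Leading "[timestamp][LEVEL ][logger ] [node] message" of a log line.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r"\s+\[(?P<node>[^\]]+)\]\s+(?P<msg>.*)$"
)


def master_events(lines):
    """Return (timestamp, kind) for each detected_master / master_left line."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # continuation lines (node lists, stack traces)
        msg = m.group("msg")
        for kind in ("detected_master", "master_left"):
            if msg.startswith(kind):
                ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S,%f")
                events.append((ts, kind))
    return events


def join_durations(events):
    """Seconds between each detected_master and the following master_left."""
    durations = []
    joined_at = None
    for ts, kind in events:
        if kind == "detected_master":
            joined_at = ts
        elif kind == "master_left" and joined_at is not None:
            durations.append((ts - joined_at).total_seconds())
            joined_at = None
    return durations
```

Called as `join_durations(master_events(open(path)))` with `path` pointing at the node's log file, this returns one duration per join/leave cycle. For the excerpt above, the node stayed in the cluster for about 16.6 seconds (18:07:18,040 to 18:07:34,639).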
The master node reports the following errors:
[2019-04-01T18:24:14,390][WARN ][o.e.t.TcpTransport ] [elasticsearch3-6] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.244.0.19:35870, remoteAddress=10.244.13.5/10.244.13.5:9300}], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [5725790], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1@16dbe31e], error [false]; resetting
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1137) ~[elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:914) [elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.6.2.jar:6.6.2]