Unable to establish connection to master when upgrading to 6.7.0

Background: We had a 7-node cluster running 6.4.2 that we wanted to upgrade to 6.7.0. Adding new 6.7.0 nodes to this cluster did not work, while adding 6.5.4 and 6.6.2 nodes worked fine, so we decided to upgrade the cluster to 6.6.2 first.

So now we have a 6.6.2 cluster and we are unable to add 6.7.0 nodes to it. Adding more nodes of version 6.6.2 works fine.

Logs from a 6.7.0 node (elasticsearch4-1) trying to join the existing cluster:
[2019-04-01T18:06:14,585][INFO ][o.e.x.s.a.s.FileRolesStore] [elasticsearch4-1] parsed [0] roles from file [/usr/share/elasticsearch/config/roles.yml]
[2019-04-01T18:06:15,362][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [elasticsearch4-1] [controller/71] [Main.cc@109] controller (64 bit): Version 6.7.0 (Build d74ae2ac01b10d) Copyright (c) 2019 Elasticsearch BV
[2019-04-01T18:06:16,372][INFO ][o.e.d.DiscoveryModule ] [elasticsearch4-1] using discovery type [zen] and host providers [settings]
[2019-04-01T18:06:17,386][INFO ][o.e.n.Node ] [elasticsearch4-1] initialized
[2019-04-01T18:06:17,386][INFO ][o.e.n.Node ] [elasticsearch4-1] starting ...
[2019-04-01T18:06:17,546][INFO ][o.e.t.TransportService ] [elasticsearch4-1] publish_address {10.244.14.5:9300}, bound_addresses {0.0.0.0:9300}
[2019-04-01T18:06:17,562][INFO ][o.e.b.BootstrapChecks ] [elasticsearch4-1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-04-01T18:06:47,613][WARN ][o.e.n.Node ] [elasticsearch4-1] timed out while waiting for initial discovery state - timeout: 30s
[2019-04-01T18:06:47,626][INFO ][o.e.h.n.Netty4HttpServerTransport] [elasticsearch4-1] publish_address {10.244.14.5:9200}, bound_addresses {0.0.0.0:9200}
[2019-04-01T18:06:47,627][INFO ][o.e.n.Node ] [elasticsearch4-1] started
[2019-04-01T18:07:18,040][INFO ][o.e.c.s.ClusterApplierService] [elasticsearch4-1] detected_master {elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, added {{elasticsearch3-3}{HZAa-5edRU2W9M5vqO0n5Q}{WhzpFvNbSsmqCffJAqbYhw}{10.244.8.25}{10.244.8.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-1}{K9zC0fdZQAGGs4gPgGc1pw}{fEyZlApVSVKTkyv0aci98Q}{10.244.11.30}{10.244.11.30:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-5}{gV8YEzhHTYm-GlOSHuwp4Q}{ETzm1qu7Qgiae377DY9eLA}{10.244.9.24}{10.244.9.24:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-2}{5vTRWLrvTSW6Dptzy4hr4g}{LQa4EDL3QbmNMHPRsFG-8w}{10.244.10.24}{10.244.10.24:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-0}{eHjpveEeTqiOTd7KwqppuA}{NwY-XamHTDa1eq2vQegAzQ}{10.244.5.25}{10.244.5.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},{elasticsearch3-4}{yn6WcsxaSQOQfbIu_98MYg}{KVQtdkmDSEiC0SuZOyOeLA}{10.244.1.35}{10.244.1.35:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true},}, reason: apply cluster state (from master [master {elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true} committed version [79304]])
[2019-04-01T18:07:23,803][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [elasticsearch4-1] Failed to clear cache for realms []
[2019-04-01T18:07:23,805][INFO ][o.e.x.s.a.TokenService ] [elasticsearch4-1] refresh keys
[2019-04-01T18:07:23,993][INFO ][o.e.x.s.a.TokenService ] [elasticsearch4-1] refreshed keys
[2019-04-01T18:07:24,521][INFO ][o.e.l.LicenseService ] [elasticsearch4-1] license [62fcc1be-002c-4c43-8c21-912ca5be6986] mode [basic] - valid
[2019-04-01T18:07:34,639][INFO ][o.e.d.z.ZenDiscovery ] [elasticsearch4-1] master_left [{elasticsearch3-6}{CaAA7Em8ShqkRKhnHDURuw}{9ML67vr4RgCRFUQN9DUMug}{10.244.0.19}{10.244.0.19:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2019-04-01T18:07:34,640][WARN ][o.e.d.z.ZenDiscovery ] [elasticsearch4-1] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: nodes:
{elasticsearch3-3}{HZAa-5edRU2W9M5vqO0n5Q}{WhzpFvNbSsmqCffJAqbYhw}{10.244.8.25}{10.244.8.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{elasticsearch4-1}{paj8Zq0XQPyJ7E1asBNKhw}{9vi3p_E1QPqDNIpLoy_M6A}{10.244.14.5}{10.244.14.5:9300}{ml.machine_memory=16820711424, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local
{elasticsearch3-2}{5vTRWLrvTSW6Dptzy4hr4g}{LQa4EDL3QbmNMHPRsFG-8w}{10.244.10.24}{10.244.10.24:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{elasticsearch3-0}{eHjpveEeTqiOTd7KwqppuA}{NwY-XamHTDa1eq2vQegAzQ}{10.244.5.25}{10.244.5.25:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{elasticsearch3-4}{yn6WcsxaSQOQfbIu_98MYg}{KVQtdkmDSEiC0SuZOyOeLA}{10.244.1.35}{10.244.1.35:9300}{ml.machine_memory=16820711424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}

This cycle of detecting the master and then losing it keeps repeating.

The master node reports the following errors:
[2019-04-01T18:24:14,390][WARN ][o.e.t.TcpTransport ] [elasticsearch3-6] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.244.0.19:35870, remoteAddress=10.244.13.5/10.244.13.5:9300}], closing connection
java.lang.IllegalStateException: Message not fully read (response) for requestId [5725790], handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler/org.elasticsearch.action.support.nodes.TransportNodesAction$AsyncAction$1@16dbe31e], error [false]; resetting
at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1137) ~[elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:914) [elasticsearch-6.6.2.jar:6.6.2]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:53) [transport-netty4-client-6.6.2.jar:6.6.2]

Hi @iremmats, thanks for the report. Are you running Elasticsearch using the official Docker images? If so, I think the issue is https://github.com/elastic/elasticsearch/issues/40511. If you set

logger.org.elasticsearch.action: DEBUG

then we should get a stack trace on the 6.6.2 node to confirm this.
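
If editing elasticsearch.yml and restarting is inconvenient, the same logger can, as far as I know, also be raised temporarily through the cluster settings API (and reset to null again once you have the stack trace), for example:

    PUT _cluster/settings
    {
      "transient": {
        "logger.org.elasticsearch.action": "DEBUG"
      }
    }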

Yes, we run the official containers.

We found some more information on this:

"caused_by" : {
"type" : "transport_serialization_exception",
"reason" : "Failed to deserialize response from handler [org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler]",
"caused_by" : {
"type" : "illegal_state_exception",
"reason" : "unexpected distribution type [docker]; your distribution is broken"
}

Looking at the source code at https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/Build.java tells me that the docker distribution type was added about 28 days ago. So apparently the 6.6.2 Docker image identifies itself with distribution type "tar" and does not recognize the distribution type "docker" that the 6.7.0 containers report, so it throws an error when it deserializes their responses.
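
Just to illustrate the failure mode (this is only a sketch, not the actual Build.java code, and fromDisplayName here is a hypothetical stand-in for the real parsing logic): the 6.6.2 node maps the distribution-type string it reads off the wire onto the constants it was compiled with, and the new "docker" value falls through to the "your distribution is broken" exception.

    // Sketch of the failure mode only -- not the actual org.elasticsearch.Build code.
    // fromDisplayName is a hypothetical stand-in for the real parsing logic.
    public class DistributionTypeSketch {

        // The distribution types a 6.6.2 node knows about; there is no DOCKER yet.
        enum Type { DEB, RPM, TAR, ZIP, UNKNOWN }

        static Type fromDisplayName(String displayName) {
            switch (displayName) {
                case "deb":     return Type.DEB;
                case "rpm":     return Type.RPM;
                case "tar":     return Type.TAR;
                case "zip":     return Type.ZIP;
                case "unknown": return Type.UNKNOWN;
                default:
                    // This is the message that surfaces as transport_serialization_exception.
                    throw new IllegalStateException(
                            "unexpected distribution type [" + displayName + "]; your distribution is broken");
            }
        }

        public static void main(String[] args) {
            fromDisplayName("docker"); // throws, like the 6.6.2 nodes do for 6.7.0 responses
        }
    }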

I have read the GitHub issue now. It's spot on. 🙂
