Old Version:
Elasticsearch version : 5.2.1
JVM version : 1.8.0_71
OS version : CentOS 7.3
New Version:
Elasticsearch version : 5.6.5
JVM version : 1.8.0_131
OS version : CentOS 7.3
The details:
I have six Elasticsearch clusters of 20 nodes. I plan to upgrade them from 5.2.1 to 5.6.5 through a rolling upgrade, following https://www.elastic.co/guide/en/elasticsearch/reference/5.6/rolling-upgrades.html.
Five clusters upgraded successfully, but one of the six cannot be upgraded. When a node of the problem cluster is upgraded and restarted, it can't rejoin the old cluster.
My English is poor, sorry.
Before upgrade:
curl localhost:9200
{
"name" : "xg-ops-elk-javaes-mgt-3",
"cluster_name" : "xg-ops-elk-javaes-cluster",
"cluster_uuid" : "W_xR97SqQ66yHEq9bQUDZQ",
"version" : {
"number" : "5.2.1",
"build_hash" : "db0d481",
"build_date" : "2017-02-09T22:05:32.386Z",
"build_snapshot" : false,
"lucene_version" : "6.4.1"
},
"tagline" : "You Know, for Search"
}
After upgrade and restart:
curl localhost:9200
{
"name" : "xg-ops-elk-javaes-mgt-3",
"cluster_name" : "xg-ops-elk-javaes-cluster",
"cluster_uuid" : "_na_",
"version" : {
"number" : "5.6.5",
"build_hash" : "6a37571",
"build_date" : "2017-12-04T07:50:10.466Z",
"build_snapshot" : false,
"lucene_version" : "6.6.1"
},
"tagline" : "You Know, for Search"
}
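The `cluster_uuid` of `_na_` in the second response suggests the node is running but has not yet received a cluster state from a master, which is consistent with the join failures in the log. A minimal sketch for checking this from a script, assuming the node answers on `localhost:9200` as in the curl calls above; the function names are hypothetical helpers, not part of any Elasticsearch client:

```python
import json
import urllib.request

# "_na_" is the placeholder the root endpoint reports while the node
# has no cluster state from a master.
UNKNOWN_UUID = "_na_"


def has_joined_cluster(root_info: dict) -> bool:
    """Return True if the root endpoint response carries a real cluster UUID."""
    return root_info.get("cluster_uuid", UNKNOWN_UUID) != UNKNOWN_UUID


def fetch_root_info(url: str = "http://localhost:9200") -> dict:
    """Fetch and decode the JSON document returned by the node's root endpoint."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `has_joined_cluster(fetch_root_info())` on the upgraded node would return `False` for the response shown above, and `True` for the response from before the upgrade.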
The log is:
[2018-01-14T16:33:13,125][INFO ][o.e.n.Node ] [xg-ops-elk-javaes-mgt-3] initialized
[2018-01-14T16:33:13,125][INFO ][o.e.n.Node ] [xg-ops-elk-javaes-mgt-3] starting ...
[2018-01-14T16:33:13,265][INFO ][o.e.t.TransportService ] [xg-ops-elk-javaes-mgt-3] publish_address {10.0.23.55:9300}, bound_addresses {127.0.0.1:9300}, {10.0.23.55:9300}
[2018-01-14T16:33:13,274][INFO ][o.e.b.BootstrapChecks ] [xg-ops-elk-javaes-mgt-3] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-01-14T16:33:23,746][INFO ][o.e.d.z.ZenDiscovery ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:33:34,121][INFO ][o.e.d.z.ZenDiscovery ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:33:43,293][WARN ][o.e.n.Node ] [xg-ops-elk-javaes-mgt-3] timed out while waiting for initial discovery state - timeout: 30s
[2018-01-14T16:33:43,303][INFO ][o.e.h.n.Netty4HttpServerTransport] [xg-ops-elk-javaes-mgt-3] publish_address {10.0.23.55:9200}, bound_addresses {127.0.0.1:9200}, {10.0.23.55:9200}
[2018-01-14T16:33:43,303][INFO ][o.e.n.Node ] [xg-ops-elk-javaes-mgt-3] started
[2018-01-14T16:33:44,475][INFO ][o.e.d.z.ZenDiscovery ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:33:49,162][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] no known master node, scheduling a retry
.....
[2018-01-14T16:40:59,129][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] no known master node, scheduling a retry
[2018-01-14T16:41:08,687][INFO ][o.e.d.z.ZenDiscovery ] [xg-ops-elk-javaes-mgt-3] failed to send join request to master [{xg-ops-elk-javaes-mgt-2}{02U7L_IMSxSnZVyKNGyPKg}{sWovufJERle5VTH1o3s4ww}{10.0.19.68}{10.0.19.68:9300}], reason [RemoteTransportException[[xg-ops-elk-javaes-mgt-2][10.0.19.68:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[xg-ops-elk-javaes-mgt-3][10.0.23.55:9300][internal:discovery/zen/join/validate]]; nested: NullPointerException; ]
[2018-01-14T16:41:09,134][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] no known master node, scheduling a retry
[2018-01-14T16:41:09,142][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [xg-ops-elk-javaes-mgt-3] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-01-14T16:41:09,142][WARN ][r.suppressed ] path: /_cluster/health, params: {level=indices}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1056) [elasticsearch-5.6.5.jar:5.6.5]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.5.jar:5.6.5]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
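Each repeated ZenDiscovery line above packs the whole failure chain into a single `reason [...]` field. Reading it outermost first, the innermost exception appears to be a NullPointerException raised on the joining node (xg-ops-elk-javaes-mgt-3) while handling `internal:discovery/zen/join/validate`. A small sketch for extracting that chain from such a line; the regular expressions and the function name are illustrative assumptions about this log format only:

```python
import re


def exception_chain(log_line: str) -> list:
    """Return exception class names, outermost first, from a 'reason [...]' field."""
    match = re.search(r"reason \[(.*)\]\s*$", log_line)
    if match is None:
        return []
    names = []
    # Nested causes are separated by "; nested: " in the log line.
    for part in match.group(1).split("; nested: "):
        name = re.match(r"\s*([A-Za-z_][\w.]*)", part)
        if name:
            names.append(name.group(1))
    return names
```

Applied to one of the "failed to send join request to master" lines, this yields the four exception names in order, ending with `NullPointerException`.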
Please help me; I don't know how to solve this issue. Thanks.