Master node cannot rejoin the cluster after restart on ES 2.0.0, but other nodes can rejoin after restart #15916


(Wisre) #1

I have an ES cluster with ten nodes: five dedicated masters [A, B, C, D, E] and five data nodes. When I restart the master A, the other dedicated masters elect another master, B. But when A starts again, it cannot find master B, although B can find A and adds A to the cluster.

The error looks like this:

[2016-01-12 11:39:45,812][DEBUG][action.admin.cluster.state] [efe-es-03] no known master node, scheduling a retry
[2016-01-12 11:39:45,891][DEBUG][action.admin.cluster.state] [efe-es-03] no known master node, scheduling a retry
[2016-01-12 11:39:46,490][INFO ][rest.suppressed ] /.kibana/config/_search Params: {index=.kibana, type=config}
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
The new master adds A to the cluster:

[2016-01-12 10:57:49,394][INFO ][cluster.service ] [efe-es-03] added {{efe-es-01}{9H7bnUbASVerOMdEOlWikg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true},}, reason: zen-disco-join(join from node[{efe-es-01}{9H7bnUbASVerOMdEOlWikg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}])
But A can't get the nodes info, while the other nodes can, and their output includes A:

curl localhost:8080/_cat/nodes
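
One way to confirm this asymmetry (A sees no master while the other nodes do) is to ask every node which master it currently follows and compare the answers. The sketch below is only an illustration: it assumes the HTTP port 8080 seen in this thread and the `_cat/master` endpoint, and the helper names `fetch_master_line` and `summarize_master_views` are made up for this example. A node without a master is expected to return an error or time out, which the sketch records as None.

```python
import urllib.error
import urllib.request


def fetch_master_line(host, port=8080, timeout=5):
    """Return the raw text of GET /_cat/master from one node, or None on failure.

    Port 8080 is an assumption taken from the logs in this thread.
    """
    url = "http://%s:%d/_cat/master" % (host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except (urllib.error.URLError, OSError):
        # HTTP errors (for example while no master is known) and timeouts land here.
        return None


def summarize_master_views(responses):
    """Group hosts by the master node id each one reports.

    `responses` maps host -> raw _cat/master text (or None).
    The first whitespace-separated column is taken as the master's node id.
    Hosts with no usable answer are grouped under the key None.
    """
    views = {}
    for host, text in responses.items():
        fields = text.split() if text else []
        master_id = fields[0] if fields else None
        views.setdefault(master_id, []).append(host)
    return views


# Hypothetical usage against the addresses from the logs:
#   hosts = ["10.39.128.30", "10.39.129.28"]
#   print(summarize_master_views({h: fetch_master_line(h) for h in hosts}))
```

If the result has more than one key, the nodes disagree about the master; hosts listed under None did not report a master at all.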


(David Pilato) #2

Logs are in a strange order.

I think you did not copy all logs.

Did you run the upgrade plugin first?

Maybe you have to remove the .kibana indices?


(Wisre) #3

No, I just wanted to do a rolling restart of the nodes to add some parameters. When I restart the other nodes, which are not the master, they rejoin the cluster fine. But after I restarted the master, things went bad: it can't rejoin the cluster. I have deleted all the data on the bad node, but it still can't join. My cluster works well without the bad node.


(Wisre) #4

After I restarted all the nodes in my cluster, I get this error and the cluster is unstable:

[2016-01-12 16:23:48,942][INFO ][cluster.service ] [bjx-efe-es-07] master {new {efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}}, removed {{bjx-efe-es-06}{6EhE00oTQ2u9jVi1HtcNFQ}{10.39.128.39}{10.39.128.39:9300}{rack=10.39.128.39, max_local_storage_nodes=1, master=true},{bjx-efe-es-08}{54boIymMQgyzJPo69TTIcw}{10.39.128.41}{10.39.128.41:9300}{rack=10.39.128.41, max_local_storage_nodes=1, master=true},}, added {{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true},}, reason: zen-disco-receive(from master [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}])
[2016-01-12 16:23:49,375][INFO ][cluster.service ] [bjx-efe-es-07] removed {{efe-es-02}{v_gaTkDySXWO4AI2jeL_nw}{10.39.129.27}{10.39.129.27:9300}{rack=10.39.129.27, max_local_storage_nodes=1, master=true},}, reason: zen-disco-receive(from master [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}])
[2016-01-12 16:23:49,394][INFO ][cluster.service ] [bjx-efe-es-07] removed {{bjx-efe-es-10}{Oa5F3NP-TTCC9El2KUPnrQ}{10.39.129.56}{10.39.129.56:9300}{rack=10.39.129.56, max_local_storage_nodes=1, master=true},}, reason: zen-disco-receive(from master [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}])
[2016-01-12 16:23:49,944][INFO ][discovery.zen ] [bjx-efe-es-07] master_left [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2016-01-12 16:23:49,944][WARN ][discovery.zen ] [bjx-efe-es-07] master left (reason = failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{bjx-efe-es-09}{jwR1VIjIRymvxp2hSip8eA}{10.39.129.55}{10.39.129.55:9300}{rack=10.39.129.55, max_local_storage_nodes=1, master=true},{efe-es-03}{6fV5VUH3Qp22LR6pp09vag}{10.39.129.28}{10.39.129.28:9300}{rack=10.39.129.28, max_local_storage_nodes=1, master=true},{bjx-efe-es-07}{vCJnRZ1-Tw2g78I3IsWg5w}{10.39.128.40}{10.39.128.40:9300}{rack=10.39.128.40, max_local_storage_nodes=1, master=true},{efe-es-05}{TdAANxmnRjeUWJk--k5ATQ}{10.39.129.30}{10.39.129.30:9300}{rack=10.39.129.30, max_local_storage_nodes=1, master=true},{efe-es-04}{R8XrJ7jgRA6sU7q8LHBqmQ}{10.39.129.29}{10.39.129.29:9300}{rack=10.39.129.29, max_local_storage_nodes=1, master=true},}
[2016-01-12 16:23:49,945][INFO ][cluster.service ] [bjx-efe-es-07] removed {{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true},}, reason: zen-disco-master_failed ({efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true})
[2016-01-12 16:23:49,946][INFO ][discovery.zen ] [bjx-efe-es-07] failed to send join request to master [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}], reason [NodeDisconnectedException[[efe-es-01][10.39.128.30:9300][internal:discovery/zen/join] disconnected]]
[2016-01-12 16:23:52,963][INFO ][cluster.service ] [bjx-efe-es-07] detected_master {efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}, added {{bjx-efe-es-08}{54boIymMQgyzJPo69TTIcw}{10.39.128.41}{10.39.128.41:9300}{rack=10.39.128.41, max_local_storage_nodes=1, master=true},{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true},{efe-es-02}{v_gaTkDySXWO4AI2jeL_nw}{10.39.129.27}{10.39.129.27:9300}{rack=10.39.129.27, max_local_storage_nodes=1, master=true},{bjx-efe-es-06}{6EhE00oTQ2u9jVi1HtcNFQ}{10.39.128.39}{10.39.128.39:9300}{rack=10.39.128.39, max_local_storage_nodes=1, master=true},}, reason: zen-disco-receive(from master [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}])
[2016-01-12 16:23:52,972][INFO ][cluster.service ] [bjx-efe-es-07] added {{bjx-efe-es-10}{Oa5F3NP-TTCC9El2KUPnrQ}{10.39.129.56}{10.39.129.56:9300}{rack=10.39.129.56, max_local_storage_nodes=1,
[2016-01-12 16:23:54,966][INFO ][discovery.zen ] [bjx-efe-es-07] master_left [{efe-es-01}{1bVuHy8NQv-P1yQPSt2AMg}{10.39.128.30}{10.39.128.30:9300}{rack=10.39.128.30, max_local_storage_nodes=1, master=true}],
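
To get a feel for how often the master flaps in a log like the one above, the master-related events can be counted per log file. This is a rough sketch, not an official tool: the regular expression is written against the line layout shown in this thread, the three message prefixes are the ones that appear in these excerpts, and the name `count_master_events` is made up for the example.

```python
import re
from collections import Counter

# Layout as seen in this thread:
# [timestamp][LEVEL][logger ] [node-name] message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<level>\w+)\s*\]\[(?P<logger>[^\]]+?)\s*\]"
    r"\s+\[(?P<node>[^\]]+)\]\s+(?P<msg>.*)$"
)

# Message prefixes taken from the excerpts above, mapped to a short event name.
EVENT_PREFIXES = {
    "master {new": "new_master",
    "detected_master": "detected_master",
    "master_left": "master_left",
}


def count_master_events(lines):
    """Count master-related events; lines that do not match the layout are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # stack trace lines and other continuation lines
        msg = match.group("msg")
        for prefix, name in EVENT_PREFIXES.items():
            if msg.startswith(prefix):
                counts[name] += 1
                break
    return counts
```

Many `master_left` events within a few seconds of `detected_master`, as in the excerpt above, suggest the node keeps finding and losing the same master.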


(Wisre) #5

The full log looks like this; the node does not discover the master:

[2016-01-12 17:38:47,175][INFO ][rest.action.readonlyrest ] [efe-es-03] Readonly REST plugin was loaded...
[2016-01-12 17:38:47,175][INFO ][rest.action.readonlyrest ] [efe-es-03] Readonly Rest plugin is installed, but not enabled
[2016-01-12 17:38:47,175][INFO ][rest.action.readonlyrest ] [efe-es-03] Readonly REST plugin is disabled!
[2016-01-12 17:38:47,255][INFO ][node ] [efe-es-03] initialized
[2016-01-12 17:38:47,255][INFO ][node ] [efe-es-03] starting ...
[2016-01-12 17:38:47,432][INFO ][transport ] [efe-es-03] publish_address {10.39.129.28:9300}, bound_addresses {10.39.129.28:9300}
[2016-01-12 17:38:47,440][INFO ][discovery ] [efe-es-03] elongELK/opB8sMAWTuOGld-qtUbiAQ
[2016-01-12 17:39:17,440][WARN ][discovery ] [efe-es-03] waited for 30s and no initial state was set by the discovery
[2016-01-12 17:39:17,502][INFO ][http ] [efe-es-03] publish_address {10.39.129.28:8080}, bound_addresses {10.39.129.28:8080}
[2016-01-12 17:39:17,503][INFO ][node ] [efe-es-03] started
[2016-01-12 17:39:22,763][INFO ][rest.suppressed ] /.kibana/config/_search Params: {index=.kibana, type=config}
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)

