Our 231-node cloud Elasticsearch (5.4.0) cluster is stuck in a "RED" state.
Cluster health:
{
  "cluster_name" : "exabeam-es",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 231,
  "number_of_data_nodes" : 230,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : "NaN"
}
Here is our configuration:
discovery.zen.ping.unicast.hosts: [/* 185 hosts in the list */]
discovery.zen.minimum_master_nodes: "93"
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: "60s"
transport.tcp.connect_timeout: "60s"
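For reference, the minimum_master_nodes value above matches the usual majority rule, (master-eligible nodes / 2) + 1, if we assume that the 185 hosts in the unicast list are exactly the master-eligible nodes. That is an assumption, since the cluster has 231 nodes in total. A minimal sketch of the arithmetic (the function name is ours, not an Elasticsearch API):

```python
def zen_quorum(master_eligible_nodes: int) -> int:
    """Majority quorum for Zen discovery: floor(n / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Assumption: the 185 hosts in discovery.zen.ping.unicast.hosts
# are the master-eligible nodes of this cluster.
print(zen_quorum(185))  # 93
```

If the number of master-eligible nodes differs from 185, the same function gives the value that minimum_master_nodes would need to be.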
We are getting this exception when trying to create the index-migrations index:
[2018-11-29T21:30:31,603][WARN ][r.suppressed ] path: /index-migrations, params: {index=index-migrations}
org.elasticsearch.transport.RemoteTransportException: [host46-2][10.50.61.136:9300][indices:admin/create]
Caused by: org.elasticsearch.discovery.MasterNotDiscoveredException: ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:209) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:311) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:238) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.cluster.service.ClusterService$NotifyTimeout.run(ClusterService.java:1057) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) ~[elasticsearch-5.4.0.jar:5.4.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_91]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:182) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:64) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.admin.indices.create.TransportCreateIndexAction.checkBlock(TransportCreateIndexAction.java:39) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.doStart(TransportMasterNodeAction.java:134) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.start(TransportMasterNodeAction.java:126) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:104) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.master.TransportMasterNodeAction.doExecute(TransportMasterNodeAction.java:54) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:170) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:142) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:64) ~[elasticsearch-5.4.0.jar:5.4.0]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:54) ~[elasticsearch-5.4.0.jar:5.4.0]
at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceivedDecorate(SearchGuardSSLRequestHandler.java:177) ~[?:?]
at com.floragunn.searchguard.ssl.transport.SearchGuardSSLRequestHandler.messageReceived(SearchGuardSSLRequestHandler.java:139) ~[?:?]
Here is an excerpt from 'https://localhost:9200/_cluster/state':
"cluster_name" : "exabeam-es",
"version" : 17,
"state_uuid" : "XJim7LIaRxiTWw-9vE1ItQ",
"master_node" : "-YfA-UIpQ4yVXcFjW0F7YQ",
"blocks" : {
  "global" : {
    "1" : {
      "description" : "state not recovered / initialized",
      "retryable" : true,
      "disable_state_persistence" : true,
      "levels" : [
        "read",
        "write",
        "metadata_read",
        "metadata_write"
      ]
    }
  }
},
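To make it easier to compare this excerpt across nodes, here is a minimal sketch that lists the global blocks found in a _cluster/state response. The sample document is trimmed from the excerpt above, and global_blocks is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json

# Sample trimmed from the excerpt above; a real response has many more fields.
SAMPLE_STATE = json.loads("""
{
  "cluster_name": "exabeam-es",
  "master_node": "-YfA-UIpQ4yVXcFjW0F7YQ",
  "blocks": {
    "global": {
      "1": {
        "description": "state not recovered / initialized",
        "retryable": true,
        "levels": ["read", "write", "metadata_read", "metadata_write"]
      }
    }
  }
}
""")


def global_blocks(state: dict) -> list:
    """Return (block_id, description, levels) for every global cluster block."""
    blocks = state.get("blocks", {}).get("global", {})
    return [
        (block_id, block.get("description", ""), block.get("levels", []))
        for block_id, block in sorted(blocks.items())
    ]


for block_id, description, levels in global_blocks(SAMPLE_STATE):
    print(f"block {block_id}: {description} (levels: {', '.join(levels)})")
```

An empty result would mean the state carries no global block; here the only block is the "state not recovered / initialized" one, even though a master_node is reported.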