I have a cluster with 2 coordinating nodes and 4 data/master nodes
(1 coordinating node and 2 data/master nodes in each DC).
I have not started indexing any data, and no index has been created yet. Still, the cluster goes down unexpectedly and I am seeing a lot of connection errors in the logs.
Configuration of one coordinating node (ivylx3601):
network.host: 0.0.0.0
http.port: 9200
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 30s
bootstrap.memory_lock: true
node.master: false
node.ingest: false
node.data: false
cluster.routing.allocation.awareness.attributes: dc
transport.publish_host: 10.60.1XX.XX
node.name: ivylx3601
node.attr.dc: ttc
discovery.zen.ping.unicast.hosts:
  - 10.60.1XX.XX  # coordinating node
  - 10.60.2XX.XX
  - 10.60.3XX.XX
  - 10.61.1XX.XX  # coordinating node
  - 10.61.2XX.XX
  - 10.61.3XX.XX
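As a sanity check on discovery.zen.minimum_master_nodes, here is a minimal sketch of the quorum rule given in the Elasticsearch 6.x documentation, (master_eligible_nodes / 2) + 1 with integer division. It assumes that all 4 data/master nodes are master-eligible and that they use the same value of 2 shown above; the function name is hypothetical and only for illustration.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Assumption: the 4 data/master nodes are all master-eligible.
master_eligible = 4
# Value taken from the configuration above.
configured = 2

required = minimum_master_nodes(master_eligible)
if configured < required:
    print(f"minimum_master_nodes={configured} is below the quorum of {required}")
else:
    print(f"minimum_master_nodes={configured} meets the quorum of {required}")
```

With 2 master-eligible nodes per DC, a value of 2 would let each DC elect its own master if the link between the DCs drops, so this may be worth checking alongside the network connectivity.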
Logs from the same node:
[2018-05-10T10:32:59,725][DEBUG][o.e.a.a.c.s.TransportClusterStateAction] [ivylx3601] timed out while retrying [cluster:monitor/state] after failure (timeout [30s])
[2018-05-10T10:32:59,725][WARN ][r.suppressed ] path: /_cluster/state/blocks, params: {metric=blocks}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:213) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:581) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.3.jar:6.2.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
[2018-05-10T10:32:59,757][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [ivylx3601] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-05-10T10:32:59,757][WARN ][r.suppressed ] path: /_cluster/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:213) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:581) [elasticsearch-6.2.3.jar:6.2.3]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:573) [elasticsearch-6.2.3.jar:6.2.3]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]