Elastic cluster always goes down without apparent cause

haiyuancheng · August 20, 2018, 1:49am

[2018-08-20T00:35:03,367][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [plat-ecloud01-db-es01] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-08-20T00:35:03,368][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [plat-ecloud01-db-es01] no known master node, scheduling a retry
[2018-08-20T00:35:03,368][WARN ][r.suppressed ] path: /_cluster/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:223) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:576) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
[2018-08-20T00:35:18,367][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [plat-ecloud01-db-es01] no known master node, scheduling a retry
[2018-08-20T00:35:18,367][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [plat-ecloud01-db-es01] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2018-08-20T00:35:18,367][WARN ][r.suppressed ] path: /_cluster/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.onTimeout(TransportMasterNodeAction.java:223) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:576) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
[2018-08-20T00:35:29,205][INFO ][o.e.d.z.ZenDiscovery ] [plat-ecloud01-db-es01] failed to send join request to master [{plat-ecloud01-db-es03}{Uj3hM1l6TIurc1x5eoM7XA}{ZD4pdJnLSkecmYXgsjDcuw}{10.176.140.58}{10.176.140.58:9300}{ml.machine_memory=16658382848, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}], reason [RemoteTransportException[[plat-ecloud01-db-es03][10.176.140.58:9300][internal:discovery/zen/join]]; nested: FailedToCommitClusterStateException[timed out while waiting for enough masters to ack sent cluster state. [1] left]; ]
[2018-08-20T00:35:33,366][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [plat-ecloud01-db-es01] no known master node, scheduling a retry
[2018-08-20T00:35:33,368][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] [plat-ecloud01-db-es01] timed out
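
The recurring messages in the excerpt above can be tallied with a short script. This is only a minimal Python sketch: the function name `summarize` and the labels in `PATTERNS` are hypothetical, and the sample lines are copied from the log above.

```python
from collections import Counter

# Substrings to look for; the labels on the left are made up for this sketch.
PATTERNS = {
    "no_known_master": "no known master node",
    "master_not_discovered": "MasterNotDiscoveredException",
    "join_failed": "failed to send join request to master",
}


def summarize(lines):
    """Count how many lines contain each substring in PATTERNS."""
    counts = Counter()
    for line in lines:
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
    return counts


# Three lines taken verbatim from the log excerpt.
sample = [
    "[2018-08-20T00:35:03,368][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] "
    "[plat-ecloud01-db-es01] no known master node, scheduling a retry",
    "org.elasticsearch.discovery.MasterNotDiscoveredException: null",
    "[2018-08-20T00:35:18,367][DEBUG][o.e.a.a.c.h.TransportClusterHealthAction] "
    "[plat-ecloud01-db-es01] no known master node, scheduling a retry",
]

counts = summarize(sample)
for label in PATTERNS:
    print(label, counts[label])
```

Running the same function over the full log file on each node would show whether all three nodes lose the master at the same time.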

After the ES cluster has been running for a period of time, the service becomes unavailable. Can anyone help with this?
Below are my configurations:
node01:
cluster.name: elasticsearch-group_es_01
node.name: plat-ecloud01-db-es01
node.master: true
node.data: true
path.data: /var/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: True
bootstrap.system_call_filter: false
network.host: 0.0.0.0
http.port: 9200
http.enabled: true
discovery.zen.ping.unicast.hosts: ["10.176.140.60", "10.176.140.61", "10.176.140.58"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 2
gateway.expected_nodes: 3
gateway.recover_after_time: 1m
discovery.zen.no_master_block: write
discovery.zen.fd.ping_timeout: 10s
http.cors.enabled: true
http.cors.allow-origin: "*"
http.max_content_length: 500mb
indices.recovery.max_bytes_per_sec: 200mb
indices.memory.index_buffer_size: 20%
xpack.security.enabled: false

node02:
cluster.name: elasticsearch-group_es_01
node.name: plat-ecloud01-db-es02
node.master: true
node.data: true
path.data: /var/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: True
bootstrap.system_call_filter: false
network.host: 0.0.0.0
http.port: 9200
http.enabled: true
discovery.zen.ping.unicast.hosts: ["10.176.140.60", "10.176.140.61", "10.176.140.58"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 2
gateway.expected_nodes: 3
gateway.recover_after_time: 1m
discovery.zen.no_master_block: write
discovery.zen.fd.ping_timeout: 10s
http.cors.enabled: true
http.cors.allow-origin: "*"
http.max_content_length: 500mb
indices.recovery.max_bytes_per_sec: 200mb
indices.memory.index_buffer_size: 20%
xpack.security.enabled: false

node03:
cluster.name: elasticsearch-group_es_01
node.name: plat-ecloud01-db-es03
node.master: true
node.data: true
path.data: /var/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
bootstrap.system_call_filter: false
network.host: 0.0.0.0
http.port: 9200
http.enabled: true
discovery.zen.ping.unicast.hosts: ["10.176.140.60", "10.176.140.61", "10.176.140.58"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 2
gateway.expected_nodes: 3
gateway.recover_after_time: 1m
discovery.zen.no_master_block: write
discovery.zen.fd.ping_timeout: 10s
http.cors.enabled: true
http.cors.allow-origin: "*"
http.max_content_length: 500mb
indices.recovery.max_bytes_per_sec: 200mb
indices.memory.index_buffer_size: 20%
xpack.security.enabled: false
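
As a sanity check on the three configurations above: with Zen discovery in Elasticsearch 6.x, `discovery.zen.minimum_master_nodes` is normally set to a quorum of the master-eligible nodes, that is (master-eligible nodes / 2) + 1. The Python sketch below only does that arithmetic. The helper name `quorum` is hypothetical, and the node count assumes the three nodes listed above are the only master-eligible nodes in the cluster.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Return floor(n / 2) + 1, the quorum of master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


# Values taken from the configs above: three nodes, all with node.master: true,
# and discovery.zen.minimum_master_nodes: 2 on each of them.
master_eligible = 3
configured_minimum_master_nodes = 2

required = quorum(master_eligible)
print(f"required={required}, configured={configured_minimum_master_nodes}")
```

Under that assumption the configured value matches the quorum, so the setting itself looks consistent across the three nodes.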

system · September 17, 2018, 1:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.