Cluster intermittently goes down

Hi,

I have a 4-node Elasticsearch 6.3.0 cluster, and the nodes intermittently become unavailable.
The logs show the following warnings:
[2018-08-20T00:06:06,384][WARN ][o.e.d.z.ZenDiscovery ] [abc-node-1] master left (reason = transport disconnected), current nodes: nodes:
{abc-node-4}{XBPQiaQjRC6T4UfWp2KfBQ}{1k3msvKuQJKsEamzQ6WArA}{x.x.x.x}{x.x.x.x:9300}{ml.machine_memory=16658407424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}
{anc-node-1}{1f7Z_yOdSzq_9oA3x2gbAw}{OzuEwx34TES9cyUjm31iwQ}{x.x.x.x}{x.x.x.x:9300}{ml.machine_memory=16658407424, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, local
{abc-node-3}{Aj0XnCxZRqOc6UT-FlBmFQ}{i9lZu2tPSUaOMhUhmoKWJQ}{x.x.x.x}{x.x.x.x:9300}{ml.machine_memory=16658407424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}, master

[2018-08-20T00:06:06,397][INFO ][o.e.x.w.WatcherService ] [abc-node-1] stopping watch service, reason [no master node]
[2018-08-20T00:06:10,141][WARN ][o.e.d.z.UnicastZenPing ] [abc-node-1] failed to send ping to [{abc-node-4}{XBPQiaQjRC6T4UfWp2KfBQ}{1k3msvKuQJKsEamzQ6WArA}{x.x.x.x}{x.x.x.x:9300}{ml.machine_memory=16658407424, ml.max_open_jobs=20, xpack.installed=true, ml.enabled=true}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [abc-node-4][x.x.x.x:9300][internal:discovery/zen/unicast] request_id [78172] timed out after [3750ms]
at org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:979) [elasticsearch-6.3.0.jar:6.3.0]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:625) [elasticsearch-6.3.0.jar:6.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
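
To see how often this happens and on which nodes, the log lines above can be tallied with a small script. This is only a rough sketch: the regular expression is an assumption based on the line layout in this excerpt (`[timestamp][LEVEL ][logger ] [node] message`), and the names `parse_line` and `count_events` are hypothetical helpers, not part of Elasticsearch.

```python
import re
from collections import Counter

# Header layout assumed from the excerpt above:
# [timestamp][LEVEL ][logger ] [node] message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<logger>[^\]]+?)\s*\]"
    r"\s*\[(?P<node>[^\]]+)\]"
    r"\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return the header fields of a log line as a dict, or None for
    continuation lines (node listings, stack trace frames)."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def count_events(lines):
    """Count 'master left' and 'failed to send ping' events in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # not a header line
        if "master left" in rec["msg"]:
            counts["master_left"] += 1
        elif "failed to send ping" in rec["msg"]:
            counts["ping_failed"] += 1
    return counts
```

It could be run over a log file with `count_events(open(path))`, where `path` is the node's log file under the `path.logs` directory. Comparing the timestamps of the matching lines across all four nodes may show whether the disconnects line up with a particular node or time of day.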

=========
elasticsearch.yml configuration:

cluster.name: dod-cluster
node.name: abc-node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.bind_host: 0.0.0.0
network.publish_host: x.x.x.x
http.port: 9200
discovery.zen.ping.unicast.hosts: ["x.x.x.x", "x.x.x.x", "x.x.x.x", "x.x.x.x" ]
transport.bind_host: 0.0.0.0
transport.publish_host: x.x.x.x
discovery.zen.fd.ping_interval: 30s
discovery.zen.fd.ping_retries: 3
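
The configuration above does not set `discovery.zen.minimum_master_nodes`. For Zen discovery in Elasticsearch 6.x, the documented recommendation is a quorum of `(master_eligible_nodes / 2) + 1`. The sketch below works that out, assuming all 4 nodes are master-eligible (an assumption: only this node's config is shown, and it does not override `node.master`). This may not be the cause of the disconnects; it is only a sanity check on the settings. The function name is a hypothetical helper.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for Zen discovery: a strict majority of master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# Assumption: all 4 nodes in this cluster are master-eligible.
print(minimum_master_nodes(4))  # prints 3
```

Under that assumption, the corresponding `elasticsearch.yml` line would be `discovery.zen.minimum_master_nodes: 3` on every node.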

Can someone please help me?
