Hi,
I am using Elasticsearch 5.0.1 with EC2 discovery. I am running a two-node cluster; both nodes join the cluster fine and I am able to receive data from Logstash for some time. But after a while the cluster becomes unresponsive, the master becomes unavailable, and it never recovers even if I stop sending data.
Below is my elasticsearch.yml config for both nodes:
cluster.name: test
network:
  host: _ec2:privateIpv4_
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping_timeout: 60s
discovery.zen.join_timeout: 60s
indices.fielddata.cache.size: 25%
# Default
script.painless.regex.enabled: true
thread_pool.bulk.queue_size: 500
discovery:
  type: ec2
discovery.ec2.host_type: private_ip
cloud:
  aws:
    access_key: <access key>
    secret_key: <secret>
    region: ap-south-1
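For what it's worth, the config above mixes nested blocks (`discovery:` / `cloud:`) with flat dotted keys (`discovery.ec2.host_type`). A sketch of the same discovery settings written entirely in the flat dotted-key style, with the values unchanged, would look like this (key names are the standard 5.x EC2 discovery plugin settings):

```yaml
# Equivalent flat-key form of the discovery settings (sketch; same values as above)
discovery.type: ec2
discovery.ec2.host_type: private_ip
cloud.aws.access_key: <access key>
cloud.aws.secret_key: <secret>
cloud.aws.region: ap-south-1
```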
Below is the log output (with stack trace) that I am getting:
[2018-02-21T06:04:22,691][WARN ][o.e.d.z.ZenDiscovery ] [0U577Jt] not enough master
nodes, current nodes: {{0U577Jt}{0U577JtbTDuGl4d3rw6Blg}{o8k1rmg_SIm0bi9mImlJ-Q}
{172.31.30.60}{172.31.30.60:9300},}
[2018-02-21T06:04:22,692][INFO ][o.e.c.s.ClusterService ] [0U577Jt] removed {{en5NzDU}
{en5NzDUIR-KCAnfGNro4vw}{bGyxwN3nT463ixk1w6KD2A}{172.31.16.6}{172.31.16.6:9300},},
reason: zen-disco-node-failed({en5NzDU}{en5NzDUIR-KCAnfGNro4vw}
{bGyxwN3nT463ixk1w6KD2A}{172.31.16.6}{172.31.16.6:9300}), reason(failed to ping, tried [3]
times, each with maximum [30s] timeout)[{en5NzDU}{en5NzDUIR-KCAnfGNro4vw}
{bGyxwN3nT463ixk1w6KD2A}{172.31.16.6}{172.31.16.6:9300} failed to ping, tried [3] times,
each with maximum [30s] timeout]
[2018-02-21T06:04:57,935][WARN ][r.suppressed ] path: /_bulk, params: {}
org.elasticsearch.cluster.block.ClusterBlockException: blocked by:
[SERVICE_UNAVAILABLE/2/no master];
at
org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:161) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:147) ~[elasticsearch-5.0.1.jar:5.0.1]
I also ran a three-node cluster. The other nodes are responding well, but one of the nodes is unable to stay joined to the cluster. Log from that node:
[2018-02-21T07:25:50,591][INFO ][o.e.c.s.ClusterService ] [en5NzDU] detected_master
{0U577Jt}{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}{172.31.30.60}
{172.31.30.60:9300}, added {{0U577Jt}{0U577JtbTDuGl4d3rw6Blg}{Tp-
1hFUjRp6z964Z63DTew}{172.31.30.60}{172.31.30.60:9300},}, reason: zen-disco-receive(from
master [master {0U577Jt}{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}
{172.31.30.60}{172.31.30.60:9300} committed version [1]])
[2018-02-21T07:25:59,804][INFO ][o.e.c.s.ClusterService ] [en5NzDU] added {{eNOS2YK}
{eNOS2YKpT7-fc7sFMUpadw}{IIVZfJ8_QCGQvRcHnmIsBA}{172.31.30.241}
{172.31.30.241:9300},}, reason: zen-disco-receive(from master [master {0U577Jt}
{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}{172.31.30.60}{172.31.30.60:9300}
committed version [26]])
[2018-02-21T07:30:41,264][INFO ][o.e.c.s.ClusterService ] [en5NzDU] removed {{eNOS2YK}
{eNOS2YKpT7-fc7sFMUpadw}{IIVZfJ8_QCGQvRcHnmIsBA}{172.31.30.241}
{172.31.30.241:9300},}, reason: zen-disco-receive(from master [master {0U577Jt}
{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}{172.31.30.60}{172.31.30.60:9300}
committed version [36]])
[2018-02-21T07:38:41,886][INFO ][o.e.c.s.ClusterService ] [en5NzDU] added {{eNOS2YK}
{eNOS2YKpT7-fc7sFMUpadw}{AWmU3TIkRQ-n6t7vlJMUxw}{172.31.30.241}
{172.31.30.241:9300},}, reason: zen-disco-receive(from master [master {0U577Jt}
{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}{172.31.30.60}{172.31.30.60:9300}
committed version [48]])
[2018-02-21T07:48:10,129][INFO ][o.e.d.z.ZenDiscovery ] [en5NzDU] master_left [{0U577Jt}
{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}{172.31.30.60}{172.31.30.60:9300}],
reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2018-02-21T07:48:10,130][WARN ][o.e.d.z.ZenDiscovery ] [en5NzDU] master left (reason =
failed to ping, tried [3] times, each with maximum [30s] timeout), current nodes: {{en5NzDU}
{en5NzDUIR-KCAnfGNro4vw}{wz-MMrf2SDuJeGGSnjPzbQ}{172.31.16.6}{172.31.16.6:9300},
{eNOS2YK}{eNOS2YKpT7-fc7sFMUpadw}{AWmU3TIkRQ-n6t7vlJMUxw}{172.31.30.241}
{172.31.30.241:9300},}
[2018-02-21T07:48:10,130][INFO ][o.e.c.s.ClusterService ] [en5NzDU] removed {{0U577Jt}
{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}{172.31.30.60}{172.31.30.60:9300},},
reason: master_failed ({0U577Jt}{0U577JtbTDuGl4d3rw6Blg}{Tp-1hFUjRp6z964Z63DTew}
{172.31.30.60}{172.31.30.60:9300})
[2018-02-21T07:49:25,360][WARN ][o.e.d.z.p.u.UnicastZenPing] [en5NzDU] failed to send ping
to [{#cloud-i-0d0e8454f516edd58-0}{Cl9dtsPdQqSbUJItjZMQPQ}{172.31.30.241}
{172.31.30.241:9300}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [eNOS2YK]
[172.31.30.241:9300][internal:discovery/zen/unicast] request_id [1908] timed out after
[75001ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:840)
[elasticsearch-5.0.1.jar:5.0.1]
at
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:451) [elasticsearch-5.0.1.jar:5.0.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-02-21T07:49:55,497][WARN ][o.e.d.z.p.u.UnicastZenPing] [en5NzDU] failed to send ping
to [{#cloud-i-0d0e8454f516edd58-0}{S1gpMFOuTIafxHdxXIFMkQ}{172.31.30.241}
{172.31.30.241:9300}]
org.elasticsearch.transport.ReceiveTimeoutTransportException: [eNOS2YK]
[172.31.30.241:9300][internal:discovery/zen/unicast] request_id [1912] timed out after
[75000ms]
at
org.elasticsearch.transport.TransportService$TimeoutHandler.run(TransportService.java:840)
[elasticsearch-5.0.1.jar:5.0.1]
at
org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:451) [elasticsearch-5.0.1.jar:5.0.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_151]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
[2018-02-21T07:50:25,619][WARN ][o.e.d.z.p.u.UnicastZenPing] [en5NzDU] failed to send ping
to [{#cloud-i-0d0e8454f516edd58-0}{DKRtX3ZITY2onGn01BLVlQ}{172.31.30.241}
{172.31.30.241:9300}]