Elasticsearch cluster fails to join on AWS


(waterdudu) #1

I'm using EC2 discovery to set up an Elasticsearch cluster. The cluster has two nodes, both t2.small instances, but the nodes intermittently fail to join the cluster.

Here is the log from one node:

[2016-10-28 10:09:51,535][INFO ][discovery.ec2            ] [erk_elasticsearch-0] master_left [{erk_elasticsearch-1}{ax9g1gDjRwm_YalM5XfkDQ}{10.0.1.230}{10.0.1.230:9300}], reason [failed to ping, tried [3] times, each with  maximum [30s] timeout]
[2016-10-28 10:09:51,536][WARN ][discovery.ec2            ] [erk_elasticsearch-0] master left (reason = failed to ping, tried [3] times, each with  maximum [30s] timeout), current nodes: {{erk_elasticsearch-0}{oD_KLl8uRpS6vgN2bYo9bg}{10.0.1.232}{10.0.1.232:9300},}
[2016-10-28 10:09:51,536][INFO ][cluster.service          ] [erk_elasticsearch-0] removed {{erk_elasticsearch-1}{ax9g1gDjRwm_YalM5XfkDQ}{10.0.1.230}{10.0.1.230:9300},}, reason: zen-disco-master_failed ({erk_elasticsearch-1}{ax9g1gDjRwm_YalM5XfkDQ}{10.0.1.230}{10.0.1.230:9300})
[2016-10-28 10:09:51,542][DEBUG][action.admin.cluster.health] [erk_elasticsearch-0] connection exception while trying to forward request with action name [cluster:monitor/health] to master node [{erk_elasticsearch-1}{ax9g1gDjRwm_YalM5XfkDQ}{10.0.1.230}{10.0.1.230:9300}], scheduling a retry. Error: [org.elasticsearch.transport.NodeDisconnectedException: [erk_elasticsearch-1][10.0.1.230:9300][cluster:monitor/health] disconnected]
[2016-10-28 10:09:51,543][DEBUG][action.admin.cluster.health] [erk_elasticsearch-0] connection exception while trying to forward request with action name [cluster:monitor/health] to master node [{erk_elasticsearch-1}{ax9g1gDjRwm_YalM5XfkDQ}{10.0.1.230}{10.0.1.230:9300}], scheduling a retry. Error: [org.elasticsearch.transport.NodeDisconnectedException: [erk_elasticsearch-1][10.0.1.230:9300][cluster:monitor/health] disconnected]
[2016-10-28 10:09:51,543][DEBUG][action.admin.cluster.health] [erk_elasticsearch-0] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
NodeDisconnectedException[[erk_elasticsearch-1][10.0.1.230:9300][cluster:monitor/health] disconnected]
[2016-10-28 10:09:51,544][WARN ][rest.suppressed          ] /_cat/health Params: {}
MasterNotDiscoveredException[NodeDisconnectedException[[erk_elasticsearch-1][10.0.1.230:9300][cluster:monitor/health] disconnected]]; nested: NodeDisconnectedException[[erk_elasticsearch-1][10.0.1.230:9300][cluster:monitor/health] disconnected];
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:226)
    at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:126)
    at org.elasticsearch.cluster.ClusterStateObserver.waitForNextChange(ClusterStateObserver.java:98)
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.retry(TransportMasterNodeAction.java:211)
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction.access$900(TransportMasterNodeAction.java:110)
    at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$4.handleException(TransportMasterNodeAction.java:200)
    at org.elasticsearch.transport.TransportService$Adapter$3.run(TransportService.java:622)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: NodeDisconnectedException[[erk_elasticsearch-1][10.0.1.230:9300][cluster:monitor/health] disconnected]

Here is the elasticsearch.yml file:

discovery.zen.ping.multicast.enabled: false
discovery.type: ec2
discovery.ec2.groups: sg-53edb636
discovery.ec2.availability_zones: cn-north-1a
discovery.ec2.tag.elasticsearch: erk_elasticsearch
cloud.aws.protocol: http
cloud.aws.region: cn-north-1
cloud.aws.access_key: <key>
cloud.aws.secret_key: <secret>
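
For reference, the "tried [3] times, each with maximum [30s] timeout" in the log appears to match the zen fault-detection defaults. I have not set these myself; the fragment below is only a sketch of what making them explicit would look like, assuming the Elasticsearch 2.x setting names, and the values shown are the defaults rather than a recommended fix:

```yaml
# Sketch only: zen fault-detection settings (assumed ES 2.x names).
# The values are the defaults and are not present in my elasticsearch.yml.
discovery.zen.fd.ping_interval: 1s   # how often a node is pinged
discovery.zen.fd.ping_timeout: 30s   # how long to wait for each ping response
discovery.zen.fd.ping_retries: 3     # failed pings before the node is considered gone
```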

Any ideas why this happens?

Thanks

