Unable to elect master node in ECS Elasticsearch cluster

We have a 4-node Elasticsearch cluster on ECS. The problem is that the Elasticsearch containers on two of the nodes keep restarting, while the containers on the other two nodes are stable.

However, the cluster is unable to elect a master node.

bash-4.1# cat /usr/share/elasticsearch/config/elasticsearch.yml
script.inline: true
cloud.aws.access_key: *****
cloud.aws.secret_key: *****
cloud.aws.region: us-east-1
repositories.s3.bucket: "mpt-elk-snapshots"

node.master: true
node.data: true

xpack.security.enabled: false

cluster.name: cluster-prod
network.publish_host: ec2:privateIp
discovery.type: ec2
discovery.ec2.any_group: false
discovery.ec2.groups: sg-*****
discovery.zen.hosts_provider: ec2
discovery.zen.minimum_master_nodes: 3
#discovery.zen.ping.multicast.enabled: false
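With all four nodes set to `node.master: true`, `discovery.zen.minimum_master_nodes: 3` is the usual majority quorum for four master-eligible nodes. It also means the two stable nodes on their own can never reach quorum, which is consistent with the `master_not_discovered_exception` they report while the other two containers keep restarting. A minimal Python sketch of that arithmetic, assuming the Zen discovery rules of Elasticsearch 5.x; `zen_quorum` and `can_elect_master` are hypothetical helpers, not part of any Elasticsearch API:

```python
def zen_quorum(master_eligible_nodes: int) -> int:
    """Recommended discovery.zen.minimum_master_nodes: a strict majority."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_master_eligible: int, minimum_master_nodes: int) -> bool:
    """Zen discovery only elects a master once at least minimum_master_nodes
    master-eligible nodes can see each other."""
    return visible_master_eligible >= minimum_master_nodes


if __name__ == "__main__":
    quorum = zen_quorum(4)  # 4 nodes, all configured with node.master: true
    print("minimum_master_nodes:", quorum)                       # 3
    print("2 stable nodes elect?", can_elect_master(2, quorum))  # False
    print("all 4 nodes elect?", can_elect_master(4, quorum))     # True
```

So the missing master is a symptom: as long as two of the four containers are restarting, no node can see the three master-eligible peers it needs.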

The two working nodes return this output from the cluster health API:

curl -XGET 'localhost:9200/_cluster/health?pretty'

    {
      "error" : {
        "root_cause" : [
          {
            "type" : "master_not_discovered_exception",
            "reason" : null
          }
        ],
        "type" : "master_not_discovered_exception",
        "reason" : null
      },
      "status" : 503
    }
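A 503 with `master_not_discovered_exception` means the node answering the request is running but has no elected master to route the health request to. If you want to detect this state in a script rather than by eye, a small Python sketch follows, assuming the standard Elasticsearch error-body layout (`error.root_cause`, `error.type`, `status`); `master_missing` is a hypothetical helper:

```python
import json

# Health response as reported by the two stable nodes.
RESPONSE = """
{
  "error" : {
    "root_cause" : [
      { "type" : "master_not_discovered_exception", "reason" : null }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
"""


def master_missing(body: str) -> bool:
    """Return True if a _cluster/health body says no master was discovered."""
    data = json.loads(body)
    error = data.get("error")
    if not isinstance(error, dict):
        return False  # a normal health body has no "error" object
    return error.get("type") == "master_not_discovered_exception"


if __name__ == "__main__":
    print(master_missing(RESPONSE))  # True
    print(master_missing('{"cluster_name": "cluster-prod", "status": "green"}'))  # False
```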

Please help me resolve this issue.

curl -XGET 'http://localhost:9200'

    {
      "name" : "casXLJFLD",
      "cluster_name" : "prod-cluster",
      "cluster_uuid" : "QbooN9HaTnSRJClksjdf4Zy1aw",
      "version" : {
        "number" : "5.5.1",
        "build_hash" : "23c4swdo",
        "build_date" : "2017-07-18T20:44:24.823Z",
        "build_snapshot" : false,
        "lucene_version" : "6.6.0"
      },
      "tagline" : "You Know, for Search"
    }

The Elasticsearch service runs inside the Docker container and takes a long time to start. As part of the ECS setup, the ELB/ALB runs a health check against the container; if the container does not become healthy within the allowed time frame, it is automatically stopped and a new container is started.
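The behaviour described above can be modelled as a loop that probes the node once per interval and gives up after a fixed number of failed probes. A rough Python sketch under these assumptions: the probe is a plain HTTP GET against the node (the actual health check path used by the ELB/ALB is not stated in the thread, so `http://localhost:9200/` is only an example), only a 200 counts as healthy, and `http_status` / `wait_until_healthy` are hypothetical helpers, not AWS or Elasticsearch APIs:

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET, or 0 if nothing is answering yet."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 503 from Elasticsearch
    except (urllib.error.URLError, OSError):
        return 0  # connection refused or timed out: service still starting


def wait_until_healthy(check: Callable[[], int], interval: float,
                       max_attempts: int,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Mimic a load-balancer health check: wait one interval, probe, and
    give up (container gets replaced) after max_attempts failed probes."""
    for _ in range(max_attempts):
        sleep(interval)
        if check() == 200:
            return True
    return False
```

For example, `wait_until_healthy(lambda: http_status("http://localhost:9200/"), interval=30, max_attempts=2)` models the original 60-second window, which a node that needs about 2 minutes to start can never pass.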

Previously, the health check ran at a 30-second interval, 2 times. If the container was not healthy within 60 seconds (1 minute), the running container was stopped and a new one was started.

Because the Elasticsearch service takes around 2 minutes to start, the health checks failed every time and the containers were constantly being recreated.

We increased the ELB health check to 5 attempts (2 1/2 minutes) before the container is stopped, and that resolved the issue.
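The fix comes down to simple arithmetic: the window a container gets is roughly the check interval multiplied by the number of checks, and it has to exceed the startup time. A minimal Python sketch, assuming the 30-second interval and roughly 120-second startup mentioned in this thread and ignoring check timeouts and any ECS grace period; the function names are hypothetical:

```python
STARTUP_SECONDS = 120   # "around 2 mins" for Elasticsearch to start
INTERVAL_SECONDS = 30   # health check interval


def check_window(interval: int, attempts: int) -> int:
    """Seconds a container gets before it is stopped: interval x checks."""
    return interval * attempts


def min_attempts(startup: int, interval: int) -> int:
    """Smallest number of checks whose window strictly exceeds the startup time."""
    return startup // interval + 1


if __name__ == "__main__":
    for attempts in (2, 5):
        window = check_window(INTERVAL_SECONDS, attempts)
        verdict = "survives" if window > STARTUP_SECONDS else "restarted"
        print(f"{attempts} checks -> {window}s window: {verdict}")
    print("minimum checks:", min_attempts(STARTUP_SECONDS, INTERVAL_SECONDS))
```

Running it prints `2 checks -> 60s window: restarted`, `5 checks -> 150s window: survives` and `minimum checks: 5`. Since the startup time is only approximate, leaving some margin above it, as the 5-check setting does, is the safer choice.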
