Elasticsearch 7.1.1 not forming a cluster using EC2 auto discovery plugin

I have a 3-node cluster that I have set up in AWS using the AWS EC2 discovery plugin. Each of the nodes can communicate with the others, and auto-discovery finds the other nodes, but they don't form a cluster. I am using a basic license and have not yet turned on SSL; I'm not sure if that is now a requirement.
There are no errors in the logs, but each node just forms its own one-node cluster instead of joining a single cluster.

elasticsearch.yml
discovery.seed_providers: ec2
discovery.ec2.tag.discovery: my-cluster-discovery-tag
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
cluster.name: my-cluster-name
network.host: [_local_, _site_]
logger.org.elasticsearch.discovery: TRACE

Logs
[2019-06-04T13:25:17,830][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] not active
[2019-06-04T13:25:17,883][INFO ][o.e.c.s.MasterService ] [ip-10-241-0-24.ec2.internal] elected-as-master ([1] nodes joined)[{ip-10-241-0-24.ec2.internal}{xI8fYZUmQ8WAZfIMjgp9sQ}{4QwPqtsFTr6H8WyImS25gg}{10.241.0.24}{10.241.0.24:9300}{aws_availability_zone=us-east-1a, ml.machine_memory=8362668032, xpack.installed=true, ml.max_open_jobs=20} elect leader, BECOME_MASTER_TASK, FINISH_ELECTION], term: 18, version: 64, reason: master node changed {previous , current [{ip-10-241-0-24.ec2.internal}{xI8fYZUmQ8WAZfIMjgp9sQ}{4QwPqtsFTr6H8WyImS25gg}{10.241.0.24}{10.241.0.24:9300}{aws_availability_zone=us-east-1a, ml.machine_memory=8362668032, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-06-04T13:25:18,012][INFO ][o.e.c.s.ClusterApplierService] [ip-10-241-0-24.ec2.internal] master node changed {previous , current [{ip-10-241-0-24.ec2.internal}{xI8fYZUmQ8WAZfIMjgp9sQ}{4QwPqtsFTr6H8WyImS25gg}{10.241.0.24}{10.241.0.24:9300}{aws_availability_zone=us-east-1a, ml.machine_memory=8362668032, xpack.installed=true, ml.max_open_jobs=20}]}, term: 18, version: 64, reason: Publication{term=18, version=64}
[2019-06-04T13:25:18,200][INFO ][o.e.h.AbstractHttpServerTransport] [ip-10-241-0-24.ec2.internal] publish_address {10.241.0.24:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}, {10.241.0.24:9200}
[2019-06-04T13:25:18,201][INFO ][o.e.n.Node ] [ip-10-241-0-24.ec2.internal] started
[2019-06-04T13:25:18,546][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [ip-10-241-0-24.ec2.internal] Failed to clear cache for realms []
[2019-06-04T13:25:18,637][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] not active
[2019-06-04T13:25:18,762][INFO ][o.e.l.LicenseService ] [ip-10-241-0-24.ec2.internal] license [4cbbefce-620e-48e7-aada-64b0f04d8f51] mode [basic] - valid
[2019-06-04T13:25:18,786][INFO ][o.e.g.GatewayService ] [ip-10-241-0-24.ec2.internal] recovered [0] indices into cluster_state
[2019-06-04T13:25:19,183][TRACE][o.e.d.e.AwsEc2SeedHostsProvider] [ip-10-241-0-24.ec2.internal] finding seed nodes...
[2019-06-04T13:25:19,184][TRACE][o.e.d.e.AwsEc2SeedHostsProvider] [ip-10-241-0-24.ec2.internal] adding i-003c7e2a6e0e097a5, address 10.241.0.105, transport_address 10.241.0.105:9300
[2019-06-04T13:25:19,184][TRACE][o.e.d.e.AwsEc2SeedHostsProvider] [ip-10-241-0-24.ec2.internal] adding i-0d7c3be1d22faa9b7, address 10.241.0.144, transport_address 10.241.0.144:9300
[2019-06-04T13:25:19,184][TRACE][o.e.d.e.AwsEc2SeedHostsProvider] [ip-10-241-0-24.ec2.internal] adding i-0c99fd54453835c02, address 10.241.0.24, transport_address 10.241.0.24:9300
[2019-06-04T13:25:19,185][DEBUG][o.e.d.e.AwsEc2SeedHostsProvider] [ip-10-241-0-24.ec2.internal] using dynamic transport addresses [10.241.0.105:9300, 10.241.0.144:9300, 10.241.0.24:9300]
[2019-06-04T13:25:19,185][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] probing resolved transport addresses [127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, [::1]:9301, [::1]:9302, [::1]:9303, [::1]:9304, 10.241.0.105:9300, 10.241.0.144:9300, 10.241.0.24:9300]
[2019-06-04T13:25:19,186][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(127.0.0.1:9301) not running
[2019-06-04T13:25:19,186][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(127.0.0.1:9302) not running
[2019-06-04T13:25:19,186][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(127.0.0.1:9303) not running
[2019-06-04T13:25:19,186][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(127.0.0.1:9304) not running
[2019-06-04T13:25:19,186][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe([::1]:9301) not running
[2019-06-04T13:25:19,186][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe([::1]:9302) not running
[2019-06-04T13:25:19,187][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe([::1]:9303) not running
[2019-06-04T13:25:19,187][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe([::1]:9304) not running
[2019-06-04T13:25:19,187][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(10.241.0.105:9300) not running
[2019-06-04T13:25:19,187][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(10.241.0.144:9300) not running
[2019-06-04T13:25:19,187][TRACE][o.e.d.PeerFinder ] [ip-10-241-0-24.ec2.internal] startProbe(10.241.0.24:9300) not running

Hi @Pat_Humphreys,

I suspect that the first time you started these nodes they did not have discovery.seed_providers: ec2 and therefore they auto-bootstrapped into separate clusters. The note at the bottom of that manual page goes into more detail about what this means and what you should do to fix it.
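(For reference, the fix that note describes amounts to roughly the following. This is only a sketch: it assumes a package-based install with the default data path /var/lib/elasticsearch and a systemd service, so adjust paths and commands to your own setup, and note that wiping the data path deletes everything stored on the node.)

# on every node: stop Elasticsearch and wipe the stale per-node cluster state
sudo systemctl stop elasticsearch
sudo rm -rf /var/lib/elasticsearch/*
# add cluster.initial_master_nodes to elasticsearch.yml on each node, then
sudo systemctl start elasticsearch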


Thanks @DavidTurner, that was what was preventing the errors from showing up.
But now I am getting a master_not_discovered_exception, which is a bit odd as I can see the comms to all 3 hosts.
Also, do I need to set discovery.seed_hosts and cluster.initial_master_nodes, given that I already have discovery.seed_providers: ec2?

Startup logs
https://pastebin.com/CYSnxebs

The logs you quoted only last a short time and look normal enough. Can you share more? I'm particularly looking for a message containing the string ClusterFormationFailureHelper.

You do not need discovery.seed_hosts since this is taken care of by discovery.seed_providers: ec2. You do, however, need cluster.initial_master_nodes to get the cluster going in the first place.
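(For the first start of a fresh cluster, each master-eligible node's elasticsearch.yml would then carry something along these lines. This is a sketch: the node names are taken from the hostnames visible in the logs above and are illustrative; they must match each node's node.name, which defaults to its hostname. Once the cluster has formed for the first time, the setting is no longer needed and can be removed.)

elasticsearch.yml
discovery.seed_providers: ec2
cluster.initial_master_nodes:
  - ip-10-241-0-24.ec2.internal
  - ip-10-241-0-105.ec2.internal
  - ip-10-241-0-144.ec2.internal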

I have fixed it, thanks a lot for your help.
So my problem was not specifying cluster.initial_master_nodes.

It's not great having to hardcode this value, as I have many clusters (with all nodes master-eligible) that I spin up dynamically on AWS from an AMI containing all the config, plus a user-data script for cluster-specific settings. So I'm going to end up writing a script that queries the AWS describe API to get the IPs for that cluster, even though the EC2 discovery plugin already has this information available to it.
Seems like a missing feature of EC2 auto-discovery.
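(For reference, a rough sketch of the kind of user-data lookup described here, using the aws CLI. It assumes the instances carry the discovery: my-cluster-discovery-tag tag from the config above, that the instance role allows ec2:DescribeInstances, and that the nodes' publish IP addresses are acceptable entries for cluster.initial_master_nodes; the exact tag name and output handling are illustrative.)

#!/bin/bash
# Collect the private IPs of all running instances carrying the cluster's
# discovery tag and join them into a comma-separated list.
IPS=$(aws ec2 describe-instances \
  --filters "Name=tag:discovery,Values=my-cluster-discovery-tag" \
            "Name=instance-state-name,Values=running" \
  --query "Reservations[].Instances[].PrivateIpAddress" \
  --output text | tr -s '[:space:]' ',' | sed 's/,$//')
# Append the bootstrap setting for the very first start of the cluster.
echo "cluster.initial_master_nodes: [${IPS}]" >> /etc/elasticsearch/elasticsearch.yml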

Unfortunately the DescribeInstances API doesn't give strong enough consistency guarantees for setting cluster.initial_master_nodes. There's a risk you will get inconsistent results on the different nodes, and this could result in forming multiple clusters without realising it. It's quite a shame, because if it were consistent enough then we'd certainly be using it.
