ES 7.6 on AWS ECS

Hello,

Trust all is well.

I'm trying to set up a cluster on AWS ECS via CloudFormation. I've used ELK in the past and am used to the older-style dynamic node discovery for all nodes (data & masters), but in setting things up on v7 I believe I'm doing something not quite right, and I'd really appreciate it if someone could shed some light on things. I've been following this link as a rough guide: Discovery with ECS, Dynamic Initial Master Nodes

My setup consists of Docker containers (v7.6.1), and I'm using the EC2 discovery plugin. My config file is shown below:

cluster:
  name: "${ES_CLUSTER_NAME:elasticsearch}"
  routing:
    allocation:
      awareness.attributes: aws_availability_zone
node:
  master: "${ES_MASTER}"
  data: "${ES_DATA}"
path:
  data: /var/data/elasticsearch
network:
  host: 0.0.0.0
  bind_host: 0.0.0.0
  publish_host: _ec2:privateIp_
transport:
  publish_host: _ec2:privateIp_
discovery:
  seed_providers: ec2
  zen:
    minimum_master_nodes: 1
    ping_timeout: 10s
  ec2:
    tag:
      elasticsearch: "${DISCOVERY_TAG}"
    host_type: private_ip
cloud.node.auto_attributes: true
xpack.security.enabled: false
xpack.graph.enabled: false
xpack.ml.enabled: false
xpack.watcher.enabled: false

When I start node1 up I provide:

  node.name: "dev-master"
  cluster.initial_master_nodes: "dev-master"
  ES_MASTER: "true"
  ES_DATA: "true"
  DISCOVERY_TAG: "dev"
  ...

and this starts the cluster up fine; it's after this that things don't quite work as intended...

I start the second node with these env vars:

  cluster.initial_master_nodes: "dev-master"
  ES_MASTER: "true"
  ES_DATA: "true"
  DISCOVERY_TAG: "dev"
  ...

I'm not specifying node.name here because I believe the cluster has already formed and I'm happy for it to pick an ID. The problem is that this second node appears to see the first one but still throws

"message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and this node must discover master-eligible nodes [dev-master] to bootstrap a cluster: have discovered....

and so it never joins the cluster. Could someone please tell me how to go about this? Ideally what I'm after is for one node to elect itself to create the cluster and for the remaining n nodes to simply join this existing cluster - this is to avoid the overhead of maintaining many images and orchestration steps, where the chances of error grow.

I'm also not sure how I would add a new master node to an existing cluster, say in the event one of the existing ones fails. And do I need to specify cluster.initial_master_nodes for every node from now on (data & master)?

Any input on this is very appreciated!

Cheers
Josh

Hi @jf11.

I think the link below may match your situation -

###remote##

Thanks
HadoopHelp

Hi @rameshkr1994

Thank you for looking into this, but I'm not sure I follow. I'm running ES via the official Docker containers, and I get the first one working: it says the cluster started etc. and I get the cluster ID. The problem for me is that I can't get other nodes to join the cluster at all, and I'm not sure what I'm doing wrong. It's a brand-new cluster I'm attempting to spin up, no old data etc.

As a side note, I'm fairly certain it's not infrastructure related (security groups etc.) - the same setup works beautifully on v6.8 without issues. Nodes join the cluster and it works like I've been using it for years. v7 is very different.

How does one join nodes to a cluster on 7?

Have a great day

Cheers

Quoting these docs regarding cluster.initial_master_nodes:

You should not use this setting when restarting a cluster or adding a new node to an existing cluster.

You shouldn't use this setting after the initial cluster has started up completely and reported itself to be healthy. In your case you're starting with a one-node cluster.

There are also docs on adding a master-eligible node to an existing cluster. It's basically the same in 7.x as in 6.x except that you don't need to muck around with discovery.zen.minimum_master_nodes any more.
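To illustrate, the discovery-related settings on a node joining an existing 7.x cluster can be as minimal as the sketch below. This is an assumption-laden example based on the EC2 setup shown earlier in this thread (cluster name, tag value, etc. are taken from your config); the key point is that there is no cluster.initial_master_nodes and no discovery.zen.* setting at all:

```yaml
# Sketch of a node joining an existing 7.x cluster via EC2 discovery.
# No cluster.initial_master_nodes here - the node finds the elected
# master through the seed hosts provider and simply asks to join.
cluster.name: "logs-dev"
node.master: true
node.data: true
discovery.seed_providers: ec2
discovery.ec2.tag.elasticsearch: "dev"
discovery.ec2.host_type: private_ip
network.publish_host: _ec2:privateIp_
```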

I suspect you have a discovery config problem, but you've elided most of the warning message that contains the details we'd need to help further. Can you share the whole message?

Hi @DavidTurner

Thank you for that. Yup, I think you've summed it up perfectly :slight_smile: I've got the single-node master that I want to add more masters to.

So the logs are as follows. On the master node:


logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "deprecation", "timestamp": "2020-03-13T21:56:34,973Z", "level": "WARN", "component": "o.e.d.c.s.Settings", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "[discovery.zen.minimum_master_nodes] setting was deprecated in Elasticsearch and will be removed in a future release! See the breaking changes documentation for the next major version." }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:36,975Z", "level": "INFO", "component": "o.e.d.DiscoveryModule", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "using discovery type [zen] and seed hosts providers [settings, ec2]" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:37,933Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "initialized" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:37,934Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "starting ..." }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:38,288Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "publish_address {10.0.4.88:9300}, bound_addresses {0.0.0.0:9300}" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:38,869Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:38,891Z", "level": "INFO", "component": "o.e.c.c.Coordinator", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "setting initial configuration to VotingConfiguration{qa4xOUZ9Smick6RaXUUa4A}" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:39,286Z", "level": "INFO", "component": "o.e.c.s.MasterService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "elected-as-master ([1] nodes joined)[{logs-dev-master}{qa4xOUZ9Smick6RaXUUa4A}{x-7VmxM5RBO8XTI33ob1MA}{10.0.4.88}{10.0.4.88:9300}{dim}{aws_availability_zone=ap-southeast-2a, xpack.installed=true} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 1, version: 1, delta: master node changed {previous [], current [{logs-dev-master}{qa4xOUZ9Smick6RaXUUa4A}{x-7VmxM5RBO8XTI33ob1MA}{10.0.4.88}{10.0.4.88:9300}{dim}{aws_availability_zone=ap-southeast-2a, xpack.installed=true}]}" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:39,361Z", "level": "INFO", "component": "o.e.c.c.CoordinationState", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "cluster UUID set to [h6ZYpdfUSvKE-CkH_S7PKw]" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:39,419Z", "level": "INFO", "component": "o.e.c.s.ClusterApplierService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "master node changed {previous [], current [{logs-dev-master}{qa4xOUZ9Smick6RaXUUa4A}{x-7VmxM5RBO8XTI33ob1MA}{10.0.4.88}{10.0.4.88:9300}{dim}{aws_availability_zone=ap-southeast-2a, xpack.installed=true}]}, term: 1, version: 1, reason: Publication{term=1, version=1}" }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:39,504Z", "level": "INFO", "component": "o.e.h.AbstractHttpServerTransport", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "publish_address {10.0.4.88:9200}, bound_addresses {0.0.0.0:9200}", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:39,506Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "started", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:39,667Z", "level": "INFO", "component": "o.e.g.GatewayService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "recovered [0] indices into cluster_state", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,187Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [ilm-history] for index patterns [ilm-history-1*]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,303Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [.slm-history] for index patterns [.slm-history-1*]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,374Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [.monitoring-logstash] for index patterns [.monitoring-logstash-7-*]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,481Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [.monitoring-es] for index patterns [.monitoring-es-7-*]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,576Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [.monitoring-beats] for index patterns [.monitoring-beats-7-*]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,647Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [.monitoring-alerts-7] for index patterns [.monitoring-alerts-7]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,717Z", "level": "INFO", "component": "o.e.c.m.MetaDataIndexTemplateService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding template [.monitoring-kibana] for index patterns [.monitoring-kibana-7-*]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,775Z", "level": "INFO", "component": "o.e.x.i.a.TransportPutLifecycleAction", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding index lifecycle policy [ilm-history-ilm-policy]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:40,832Z", "level": "INFO", "component": "o.e.x.i.a.TransportPutLifecycleAction", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "adding index lifecycle policy [slm-history-ilm-policy]", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }
logs-dev-log-group elk-es/elk-es/f7390295-052d-4a2b-8de1-32985b8758ff {"type": "server", "timestamp": "2020-03-13T21:56:41,712Z", "level": "INFO", "component": "o.e.l.LicenseService", "cluster.name": "logs-dev", "node.name": "logs-dev-master", "message": "license [aa19b72c-15aa-4020-8996-272407032ee7] mode [basic] - valid", "cluster.uuid": "h6ZYpdfUSvKE-CkH_S7PKw", "node.id": "qa4xOUZ9Smick6RaXUUa4A"  }

And I fired up the second node, leaving out the cluster.initial_master_nodes parameter this time:

logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:34:42,031Z", "level": "INFO", "component": "o.e.d.DiscoveryModule", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "using discovery type [zen] and seed hosts providers [settings, ec2]" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:34:43,349Z", "level": "INFO", "component": "o.e.t.TransportService", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "publish_address {10.0.5.39:9300}, bound_addresses {0.0.0.0:9300}" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:34:43,962Z", "level": "INFO", "component": "o.e.b.BootstrapChecks", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "bound or publishing to a non-loopback address, enforcing bootstrap checks" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:34:53,995Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:04,003Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:14,021Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:14,018Z", "level": "WARN", "component": "o.e.n.Node", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "timed out while waiting for initial discovery state - timeout: 30s" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:14,047Z", "level": "INFO", "component": "o.e.h.AbstractHttpServerTransport", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "publish_address {10.0.5.39:9200}, bound_addresses {0.0.0.0:9200}" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:14,048Z", "level": "INFO", "component": "o.e.n.Node", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "started" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:24,025Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:34,027Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:44,030Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:35:54,032Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:36:04,034Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:36:14,037Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }
logs-dev-log-group elk-es/elk-es/182d0cd4-fb70-4566-85f0-5892416a9a6a {"type": "server", "timestamp": "2020-03-13T22:36:24,040Z", "level": "WARN", "component": "o.e.c.c.ClusterFormationFailureHelper", "cluster.name": "logs-dev", "node.name": "logs-dev-b", "message": "master not discovered yet, this node has not previously joined a bootstrapped (v7+) cluster, and [cluster.initial_master_nodes] is empty on this node: have discovered [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}]; discovery will continue using [127.0.0.1:9300, 127.0.0.1:9301, 127.0.0.1:9302, 127.0.0.1:9303, 127.0.0.1:9304, 127.0.0.1:9305, 10.0.4.88:9300, 10.0.5.39:9300, 10.0.6.231:9300] from hosts providers and [{logs-dev-b}{xH2tXROEQhe0uKYiQFSD-w}{gF0zxCEjSIC4dl1OaXR7KQ}{10.0.5.39}{10.0.5.39:9300}{dim}{aws_availability_zone=ap-southeast-2b, xpack.installed=true}] from last-known cluster state; node term 0, last-accepted version 0 in term 0" }

I'm not sure why it still thinks there is no cluster, or what is preventing the join. I really do appreciate you looking at this :slight_smile:

Have a great day

Cheers
Josh

Thanks, that's helpful. This looks like a connectivity issue. Are you sure that the nodes can talk to each other on port 9300? If you issue curl -vv http://10.0.4.88:9300/ from your second node, does the output end with the message "This is not an HTTP port"? If not, what do you see? Similarly, if you issue curl -vv http://10.0.5.39:9300/ from the working master, do you see "This is not an HTTP port"?

(use https instead of http if you've set up TLS for node-to-node communications)

Thank you so much David, that seems to have fixed it! It turns out the security group had 9200-9400 open but the iptables rules on the hosts only had 9200 open.

With regards to my understanding of how this works, does this sound right?

  • For the single (initial) master, I specify cluster.initial_master_nodes & node.name.

  • I wait for the node to come up and say 'cluster created ... id= something'

  • From then on, any new nodes (data / master) join the cluster as normal and I don't need to specify cluster.initial_master_nodes, provided of course 9200 & 9300 are open

  • If the initial master were to fail for some reason and go offline, the cluster would elect a new one from the available pool (given I have enough of them to form a quorum)?

  • Cycling new nodes in and out (for security patches etc.) remains as before, where I wait for the cluster to rebalance itself and then gracefully remove a node?
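For that last point, my current approach (sketch only; the IP is just the example node from the logs above, substitute the node you want to drain) is the standard allocation-filtering request, then removing the node once its shards have moved off:

```
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.5.39"
  }
}
```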

Much appreciate you taking the time to answer my questions and for clearing up the confusion

Cheers
Josh

Probably simplest to wait for green health.

Otherwise, yes, that's right.
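If it helps, the wait can be done in a single blocking call rather than polling; the timeout value here is arbitrary:

```
GET _cluster/health?wait_for_status=green&timeout=60s
```

The response includes a timed_out flag, so orchestration can retry or fail if the cluster doesn't reach green in time.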