Elasticsearch 5.1.1 master nodes can't discover each other in AWS

Hello all, I'm having a bit of trouble getting my master ES nodes to discover each other and form a cluster. This is ES 5.1.1 in AWS on Amazon AMIs (CentOS). I've validated that they are all in the same security group, that they can telnet to each other on port 9300, and that they have the discovery-ec2 plugin installed.
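
For anyone repeating the port 9300 check from several nodes, here is a minimal sketch of the same test as the telnet one, assuming plain TCP reachability is all that is being checked. The function name `can_connect` and the default timeout are my own choices, not anything from Elasticsearch. A successful connect only shows the port is open, not that the nodes will actually join.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect(peer_ip, 9300)` from each master node against the private IPs of the other two gives the same answer as the manual telnet test.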

[2017-01-19T09:37:26,416][INFO ][o.e.d.z.ZenDiscovery     ] [ip-172-xx-xx-61] failed to send join request to master [{ip-172-xx-xx-137}{S6sO53VWSYmBZd35vi6o-Q}{15UkYeVaTHqY_JQb48oxgQ}{172.xx.xx.137}{172.xx.xx.137:9300}], reason [RemoteTransportException[[ip-172-xx-xx-137][172.xx.xx.137:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{ip-172-xx-xx-137}{S6sO53VWSYmBZd35vi6o-Q}{15UkYeVaTHqY_JQb48oxgQ}{172.xx.xx.137}{172.xx.xx.137:9300}] not master for join request]; ], tried [3] times

My elasticsearch.yml is as follows:

[root@ip-172-xx-xx-61 elasticsearch-5.1.1]# cat /etc/elasticsearch/elasticsearch.yml 
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# THIS FILE IS MANAGED BY CHEF, DO NOT EDIT MANUALLY, YOUR CHANGES WILL BE OVERWRITTEN!
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
---
cluster.name: elasticsearch
node.name: ip-172-xx-xx-61
path.conf: "/etc/elasticsearch"
path.data: "/var/lib/elasticsearch"
path.logs: "/var/log/elasticsearch"
network.host: _ec2_
node.master: 'true'
node.data: 'false'
discovery.type: ec2
discovery.ec2.node_cache_time: 120s
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping_timeout: 30s
discovery:
  zen.hosts_provider: ec2
cloud.aws.region: us-west-2

Hi,
Try adding discovery.ec2.tag.Name: ec2-instance-tag-name to every node in the cluster.
All instances in the cluster should carry the same 'ec2-instance-tag-name' value for that tag.
It worked for me.
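
To illustrate why the tag has to match exactly on every instance: as far as I understand it, the plugin only considers instances whose EC2 tags match each discovery.ec2.tag.* setting, and EC2 tag keys are case-sensitive, so `Name` and `name` are different tags. The sketch below is a plain-Python model of that filtering, not the plugin's code. The instance names, the tag values, and the function name `matches_tags` are made up for illustration.

```python
def matches_tags(instance_tags, required_tags):
    """Return True if every required key/value pair is present (case-sensitive)."""
    return all(instance_tags.get(key) == value for key, value in required_tags.items())


# Hypothetical instances; only the tag layout matters here.
instances = {
    "node-a": {"Name": "dev-elasticsearchApp"},
    "node-b": {"name": "dev-elasticsearchApp"},  # lowercase key: a different tag
    "node-c": {"Name": "something-else"},        # right key, wrong value
}

# Models a setting of the form discovery.ec2.tag.Name: dev-elasticsearchApp
required = {"Name": "dev-elasticsearchApp"}

candidates = [name for name, tags in instances.items() if matches_tags(tags, required)]
print(candidates)
```

Only the instance with the exact key and value ends up in the candidate list, so a tag that differs in case or value on one node would leave that node out of discovery.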

@ash007 I appreciate the reply, but it doesn't appear to have changed anything.

My current elasticsearch.yml:

[root@ip-172-xx-xx-137 elasticsearch-5.1.1]# cat /etc/elasticsearch/elasticsearch.yml 
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# THIS FILE IS MANAGED BY CHEF, DO NOT EDIT MANUALLY, YOUR CHANGES WILL BE OVERWRITTEN!
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
---
cluster.name: elasticsearch
node.name: ip-172-xx-xx-137
path.conf: "/etc/elasticsearch"
path.data: "/var/lib/elasticsearch"
path.logs: "/var/log/elasticsearch"
network.host: _ec2_
node.master: 'true'
node.data: 'false'
discovery.type: ec2
discovery.ec2.node_cache_time: 120s
discovery.ec2.groups: sg-5xxxxxxc
discovery.ec2.tag.Name: dev-elasticsearchApp
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping_timeout: 30s
discovery.zen.ping.unicast.hosts:
- 172.xx.xx.9
- 172.xx.xx.137
- 172.xx.xx.61
discovery:
  zen.hosts_provider: ec2
cloud.aws.region: us-west-2

and the current error:

[2017-01-19T13:23:32,101][INFO ][o.e.d.z.ZenDiscovery     ] [ip-172-xx-xx-9] failed to send join request to master [{ip-172-xx-xx-61}{S6sO53VWSYmBZd35vi6o-Q}{C1TtmQFIRxyIG6NK6_5jdw}{172.xx.xx.61}{172.xx.xx.61:9300}], reason [RemoteTransportException[[ip-172-xx-xx-61][172.xx.xx.61:9300][internal:discovery/zen/join]]; nested: NotMasterException[Node [{ip-172-xx-xx-61}{S6sO53VWSYmBZd35vi6o-Q}{C1TtmQFIRxyIG6NK6_5jdw}{172.xx.xx.61}{172.xx.xx.61:9300}] not master for join request]; ], tried [3] times

Make sure all the node instances are in the same subnet and security groups.
Can you try removing 'discovery.zen.ping.unicast.hosts' and 'discovery.ec2.groups'?
Then add:
cloud.aws.protocol: http
cloud.aws.proxy.host: if you are behind any proxy
cloud.aws.proxy.port: proxy port
cloud.aws.region: region
discovery.ec2.availability_zones: us-west-1a,us-west-1b,us-west-1d,us-west-1e
network.host: eth0:ipv4, local
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
plugin.mandatory: discovery-ec2, repository-s3
http.port: 9200
transport.tcp.port: 9300

Let me know if it works.

Thanks @ash007 - it definitely did not like:

network.host: eth0:ipv4, local

I kept getting errors, so I reverted to:

network.host: _ec2_

Then it would at least start, though it never seems to discover any other nodes:

[root@ip-172-xx-xx-9 elasticsearch-5.1.1]# tail -f /var/log/elasticsearch/elasticsearch.log 
[2017-01-19T15:34:55,126][INFO ][o.e.p.PluginsService     ] [ip-172-xx-xx-9] loaded module [percolator]
[2017-01-19T15:34:55,126][INFO ][o.e.p.PluginsService     ] [ip-172-xx-xx-9] loaded module [reindex]
[2017-01-19T15:34:55,127][INFO ][o.e.p.PluginsService     ] [ip-172-xx-xx-9] loaded module [transport-netty3]
[2017-01-19T15:34:55,127][INFO ][o.e.p.PluginsService     ] [ip-172-xx-xx-9] loaded module [transport-netty4]
[2017-01-19T15:34:55,127][INFO ][o.e.p.PluginsService     ] [ip-172-xx-xx-9] loaded plugin [discovery-ec2]
[2017-01-19T15:34:55,127][INFO ][o.e.p.PluginsService     ] [ip-172-xx-xx-9] loaded plugin [repository-s3]
[2017-01-19T15:34:57,449][INFO ][o.e.n.Node               ] [ip-172-xx-xx-9] initialized
[2017-01-19T15:34:57,449][INFO ][o.e.n.Node               ] [ip-172-xx-xx-9] starting ...
[2017-01-19T15:34:57,567][INFO ][o.e.t.TransportService   ] [ip-172-xx-xx-9] publish_address {172.xx.xx.9:9300}, bound_addresses {172.xx.xx.9:9300}
[2017-01-19T15:34:57,572][INFO ][o.e.b.BootstrapCheck     ] [ip-172-xx-xx-9] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-01-19T15:35:27,588][WARN ][o.e.n.Node               ] [ip-172-xx-xx-9] timed out while waiting for initial discovery state - timeout: 30s
[2017-01-19T15:35:27,610][INFO ][o.e.h.HttpServer         ] [ip-172-xx-xx-9] publish_address {172.xx.xx.9:9200}, bound_addresses {172.xx.xx.9:9200}
[2017-01-19T15:35:27,610][INFO ][o.e.n.Node               ] [ip-172-xx-xx-9] started

Current elasticsearch.yml:

[root@ip-172-xx-xx-9 elasticsearch-5.1.1]# cat /etc/elasticsearch/elasticsearch.yml 
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# THIS FILE IS MANAGED BY CHEF, DO NOT EDIT MANUALLY, YOUR CHANGES WILL BE OVERWRITTEN!
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html> 
#
---
cluster.name: elasticsearch
node.name: ip-172-xx-xx-9
path.conf: "/etc/elasticsearch"
path.data: "/var/lib/elasticsearch"
path.logs: "/var/log/elasticsearch"
cloud.aws.protocol: http
cloud.aws.region: us-west-2
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
discovery.ec2.availability_zones: us-west-2a,us-west-2b
discovery.ec2.node_cache_time: 120s
discovery.ec2.tag.name: dev-elasticsearchApp
discovery.type: ec2
discovery.zen.hosts_provider: ec2
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping_timeout: 30s
plugin.mandatory:
- discovery-ec2
- repository-s3
http.port: 9200
transport.tcp.port: 9300
network.host: _ec2_
node.master: true
node.data: false

I did validate that all three nodes are in the same Security Group, same subnet and VPC.

Any other ideas?

Can you add _ at the beginning and end of 'eth0:ipv4' and 'local', i.e. _eth0:ipv4_ and _local_? I don't know why, but the editor is dropping the underscores in the display.

So as it turns out, there were a few things wrong: I needed to get my tags right, and each of the nodes was coming up with the same node ID. I blew away the "nodes" directory in the data directory and restarted, and they started coming up.
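
The duplicate node ID is visible in the log lines posted earlier in this thread: ip-172-xx-xx-137 and ip-172-xx-xx-61 both report the ID S6sO53VWSYmBZd35vi6o-Q. Below is a rough sketch for spotting that automatically. It assumes the node descriptor in the logs has the form {name}{node_id}{ephemeral_id}{host}{address}, which is what the lines above look like; the function name `duplicate_node_ids` is my own.

```python
import re
from collections import defaultdict

# Assumed descriptor layout: {name}{node_id}{ephemeral_id}{host}{address}
NODE_RE = re.compile(r"\{([^{}]+)\}\{([^{}]+)\}\{([^{}]+)\}\{([^{}]+)\}\{([^{}]+)\}")


def duplicate_node_ids(log_text):
    """Map each node ID seen under more than one node name to those names."""
    names_by_id = defaultdict(set)
    for name, node_id, _ephemeral, _host, _addr in NODE_RE.findall(log_text):
        names_by_id[node_id].add(name)
    return {nid: sorted(names) for nid, names in names_by_id.items() if len(names) > 1}


# Two descriptors copied from the log lines in this thread.
sample = """
failed to send join request to master [{ip-172-xx-xx-137}{S6sO53VWSYmBZd35vi6o-Q}{15UkYeVaTHqY_JQb48oxgQ}{172.xx.xx.137}{172.xx.xx.137:9300}]
failed to send join request to master [{ip-172-xx-xx-61}{S6sO53VWSYmBZd35vi6o-Q}{C1TtmQFIRxyIG6NK6_5jdw}{172.xx.xx.61}{172.xx.xx.61:9300}]
"""
print(duplicate_node_ids(sample))
# → {'S6sO53VWSYmBZd35vi6o-Q': ['ip-172-xx-xx-137', 'ip-172-xx-xx-61']}
```

An empty result means every node ID in the pasted log text belongs to a single node name. My understanding is that the node ID is persisted under the "nodes" directory in path.data, which would explain why removing that directory let each node generate a fresh ID.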

Here is my working config:

[root@ip-172-xx-xx-61 elasticsearch-5.1.1]# cat /etc/elasticsearch/elasticsearch.yml 
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# THIS FILE IS MANAGED BY CHEF, DO NOT EDIT MANUALLY, YOUR CHANGES WILL BE OVERWRITTEN!
#
# Please see the documentation for further information on configuration options:
# <http://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration.html>
#
---
cluster.name: elasticsearch
node.name: ip-172-xx-xx-61
path.conf: "/etc/elasticsearch"
path.data: "/var/lib/elasticsearch"
path.logs: "/var/log/elasticsearch"
cloud.aws.protocol: https
cloud.aws.region: us-west-2
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
discovery.ec2.availability_zones: us-west-2a,us-west-2b
discovery.ec2.node_cache_time: 120s
discovery.ec2.tag.es_cluster: dev-elasticsearch
discovery.type: ec2
discovery.zen.hosts_provider: ec2
discovery.zen.join_timeout: 90s
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping_timeout: 30s
http.port: 9200
transport.tcp.port: 9300
network.host:
- _eth0:ipv4_
- _local_
network.bind_host: _eth0:ipv4_
network.publish_host: _eth0:ipv4_
node.master: true
node.data: true
plugin.mandatory:
- discovery-ec2
- repository-s3

I really appreciate your help! :slight_smile:
