Cloud-aws plugin not able to join cluster

Chris_Broll · March 17, 2016, 11:25am

So I have been attempting to use the Elasticsearch "cloud-aws" plugin to join Elasticsearch nodes to my single master. I have been through a few online guides and tried settings from various sources, but I still can't get the new nodes to join the existing master.

I have configured IAM roles and tags for EC2, and this is my elasticsearch.yml file on one node (the others are similar):

node.name: Thor
node.client: "true"
network.host: localhost
cloud:aws:access_key: foobar
cloud:aws:secret_key: barfoo
cloud:aws:region: eu-west-1
discovery:type: ec2
discovery.ec2.tag.elasticsearch: Ubuntu-ElasticNode

The logging from elasticsearch is not very helpful and even in DEBUG mode not much is offered up.

[2016-03-15 23:01:05,440][INFO ][node                     ] [Thor] version[2.2.0], pid[1550], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-15 23:01:05,447][INFO ][node                     ] [Thor] initializing ...
[2016-03-15 23:01:06,685][INFO ][plugins                  ] [Thor] modules     [lang-expression, lang-groovy], plugins [cloud-aws], sites []
[2016-03-15 23:01:10,016][INFO ][node                     ] [Thor] initialized
[2016-03-15 23:01:10,017][INFO ][node                     ] [Thor] starting ...
[2016-03-15 23:01:10,106][INFO ][transport                ] [Thor] publish_address {localhost/127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2016-03-15 23:01:10,115][INFO ][discovery                ] [Thor]   elasticsearch/9PmYq5tXQcaPUPqDh4VTSQ
[2016-03-15 23:01:40,116][WARN ][discovery                ] [Thor] waited for 30s and no initial state was set by the discovery
[2016-03-15 23:01:40,155][INFO ][http                     ] [Thor] publish_address {localhost/127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2016-03-15 23:01:40,155][INFO ][node                     ] [Thor] started
[2016-03-15 23:54:40,863][DEBUG][action.admin.cluster.health] [Thor] no known master node, scheduling a retry
[2016-03-15 23:55:10,864][DEBUG][action.admin.cluster.health] [Thor] timed out while retrying    [cluster:monitor/health] after failure (timeout [30s])
[2016-03-15 23:55:10,874][INFO ][rest.suppressed          ] /_cluster/health  Params: {pretty=}
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:205)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:794)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

I have the port range 9200 - 9400 open between the elasticsearch servers but the log seems to indicate that the discovery is still timing out. I set "discovery.ec2.tag.*" to speed up the discovery process but this hasn't helped.

Does anyone have any idea how this plugin needs to be configured? I have read a few guides that use even less configuration options than I have and are still able to join nodes to the master.

dadoonet · March 17, 2016, 9:23pm

I believe you need to change network.host. Your nodes won't be able to communicate with other instances otherwise.

Read this: https://www.elastic.co/guide/en/elasticsearch/plugins/current/cloud-aws-discovery.html#cloud-aws-discovery-network-host
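
To illustrate the point: the startup log above shows the transport publishing on 127.0.0.1, which other instances cannot reach. Below is a minimal Python sketch that flags such a line. The function names are made up for this example, and the pattern only covers the IPv4 `publish_address {...}` format seen in this thread's logs.

```python
import ipaddress
import re

# Matches "publish_address {host/ip:port}" or "publish_address {ip:port}"
# as printed in the startup logs quoted in this thread (IPv4 only).
PUBLISH_RE = re.compile(r"publish_address \{(?:[^/}]*/)?([0-9.]+):(\d+)\}")


def publish_address(log_line):
    """Return (ip, port) from a publish_address log line, or None if absent."""
    match = PUBLISH_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_loopback_publish(log_line):
    """True when the node publishes a loopback address that peers cannot reach."""
    addr = publish_address(log_line)
    if addr is None:
        return False
    return ipaddress.ip_address(addr[0]).is_loopback


if __name__ == "__main__":
    line = "[Thor] publish_address {localhost/127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}"
    if is_loopback_publish(line):
        print("transport is published on loopback; check network.host")
```

A line that publishes a private EC2 address, such as `publish_address {172.31.23.200:9300}`, would not be flagged.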

Chris_Broll · March 18, 2016, 1:03am

Thanks, so I updated the elasticsearch.yml on the node to:

cluster.name: elasticsearch
node.name: Supergirl
node.client: "true"
network.host: _ec2_
cloud.aws.access_key: key
cloud.aws.secret_key: secret
cloud.aws.region: eu-west-1
discovery.type: ec2
discovery.ec2.tag.elasticsearch: Ubuntu-ElasticNode
discovery.zen.minimum_master_nodes: 2

Then I updated the elasticsearch.yml on the master to:

plugin.mandatory: cloud-aws
cluster.name: elasticsearch
network.host: _ec2_
cloud.aws.access_key: key
cloud.aws.secret_key: secret
cloud.aws.region: eu-west-1
discovery.type: ec2
discovery.ec2.tag.elasticsearch: Ubuntu-ElasticNode

The node joined the master, but then I noticed that nginx, Kibana and Logstash were broken on the master. They had references to localhost, which I updated to the master's EC2 IP in the nginx default file, the kibana.yml and the Logstash output config file. Now I am getting this error when trying to join the Elasticsearch node to the master:

[2016-03-18 00:48:54,378][INFO ][discovery.ec2            ] [Supergirl] failed to send join request to master [{In-Betweener}.........
........
java.lang.IllegalStateException: Message not fully read (request) for requestId [4420], action [internal:discovery/zen/join/validate], readerIndex [44918] vs expected [54082]; resetting

dadoonet · March 18, 2016, 4:00am

You have a Node or Java Client which is probably not using the same Elasticsearch version.

What are your LS version and config file?

Chris_Broll · March 18, 2016, 9:18am

I suspected that I had different versions of Elasticsearch, because I remember holding the master at 2.2.0 but the new node is running 2.2.1.

My Logstash version is:

 logstash:
  Installed: 1:2.2.2-1

The config file looks like this:

/etc/logstash/conf.d/30-elasticsearch-output.conf

output {
  elasticsearch {
    hosts => ["172.31.31.4:9200"]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}

dadoonet · March 18, 2016, 9:58am

So LS is fine here. You are using the HTTP connector which is perfect.

Mixing 2.2.0 with 2.2.1 should not be an issue, and neither should mixing any other 2.x versions.

Unsure where this is coming from...

Chris_Broll · March 18, 2016, 10:28am

Interesting, not sure if this will help anyone else but this is the stack trace on the node:

[2016-03-18 10:20:36,078][INFO ][discovery.ec2            ] [Supergirl] failed to send join request to master      [{George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300}], reason     [RemoteTransportException[[George Washington Bridge][172.31.31.4:9300][internal:discovery/zen/join]];     nested: IllegalStateException[failure when sending a validation request to node]; nested:   RemoteTransportException[[Supergirl][172.31.28.115:9300][internal:discovery/zen/join/validate]]; nested:   IllegalArgumentException[No custom metadata prototype registered for type [licenses], node like missing  plugins]; ]
[2016-03-18 10:20:39,099][WARN ][transport.netty          ] [Supergirl] exception caught on transport layer [[id:  0x3e11e0a0, /172.31.31.4:45449 => /172.31.28.115:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [5953], action    [ internal:discovery/zen/join/validate], readerIndex [45509] vs expected [54685]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

This is the output showing in the elasticsearch log on the master:

[2016-03-18 10:22:55,932][WARN ][discovery.ec2            ] [George Washington Bridge] failed to validate incoming join request from node [{Supergirl}{hrse7OHMTK-aAJGl3CX8Ew}{172.31.28.115}{172.31.28.115:9300}{client=true, data=false}]

I might try spinning up a new node in AWS later to see if a clean install helps.

dadoonet · March 18, 2016, 10:41am

Which plugins did you install? Are you missing one?

Chris_Broll · March 18, 2016, 10:42am

On the master I have cloud-aws and marvel, and on the node just cloud-aws.

dadoonet · March 18, 2016, 11:13am

Why don't you have Marvel on the data nodes?

I think that at the very least you need to install the license plugin on each node.
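
One rough way to spot this kind of mismatch is to compare the `plugins [...]` line that each node prints at startup. The Python sketch below does that. The helper names and the `required` set are assumptions for illustration, and the pattern only targets the 2.x log format quoted in this thread.

```python
import re

# Captures the node name and the plugin list from a startup line such as:
# "[Thor] modules [lang-expression, lang-groovy], plugins [cloud-aws], sites []"
PLUGINS_RE = re.compile(
    r"\[([^\]]+)\]\s+modules\s+\[[^\]]*\],\s+plugins\s+\[([^\]]*)\]"
)


def parse_plugins(log_line):
    """Return (node_name, set_of_plugins) from a startup log line, or None."""
    match = PLUGINS_RE.search(log_line)
    if match is None:
        return None
    plugins = {p.strip() for p in match.group(2).split(",") if p.strip()}
    return match.group(1), plugins


def missing_plugins(log_lines, required):
    """Map node name -> sorted list of required plugins it does not report."""
    report = {}
    for line in log_lines:
        parsed = parse_plugins(line)
        if parsed is None:
            continue
        name, installed = parsed
        missing = set(required) - installed
        if missing:
            report[name] = sorted(missing)
    return report


if __name__ == "__main__":
    lines = [
        "[plugins] [Thor] modules [lang-expression, lang-groovy], plugins [cloud-aws], sites []",
    ]
    # The required set is an assumption: whatever every node in your cluster needs.
    print(missing_plugins(lines, {"cloud-aws", "license"}))
```

Feeding it one startup line per node gives a quick view of which nodes lack a plugin that the rest of the cluster expects.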

Chris_Broll · March 18, 2016, 11:30am

I just haven't got to the marvel plugin yet. I wanted to get cloud-aws working first.

I don't remember installing a license plugin on the master, but I can see it there. What is the license plugin's purpose and where do I get it? I seem to get directed to a Shield plugin when I google it.

Chris_Broll · March 18, 2016, 11:36am

Okay, I found the details on the license plugin in the Marvel docs and installed both, but I'm not sure if they will need any further configuration.

Chris_Broll · March 18, 2016, 11:55am

So I built a new node and this time installed Java 7 (instead of downgrading from 8 to 7 as per the last node), installed marvel-agent, license and cloud-aws and this was the result:

[2016-03-18 11:41:45,410][INFO ][node                     ] [Supergirl Returns] version[2.2.0], pid[1265], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-18 11:41:45,411][INFO ][node                     ] [Supergirl Returns] initializing ...
[2016-03-18 11:41:46,738][INFO ][plugins                  ] [Supergirl Returns] modules [lang-groovy, lang-expression], plugins [marvel-agent, cloud-aws, license], sites []
[2016-03-18 11:41:49,807][INFO ][node                     ] [Supergirl Returns] initialized
[2016-03-18 11:41:49,807][INFO ][node                     ] [Supergirl Returns] starting ...
[2016-03-18 11:41:49,910][INFO ][transport                ] [Supergirl Returns] publish_address {172.31.23.200:9300}, bound_addresses {172.31.23.200:9300}
[2016-03-18 11:41:49,919][INFO ][discovery                ] [Supergirl Returns] elasticsearch/XQz2-7mlQYK64RWp5fGrcA
[2016-03-18 11:41:53,957][INFO ][cluster.service          ] [Supergirl Returns] detected_master {George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300}, added {{George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300},}, reason: zen-disco-receive(from master  [{George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300}])
[2016-03-18 11:41:53,999][INFO ][license.plugin.core      ] [Supergirl Returns] license [d07edc1f-a44b-4201-b5b6-5d377d397c4c] - valid
[2016-03-18 11:41:54,001][ERROR][license.plugin.core      ] [Supergirl Returns]
#
# License will expire on [Monday, April 11, 2016]. If you have a new license, please update it.
# Otherwise, please reach out to your support contact.
#
# Commercial plugins operate with reduced functionality on license expiration:
# - marvel
#  - The agent will stop collecting cluster and indices metrics
[2016-03-18 11:41:54,008][INFO ][http                     ] [Supergirl Returns] publish_address {172.31.23.200:9200}, bound_addresses {172.31.23.200:9200}
[2016-03-18 11:41:54,009][INFO ][node                     ] [Supergirl Returns] started

So 'Supergirl Returns' has joined the cluster, and I can see this from both Marvel and the curl cluster health output. I don't see any replication yet, but I need to go out, so I will check the health when I get back.
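
For checking replication later, here is a small Python sketch that reads a parsed `_cluster/health` response. The field names are the ones that API returns; the helper names and the sample values are invented for illustration. One thing worth keeping in mind: a node started with `node.client: "true"`, as in the earlier config, holds no shards, so it adds to `number_of_nodes` but not to `number_of_data_nodes`.

```python
def summarize_health(health):
    """One-line summary of a parsed _cluster/health response."""
    return "status={} nodes={} data_nodes={} unassigned_shards={}".format(
        health["status"],
        health["number_of_nodes"],
        health["number_of_data_nodes"],
        health["unassigned_shards"],
    )


def can_hold_replicas(health):
    """A replica is never placed on the same node as its primary,
    so at least two data nodes are needed before replicas can be assigned."""
    return health["number_of_data_nodes"] >= 2


def replicas_pending(health):
    """True when the cluster is not green or still reports unassigned shards."""
    return health["status"] != "green" or health["unassigned_shards"] > 0


if __name__ == "__main__":
    # Invented sample values, not taken from the cluster in this thread.
    sample = {
        "status": "yellow",
        "number_of_nodes": 2,
        "number_of_data_nodes": 1,
        "unassigned_shards": 5,
    }
    print(summarize_health(sample))
    if replicas_pending(sample) and not can_hold_replicas(sample):
        print("only one data node: replicas cannot be assigned yet")
```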

Thank you for your invaluable input and assistance @dadoonet.
