So I have been attempting to use the Elasticsearch "cloud-aws" plugin to join Elasticsearch nodes to my single master. I have been through a few online guides and tried settings from various sources, but I still can't get the new nodes to join the existing master.
I have configured IAM roles and tags for EC2 and this is my elasticsearch.yml file on one node (the others are similar):
The logging from Elasticsearch is not very helpful, and even in DEBUG mode not much is offered up.
[2016-03-15 23:01:05,440][INFO ][node ] [Thor] version[2.2.0], pid[1550], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-15 23:01:05,447][INFO ][node ] [Thor] initializing ...
[2016-03-15 23:01:06,685][INFO ][plugins ] [Thor] modules [lang-expression, lang-groovy], plugins [cloud-aws], sites []
[2016-03-15 23:01:10,016][INFO ][node ] [Thor] initialized
[2016-03-15 23:01:10,017][INFO ][node ] [Thor] starting ...
[2016-03-15 23:01:10,106][INFO ][transport ] [Thor] publish_address {localhost/127.0.0.1:9300}, bound_addresses {127.0.0.1:9300}
[2016-03-15 23:01:10,115][INFO ][discovery ] [Thor] elasticsearch/9PmYq5tXQcaPUPqDh4VTSQ
[2016-03-15 23:01:40,116][WARN ][discovery ] [Thor] waited for 30s and no initial state was set by the discovery
[2016-03-15 23:01:40,155][INFO ][http ] [Thor] publish_address {localhost/127.0.0.1:9200}, bound_addresses {127.0.0.1:9200}
[2016-03-15 23:01:40,155][INFO ][node ] [Thor] started
[2016-03-15 23:54:40,863][DEBUG][action.admin.cluster.health] [Thor] no known master node, scheduling a retry
[2016-03-15 23:55:10,864][DEBUG][action.admin.cluster.health] [Thor] timed out while retrying [cluster:monitor/health] after failure (timeout [30s])
[2016-03-15 23:55:10,874][INFO ][rest.suppressed ] /_cluster/health Params: {pretty=}
MasterNotDiscoveredException[null]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$AsyncSingleAction$5.onTimeout(TransportMasterNodeAction.java:205)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:794)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I have the port range 9200 - 9400 open between the elasticsearch servers but the log seems to indicate that the discovery is still timing out. I set "discovery.ec2.tag.*" to speed up the discovery process but this hasn't helped.
Does anyone have any idea how this plugin needs to be configured? I have read a few guides that use even fewer configuration options than I have and are still able to join nodes to the master.
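One detail in the startup log above may explain the timeout: the node reports publish_address 127.0.0.1:9300, and since Elasticsearch 2.0 a node binds to localhost by default, so other instances cannot reach it on the transport port even with the port range open. Below is a minimal sketch of an elasticsearch.yml for EC2 discovery, assuming an IAM role supplies the credentials. The cluster name, region and tag key/value are placeholders for illustration, not values from the configuration in this thread.

```yaml
# Sketch only: placeholder values, adjust to your environment.
cluster.name: my-es-cluster            # hypothetical name; must match on every node

# Bind and publish on the instance's private IP instead of localhost.
# The _ec2_ value is provided by the cloud-aws plugin.
network.host: _ec2_

# Use the EC2 API to find the other nodes.
discovery.type: ec2
cloud.aws.region: us-east-1            # placeholder region

# Optional: only consider instances carrying this tag (hypothetical key/value).
discovery.ec2.tag.es_cluster: my-es-cluster
```

With a setting like this, the transport line in the log should show the instance's private address rather than 127.0.0.1.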
The node joined the master, but then I noticed that nginx, Kibana and Logstash were broken on the master. They had references to localhost, which I updated to the master's EC2 IP in the nginx default file, kibana.yml and the Logstash output config file. Now I am getting this error when trying to join the Elasticsearch node to the master:
[2016-03-18 00:48:54,378][INFO ][discovery.ec2 ] [Supergirl] failed to send join request to master [{In-Betweener}.........
........
java.lang.IllegalStateException: Message not fully read (request) for requestId [4420], action [internal:discovery/zen/join/validate], readerIndex [44918] vs expected [54082]; resetting
Interesting, not sure if this will help anyone else but this is the stack trace on the node:
[2016-03-18 10:20:36,078][INFO ][discovery.ec2 ] [Supergirl] failed to send join request to master [{George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300}], reason [RemoteTransportException[[George Washington Bridge][172.31.31.4:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Supergirl][172.31.28.115:9300][internal:discovery/zen/join/validate]]; nested: IllegalArgumentException[No custom metadata prototype registered for type [licenses], node like missing plugins]; ]
[2016-03-18 10:20:39,099][WARN ][transport.netty ] [Supergirl] exception caught on transport layer [[id: 0x3e11e0a0, /172.31.31.4:45449 => /172.31.28.115:9300]], closing connection
java.lang.IllegalStateException: Message not fully read (request) for requestId [5953], action [ internal:discovery/zen/join/validate], readerIndex [45509] vs expected [54685]; resetting
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:120)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
This is the output showing in the elasticsearch log on the master:
[2016-03-18 10:22:55,932][WARN ][discovery.ec2 ] [George Washington Bridge] failed to validate incoming join request from node [{Supergirl}{hrse7OHMTK-aAJGl3CX8Ew}{172.31.28.115}{172.31.28.115:9300}{client=true, data=false}]
I might try spinning up a new node in AWS later to see if a clean install helps.
I just haven't got to the Marvel plugin yet. I wanted to get cloud-aws working first.
I don't remember installing a license plugin on the master, but I can see it there. What is the license plugin's purpose and where do I get it? I seem to get directed to a Shield plugin when I google it.
So I built a new node and this time installed Java 7 directly (instead of downgrading from 8 to 7 as on the last node), installed marvel-agent, license and cloud-aws, and this was the result:
[2016-03-18 11:41:45,410][INFO ][node ] [Supergirl Returns] version[2.2.0], pid[1265], build[8ff36d1/2016-01-27T13:32:39Z]
[2016-03-18 11:41:45,411][INFO ][node ] [Supergirl Returns] initializing ...
[2016-03-18 11:41:46,738][INFO ][plugins ] [Supergirl Returns] modules [lang-groovy, lang-expression], plugins [marvel-agent, cloud-aws, license], sites []
[2016-03-18 11:41:49,807][INFO ][node ] [Supergirl Returns] initialized
[2016-03-18 11:41:49,807][INFO ][node ] [Supergirl Returns] starting ...
[2016-03-18 11:41:49,910][INFO ][transport ] [Supergirl Returns] publish_address {172.31.23.200:9300}, bound_addresses {172.31.23.200:9300}
[2016-03-18 11:41:49,919][INFO ][discovery ] [Supergirl Returns] elasticsearch/XQz2-7mlQYK64RWp5fGrcA
[2016-03-18 11:41:53,957][INFO ][cluster.service ] [Supergirl Returns] detected_master {George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300}, added {{George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300},}, reason: zen-disco-receive(from master [{George Washington Bridge}{txAxO29VSoiIMu0VKmvd4g}{172.31.31.4}{172.31.31.4:9300}])
[2016-03-18 11:41:53,999][INFO ][license.plugin.core ] [Supergirl Returns] license [d07edc1f-a44b-4201-b5b6-5d377d397c4c] - valid
[2016-03-18 11:41:54,001][ERROR][license.plugin.core ] [Supergirl Returns]
#
# License will expire on [Monday, April 11, 2016]. If you have a new license, please update it.
# Otherwise, please reach out to your support contact.
#
# Commercial plugins operate with reduced functionality on license expiration:
# - marvel
# - The agent will stop collecting cluster and indices metrics
[2016-03-18 11:41:54,008][INFO ][http ] [Supergirl Returns] publish_address {172.31.23.200:9200}, bound_addresses {172.31.23.200:9200}
[2016-03-18 11:41:54,009][INFO ][node ] [Supergirl Returns] started
So 'Supergirl Returns' has joined the cluster, and I can see this from both Marvel and the curl cluster health output. I don't see any replication yet, but I need to go out, so I will check the health when I get back.
Thank you for your invaluable input and assistance @dadoonet.