I have inherited this cluster and am trying to troubleshoot why it isn't working. At the very least, I want to be able to tell my infrastructure team what should work.
It seems that the node is not connected to the cluster: there is no master available. Is your local node unable to contact the masters listed in discovery.zen.ping.unicast.hosts? If this is the case you should read this doc:
Thanks. I had a look and there were no logs. I restarted the service and saw some errors in syslog while it was restarting.
I corrected that and I now have some logs (a good start):
{#zen_unicast_6_A_ZhFv6mT3i65uDeJUdyjA#}{10.197.163.236}{XX.XXX.XXX.XXX:9300}{master=true}]
Oct 25 10:38:37 hcukazprocatap03 elasticsearch[28877]: [2018-10-25 10:38:37,007][WARN ][transport.netty ] [es-client-01] exception caught on transport layer [[id: 0x5a1fc5a8]], closing connection
Oct 25 10:38:37 hcukazprocatap03 elasticsearch[28877]: java.net.NoRouteToHostException: No route to host
I get a "No route to host" message, and I presume that relates to this IP: XX.XXX.XXX.XXX:9300
When I telnet to that address (telnet XX.XXX.XXX.XXX 9300) I get "connection refused".
Could you confirm that I am approaching this the right way (telnet)? My current thinking is that the port is being blocked.
At the moment I am in the position where I need to tell our infrastructure team what the issue is.
Yes, this sounds like a connectivity issue. telnet is a reasonable way to test basic connectivity to an Elasticsearch node's transport port (which defaults to 9300). If you manage to establish a connection, hitting <Enter> a few times should close the connection and yield log messages like the following on Elasticsearch's side, which confirm that you've actually connected to Elasticsearch and not to something else.
[2018-10-25T18:44:23,948][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [p6N7aBv] exception caught on transport layer [NettyTcpChannel{localAddress=/0:0:0:0:0:0:0:1:9300, remoteAddress=/0:0:0:0:0:0:0:1:53900}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (d,a,d,a)
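If telnet isn't installed on a box, the same basic check can be scripted. Here is a minimal Python sketch, assuming nothing beyond the standard library; the function name probe_tcp and the result strings are made up for illustration. It makes one TCP connection attempt and reports how it failed. As a rough guide, "refused" usually means the host is reachable but nothing is listening on that port (or something is actively rejecting the connection), while "no route" or "timeout" more often point at routing or a firewall. Treat that as a hint for your infrastructure team, not a diagnosis.

```python
import errno
import socket


def probe_tcp(host: str, port: int = 9300, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome as a short string."""
    try:
        # create_connection performs the TCP handshake; we close straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered, but nothing accepted the connection on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (packets possibly dropped).
        return "timeout"
    except OSError as exc:
        # EHOSTUNREACH is what Java reports as NoRouteToHostException.
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "no route"
        return f"error: {exc}"


# Example use (the address is a placeholder, substitute a real master node):
#   print(probe_tcp("XX.XXX.XXX.XXX", 9300))
```

Note that "open" only tells you that something accepted the connection on that port. The log message above is still the way to confirm it was Elasticsearch.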
Having looked into this further, it looks like the Elasticsearch service on the other boxes had stopped. I don't know why; I need to add some extra monitoring for that.
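For the monitoring, one simple starting point is to poll each node's cluster health endpoint over HTTP. The sketch below is only an illustration under a few assumptions: the HTTP port is the default 9200, no authentication is required, and the function name cluster_status and the "unreachable" result are invented here. It calls GET /_cluster/health and returns the status field (green, yellow or red).

```python
import json
import urllib.request


def cluster_status(base_url: str, timeout: float = 5.0) -> str:
    """Return the cluster status string, or "unreachable" if no usable answer comes back."""
    try:
        with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
            # The health response is a JSON object with a "status" field.
            return json.load(resp).get("status", "unknown")
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts and HTTP errors
        # (urllib's URLError/HTTPError); ValueError covers invalid JSON.
        return "unreachable"


# Example use (hypothetical host name, assumed default HTTP port 9200):
#   print(cluster_status("http://es-node-01:9200"))
```

Running something like this from cron against every node and alerting on "unreachable" or "red" would at least have flagged the stopped services, though a proper monitoring tool is the better long-term answer.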