Cluster Setup: 3-Node Cluster Problem

[2019-07-02T11:29:42,716][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{eqCAFzP9QPGkDd8lfDgZXA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52

It says it is not able to communicate with the servers listed in the hosts setting. Check whether the nodes can reach each other using ping, and telnet on ports 9200 and 9300.
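For example, something like this from each node (the addresses below are the ones from your log; substitute your own):

# Basic reachability check
ping -c 3 10.101.0.140
ping -c 3 10.101.0.141

# Check that the HTTP (9200) and transport (9300) ports accept connections
telnet 10.101.0.140 9200
telnet 10.101.0.140 9300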

Is this the only message in your logs? I think there will also be some error messages, including stack traces, explaining the issue, although you might have to wait a few minutes for them to appear.

Okay, I checked both of those ports. 9300 isn't even listed, but 9200 has multiple working connections.

I will keep an eye out for any more errors that come through. So far it has just been the same error over and over again.

I think your problem is that you're using port 9200 in discovery.seed_hosts rather than 9300, but I am surprised that this is not yielding more error messages.
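For context, 9200 is the HTTP port; node-to-node discovery uses the transport port, which defaults to 9300. A sketch of what the setting should look like (addresses taken from your logs; adjust as needed):

discovery.seed_hosts: ["10.101.0.140:9300", "10.101.0.141:9300"]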

It just gave me this one.

Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:841) ~[elasticsearch-7.1.1.jar:7.1.1]

It's always a good idea to share the whole stack trace. It's normally quite hard to help from just a line or two.

In this case, that error is consistent with what I said above.
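For what it's worth, the numbers in that message are the decimal values of the first four bytes received on the transport channel. Decoded as ASCII they are printable text rather than the binary header a transport message starts with, which fits HTTP traffic arriving on a transport connection. You can decode them yourself with a shell one-liner (nothing Elasticsearch-specific here):

printf '\x30\x36\x36\x32\n'   # bytes 48,54,54,50 written in hex; prints the text "0662"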


Would you suggest continuing down this path, with each node in the cluster running all three services, or would you suggest setting it up the way the Elastic website says I should, with each node running a single service and the cluster built out of those?

@thev0yager
As David said, if you post your elasticsearch.yml config file and more log entries, someone will be able to help you.
This seems like a simple config problem.

cluster.name: choice-cluster
#node.name: corp-elk02
node.master: true
node.data: true
network.host: 10.101.0.141
http.port: 9300
discovery.seed_hosts: ["10.101.0.140:9300","10.101.0.141:9300"]
cluster.initial_master_nodes: ["10.101.0.140:9300","10.101.0.141:9300"]
xpack.security.enabled: false

My other node has an identical yml file except that the node name is different.

This is inconsistent with the messages you shared above. The message said this:

... discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers...

however your config now says this:

discovery.seed_hosts: ["10.101.0.140:9300","10.101.0.141:9300"]

Note the different port numbers.

If you've updated your config you should now be getting different messages. Did you update your config? What is Elasticsearch saying now? Can you share all the logs emitted for the first couple of minutes after startup, from all of your nodes?

Also your original post was about three nodes, but you seem to be talking about two nodes now. How many nodes are there?
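If it helps, on package-based installs the logs normally live under /var/log/elasticsearch/<cluster_name>.log, so something like the following should capture them (assuming default paths; adjust if you changed path.logs):

tail -n 200 /var/log/elasticsearch/choice-cluster.log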

Okay, I definitely need to set the story straight, and I am really sorry about all the confusion. I am trying to get to the point where I have a three-node cluster, but so far I have had trouble getting that third node set up, due to a lot of unrelated issues. Between submitting this and now, I did what you said and changed the port number to 9300. I will submit the logs that I am now getting from Elasticsearch in a second. Once I get a second node joined to this cluster, I'll try to add the third node. Btw, thank you so much for your help, and sorry for the trouble.

[2019-07-03T12:51:47,945][INFO ][o.e.n.Node               ] [choice-node1]     initialized
[2019-07-03T12:51:47,949][INFO ][o.e.n.Node               ] [choice-node1] starting ...
[2019-07-03T12:51:48,155][INFO ][o.e.t.TransportService   ] [choice-node1] publish_address {10.101.0.140:9300}, bound_addresses {10.101.0.140:9300}
[2019-07-03T12:51:48,167][INFO ][o.e.b.BootstrapChecks    ] [choice-node1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T12:51:48,178][INFO ][o.e.c.c.Coordinator      ] [choice-node1] cluster UUID [TUPCVI9iQBK5p3Erp3L5ew]
[2019-07-03T12:51:58,215][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:08,216][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:18,219][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:18,248][WARN ][o.e.n.Node               ] [choice-node1] timed out while waiting for initial discovery state - timeout: 30s
[2019-07-03T12:52:18,263][INFO ][o.e.h.AbstractHttpServerTransport] [choice-node1] publish_address {10.101.0.140:9200}, bound_addresses {10.101.0.140:9200}
[2019-07-03T12:52:18,264][INFO ][o.e.n.Node               ] [choice-node1] started
[2019-07-03T12:52:28,220][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:37,248][WARN ][o.e.t.TcpTransport       ] [choice-node1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.101.0.140:58134, remoteAddress=/10.101.0.140:9200}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:841) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:827) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:40) ~[transport-netty4-client-7.1.1.jar:7.1.1]

This is one of the nodes, from the point where it starts up through to a "Caused by" line that I don't understand.

This is the yml file for Elasticsearch node 2, followed by its startup logs:

cluster.name: choice-cluster
#node.name: corp-elk02
node.master: true
node.data: true
network.host: 10.101.0.141
http.port: 9300
discovery.seed_hosts: ["10.101.0.140:9300","10.101.0.141:9300"]
cluster.initial_master_nodes: ["10.101.0.140:9300","10.101.0.141:9300"]
xpack.security.enabled: false
[2019-07-03T18:04:13,854][INFO ][o.e.n.Node               ] [corp-elk02] initialized
[2019-07-03T18:04:13,854][INFO ][o.e.n.Node               ] [corp-elk02] starting ...
[2019-07-03T18:04:14,112][INFO ][o.e.t.TransportService   ] [corp-elk02] publish_address {10.101.0.141:9300}, bound_addresses {10.101.0.141:9300}
[2019-07-03T18:04:14,123][INFO ][o.e.b.BootstrapChecks    ] [corp-elk02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T18:04:14,135][INFO ][o.e.c.c.Coordinator      ] [corp-elk02] cluster UUID [u4y2xk7mSv2x2_a8l9k2ew]
[2019-07-03T18:04:14,317][INFO ][o.e.c.s.MasterService    ] [corp-elk02] elected-as-master ([1] nodes joined)[{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 11, version: 55, reason: master node changed {previous [], current [{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-07-03T18:04:14,886][INFO ][o.e.c.s.ClusterApplierService] [corp-elk02] master node changed {previous [], current [{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20}]}, term: 11, version: 55, reason: Publication{term=11, version=55}
[2019-07-03T18:04:15,049][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [corp-elk02] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: BindHttpException[Failed to bind to [9300]]; nested: BindException[Address already in use];

It pulls the kill switch after that.

The HTTP port should still be 9200. You've set http.port to 9300, which collides with the transport port (already bound to 9300), and that is what causes the BindException[Address already in use].
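A sketch of what the node 2 file might look like with the ports straightened out (node names taken from your logs; double-check everything against your own setup):

cluster.name: choice-cluster
node.name: corp-elk02                 # uncomment and make unique per node
node.master: true
node.data: true
network.host: 10.101.0.141
http.port: 9200                       # REST/HTTP traffic
transport.port: 9300                  # node-to-node traffic (the default)
discovery.seed_hosts: ["10.101.0.140:9300", "10.101.0.141:9300"]
cluster.initial_master_nodes: ["choice-node1", "corp-elk02"]
xpack.security.enabled: false

Note that cluster.initial_master_nodes normally takes node names rather than addresses.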

Should that be the same on the other nodes too?