Cluster Setup: 3-Node Cluster Problem

Okay, I definitely need to set the story straight; I am really sorry about all the confusion. I am trying to get to the point where I have a three-node cluster. So far I have had trouble getting that third node set up, plus a lot of unrelated issues at the moment. Also, between the time I submitted this and now, I did what you said and changed the port number to 9300. I will be submitting the logs that I am now getting from Elasticsearch in a second. Once I get a second node joined to this cluster, I'll try to add the third node. By the way, thank you so much for your help, and sorry for the trouble.

[2019-07-03T12:51:47,945][INFO ][o.e.n.Node               ] [choice-node1]     initialized
[2019-07-03T12:51:47,949][INFO ][o.e.n.Node               ] [choice-node1] starting ...
[2019-07-03T12:51:48,155][INFO ][o.e.t.TransportService   ] [choice-node1] publish_address {10.101.0.140:9300}, bound_addresses {10.101.0.140:9300}
[2019-07-03T12:51:48,167][INFO ][o.e.b.BootstrapChecks    ] [choice-node1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T12:51:48,178][INFO ][o.e.c.c.Coordinator      ] [choice-node1] cluster UUID [TUPCVI9iQBK5p3Erp3L5ew]
[2019-07-03T12:51:58,215][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:08,216][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:18,219][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:18,248][WARN ][o.e.n.Node               ] [choice-node1] timed out while waiting for initial discovery state - timeout: 30s
[2019-07-03T12:52:18,263][INFO ][o.e.h.AbstractHttpServerTransport] [choice-node1] publish_address {10.101.0.140:9200}, bound_addresses {10.101.0.140:9200}
[2019-07-03T12:52:18,264][INFO ][o.e.n.Node               ] [choice-node1] started
[2019-07-03T12:52:28,220][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:37,248][WARN ][o.e.t.TcpTransport       ] [choice-node1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.101.0.140:58134, remoteAddress=/10.101.0.140:9200}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:841) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:827) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:40) ~[transport-netty4-client-7.1.1.jar:7.1.1]

This is from one of the nodes, from the point where it starts up through the "Caused by" line, which I am not sure what it means.

This is the yml file for Elasticsearch node 2:

cluster.name: choice-cluster
#node.name: corp-elk02
node.master: true
node.data: true
network.host: 10.101.0.141
http.port: 9300
discovery.seed_hosts: ["10.101.0.140:9300","10.101.0.141:9300"]
cluster.initial_master_nodes: ["10.101.0.140:9300","10.101.0.141:9300"]
xpack.security.enabled: false

And here is the log from that node:

[2019-07-03T18:04:13,854][INFO ][o.e.n.Node               ] [corp-elk02] initialized
[2019-07-03T18:04:13,854][INFO ][o.e.n.Node               ] [corp-elk02] starting ...
[2019-07-03T18:04:14,112][INFO ][o.e.t.TransportService   ] [corp-elk02] publish_address {10.101.0.141:9300}, bound_addresses {10.101.0.141:9300}
[2019-07-03T18:04:14,123][INFO ][o.e.b.BootstrapChecks    ] [corp-elk02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T18:04:14,135][INFO ][o.e.c.c.Coordinator      ] [corp-elk02] cluster UUID [u4y2xk7mSv2x2_a8l9k2ew]
[2019-07-03T18:04:14,317][INFO ][o.e.c.s.MasterService    ] [corp-elk02] elected-as-master ([1] nodes joined)[{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 11, version: 55, reason: master node changed {previous [], current [{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-07-03T18:04:14,886][INFO ][o.e.c.s.ClusterApplierService] [corp-elk02] master node changed {previous [], current [{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20}]}, term: 11, version: 55, reason: Publication{term=11, version=55}
[2019-07-03T18:04:15,049][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [corp-elk02] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: BindHttpException[Failed to bind to [9300]]; nested: BindException[Address already in use];

The node shuts itself down right after that.

HTTP port should still be 9200.

Should that be the same on the other nodes too?

Elasticsearch nodes talk to each other on port 9300 (the transport protocol), while clients connect to port 9200 over HTTP. The seed hosts in your config therefore look correct. Incidentally, the bytes (48,54,54,50) in that StreamCorruptedException are hex for the ASCII characters "HTTP": the node's transport layer received HTTP data instead of the binary transport protocol, which is exactly what you'd see with mixed-up ports.
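For reference, here's a minimal sketch of how the two ports fit together (9300 and 9200 are the 7.x defaults; the addresses are the ones from your config):

# Transport (node-to-node) traffic uses port 9300 by default.
# HTTP (client/REST) traffic uses port 9200 by default.
# Discovery runs over the transport protocol, so seed hosts must point
# at the transport port -- explicitly, or by omitting the port so the
# default is used.
transport.port: 9300        # the default; no need to set it
http.port: 9200             # the default; no need to set it
discovery.seed_hosts: ["10.101.0.140:9300", "10.101.0.141:9300"]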


Okay, I changed the port to 9200 and one of the nodes worked; the other one now has this:

[2019-07-03T13:19:36,620][INFO ][o.e.n.Node               ] [choice-node1] initialized
[2019-07-03T13:19:36,620][INFO ][o.e.n.Node               ] [choice-node1] starting ...
[2019-07-03T13:19:36,841][INFO ][o.e.t.TransportService   ] [choice-node1] publish_address {10.101.0.140:9300}, bound_addresses {10.101.0.140:9300}
[2019-07-03T13:19:36,877][INFO ][o.e.b.BootstrapChecks    ] [choice-node1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T13:19:36,912][INFO ][o.e.c.c.Coordinator      ] [choice-node1] cluster UUID [TUPCVI9iQBK5p3Erp3L5ew]
[2019-07-03T13:19:46,936][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{5Rl6IYxaQkaeT1KAh27QZQ}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T13:19:54,950][WARN ][o.e.t.TcpTransport       ] [choice-node1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.101.0.140:59916, remoteAddress=/10.101.0.141:9200}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)

This is part of the log; here is the other part:

Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:841) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:827) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:40) ~[transport-netty4-client-7.1.1.jar:7.1.1]

I feel like this shouldn't work, right? The way I am setting this up is way different from what the website says. Should this work?

If you remove ALL port numbers it should use the defaults, which should work. It seems some nodes still have incorrect port numbers.
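Something like this per node should be all you need (a sketch assuming the 7.x defaults; swap in each node's own name and address):

cluster.name: choice-cluster
node.name: corp-elk02                 # unique per node
network.host: 10.101.0.141            # this node's own address
discovery.seed_hosts: ["10.101.0.140", "10.101.0.141"]    # no ports, so 9300 is assumed
cluster.initial_master_nodes: ["10.101.0.140", "10.101.0.141"]
xpack.security.enabled: false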

Oh heck, I changed the port after I posted the yml. My bad.

Should I comment out the "http.port: 9200" line, or should that be left in? Either way, I still get the same "master not discovered yet" error.

Yes, you should not be setting http.port, so delete that line.

Which website? How are you doing something different? Most settings have sensible defaults, so the fewer settings you're changing the better.

I've lost track of what state we're in now that you've fixed the port (and maybe something else). Can you share your current configs and the corresponding logs? From both nodes, please.

Absolutely! I have been looking at the Elastic website for how to set up a cluster. I have pieced together what it told me to make what I have now, since what it suggests is different from what my CTO wants me to do.

Node 2

cluster.name: choice-cluster
#node.name: corp-elk02
node.master: true
node.data: true
network.host: 10.101.0.141
discovery.seed_hosts: ["10.101.0.140","10.101.0.141"]
cluster.initial_master_nodes: ["10.101.0.140","10.101.0.141"]
xpack.security.enabled: false

Node 1

cluster.name: choice-cluster
#node.name: corp-elk01
node.master: true
node.data: true
network.host: 10.101.0.140
discovery.seed_hosts: ["10.101.0.140","10.101.0.141"]
cluster.initial_master_nodes: ["10.101.0.140","10.101.0.141"]
xpack.security.enabled: false

Ok, not sure why the node.name lines are commented out, but otherwise this looks good. What about the logs? Again, the first couple of minutes after startup from each node please.
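For example, on node 1 (and likewise corp-elk02 on node 2):

node.name: corp-elk01

With names set, you could also list them in cluster.initial_master_nodes instead of the IP addresses:

cluster.initial_master_nodes: ["corp-elk01", "corp-elk02"]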

[2019-07-03T19:13:40,371][INFO ][o.e.c.r.a.AllocationService] [corp-elk02] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash][0]] ...]).

Okay, the cluster is up. Now to join the other node to it.

Do you need more logs from the working one?

Yes, logs from both nodes please.
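Also, once these two nodes are stable, the third node can follow the same pattern. A sketch, where corp-elk03 and 10.101.0.142 are placeholders for whatever name and address your third machine actually uses:

cluster.name: choice-cluster
node.name: corp-elk03                 # placeholder
network.host: 10.101.0.142            # placeholder
discovery.seed_hosts: ["10.101.0.140", "10.101.0.141", "10.101.0.142"]
# Don't set cluster.initial_master_nodes on a node joining an existing
# cluster; it's only for bootstrapping a brand-new cluster.
xpack.security.enabled: false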
