Cluster Setup: 3-Node Cluster Problem

Okay, I definitely need to set the story straight; I am really sorry about all the confusion. I am trying to get to the point where I have a three-node cluster. So far I have had trouble getting that third node set up, plus a lot of unrelated issues at the moment. Also, between the time I submitted this and now, I did what you said and changed the port number to 9300. I will be submitting the logs that I am now getting from Elasticsearch in a second. Once I get a second node joined to this cluster, I'll try to add the third node. By the way, thank you so much for your help, and sorry for the trouble.

[2019-07-03T12:51:47,945][INFO ][o.e.n.Node               ] [choice-node1]     initialized
[2019-07-03T12:51:47,949][INFO ][o.e.n.Node               ] [choice-node1] starting ...
[2019-07-03T12:51:48,155][INFO ][o.e.t.TransportService   ] [choice-node1] publish_address {10.101.0.140:9300}, bound_addresses {10.101.0.140:9300}
[2019-07-03T12:51:48,167][INFO ][o.e.b.BootstrapChecks    ] [choice-node1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T12:51:48,178][INFO ][o.e.c.c.Coordinator      ] [choice-node1] cluster UUID [TUPCVI9iQBK5p3Erp3L5ew]
[2019-07-03T12:51:58,215][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:08,216][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:18,219][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:18,248][WARN ][o.e.n.Node               ] [choice-node1] timed out while waiting for initial discovery state - timeout: 30s
[2019-07-03T12:52:18,263][INFO ][o.e.h.AbstractHttpServerTransport] [choice-node1] publish_address {10.101.0.140:9200}, bound_addresses {10.101.0.140:9200}
[2019-07-03T12:52:18,264][INFO ][o.e.n.Node               ] [choice-node1] started
[2019-07-03T12:52:28,220][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{tQOZJeltRSiCHM7Osi0wnA}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T12:52:37,248][WARN ][o.e.t.TcpTransport       ] [choice-node1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.101.0.140:58134, remoteAddress=/10.101.0.140:9200}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:472) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278) ~[netty-codec-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:556) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:510) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470) [netty-transport-4.1.32.Final.jar:4.1.32.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909) [netty-common-4.1.32.Final.jar:4.1.32.Final]
	at java.lang.Thread.run(Thread.java:835) [?:?]
Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:841) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:827) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:40) ~[transport-netty4-client-7.1.1.jar:7.1.1]

This is from one of the nodes, from the point where it starts up through the "Caused by" line, which I am not sure what it means.

This is the yml file for Elasticsearch node 2:

cluster.name: choice-cluster
#node.name: corp-elk02
node.master: true
node.data: true
network.host: 10.101.0.141
http.port: 9300
discovery.seed_hosts: ["10.101.0.140:9300","10.101.0.141:9300"]
cluster.initial_master_nodes: ["10.101.0.140:9300","10.101.0.141:9300"]
xpack.security.enabled: false

And here is the log from that node:

[2019-07-03T18:04:13,854][INFO ][o.e.n.Node               ] [corp-elk02] initialized
[2019-07-03T18:04:13,854][INFO ][o.e.n.Node               ] [corp-elk02] starting ...
[2019-07-03T18:04:14,112][INFO ][o.e.t.TransportService   ] [corp-elk02] publish_address {10.101.0.141:9300}, bound_addresses {10.101.0.141:9300}
[2019-07-03T18:04:14,123][INFO ][o.e.b.BootstrapChecks    ] [corp-elk02] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T18:04:14,135][INFO ][o.e.c.c.Coordinator      ] [corp-elk02] cluster UUID [u4y2xk7mSv2x2_a8l9k2ew]
[2019-07-03T18:04:14,317][INFO ][o.e.c.s.MasterService    ] [corp-elk02] elected-as-master ([1] nodes joined)[{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 11, version: 55, reason: master node changed {previous [], current [{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20}]}
[2019-07-03T18:04:14,886][INFO ][o.e.c.s.ClusterApplierService] [corp-elk02] master node changed {previous [], current [{corp-elk02}{PpIwvyyXREKSn0HscUCmag}{kPjbrOASTQiDOAQMcomSaQ}{10.101.0.141}{10.101.0.141:9300}{ml.machine_memory=16820158464, xpack.installed=true, ml.max_open_jobs=20}]}, term: 11, version: 55, reason: Publication{term=11, version=55}
[2019-07-03T18:04:15,049][WARN ][o.e.b.ElasticsearchUncaughtExceptionHandler] [corp-elk02] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: BindHttpException[Failed to bind to [9300]]; nested: BindException[Address already in use];

The node shuts itself down right after that.

HTTP port should still be 9200.

Should that be the same on the other nodes too?

Elasticsearch nodes talk to each other on port 9300 (the transport protocol), while clients connect to port 9200 over HTTP. The seed hosts in your config therefore look correct. Incidentally, the bytes (48,54,54,50) in that StreamCorruptedException are hex for the ASCII characters "HTTP": the node's transport layer received HTTP data instead of the binary transport protocol, which is exactly what you'd see with mixed-up ports.
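For reference, here's a minimal sketch of how the two ports fit together (9300 and 9200 are the 7.x defaults; the addresses are the ones from your config):

# Transport (node-to-node) traffic uses port 9300 by default.
# HTTP (client/REST) traffic uses port 9200 by default.
# Discovery runs over the transport protocol, so seed hosts must point
# at the transport port -- explicitly, or by omitting the port so the
# default is used.
transport.port: 9300        # the default; no need to set it
http.port: 9200             # the default; no need to set it
discovery.seed_hosts: ["10.101.0.140:9300", "10.101.0.141:9300"]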


Okay, I changed the port to 9200 and one of the nodes worked; the other one now has this:

[2019-07-03T13:19:36,620][INFO ][o.e.n.Node               ] [choice-node1] initialized
[2019-07-03T13:19:36,620][INFO ][o.e.n.Node               ] [choice-node1] starting ...
[2019-07-03T13:19:36,841][INFO ][o.e.t.TransportService   ] [choice-node1] publish_address {10.101.0.140:9300}, bound_addresses {10.101.0.140:9300}
[2019-07-03T13:19:36,877][INFO ][o.e.b.BootstrapChecks    ] [choice-node1] bound or publishing to a non-loopback address, enforcing bootstrap checks
[2019-07-03T13:19:36,912][INFO ][o.e.c.c.Coordinator      ] [choice-node1] cluster UUID [TUPCVI9iQBK5p3Erp3L5ew]
[2019-07-03T13:19:46,936][WARN ][o.e.c.c.ClusterFormationFailureHelper] [choice-node1] master not discovered yet: have discovered []; discovery will continue using [10.101.0.140:9200, 10.101.0.141:9200] from hosts providers and [{choice-node1}{ukRS6dHuTiWJVjVvpSIKBQ}{5Rl6IYxaQkaeT1KAh27QZQ}{10.101.0.140}{10.101.0.140:9300}{ml.machine_memory=16820174848, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 52, last-accepted version 96 in term 52
[2019-07-03T13:19:54,950][WARN ][o.e.t.TcpTransport       ] [choice-node1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.101.0.140:59916, remoteAddress=/10.101.0.141:9200}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)

This is part of the log; here is the other part:

Caused by: java.io.StreamCorruptedException: invalid internal transport message format, got (48,54,54,50)
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:841) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:827) ~[elasticsearch-7.1.1.jar:7.1.1]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:40) ~[transport-netty4-client-7.1.1.jar:7.1.1]

I feel like this shouldn't work, right? The way I am setting this up is way different from what the website says. Should this work?

If you remove ALL port numbers it should use the defaults, which should work. It seems some nodes still have incorrect port numbers.
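Something like this per node should be all you need (a sketch assuming the 7.x defaults; swap in each node's own name and address):

cluster.name: choice-cluster
node.name: corp-elk02                 # unique per node
network.host: 10.101.0.141            # this node's own address
discovery.seed_hosts: ["10.101.0.140", "10.101.0.141"]    # no ports, so 9300 is assumed
cluster.initial_master_nodes: ["10.101.0.140", "10.101.0.141"]
xpack.security.enabled: false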

Oh heck, I changed the port after I posted the yml. My bad.

Should I comment out the "http.port: 9200" line, or should that be left in? Either way, I still get the same "master not discovered yet" error.

Yes, you should not be setting http.port, so delete that line.

Which website? How are you doing something different? Most settings have sensible defaults, so the fewer settings you're changing the better.

I've lost track of what state we're in now that you've fixed the port (and maybe something else). Can you share your current configs and the corresponding logs? From both nodes, please.

Absolutely! I have been looking at the Elastic website for how to set up a cluster. I have pieced together what it told me to make what I have now, since what it suggests is different from what my CTO wants me to do.

Node 2

cluster.name: choice-cluster
#node.name: corp-elk02
node.master: true
node.data: true
network.host: 10.101.0.141
discovery.seed_hosts: ["10.101.0.140","10.101.0.141"]
cluster.initial_master_nodes: ["10.101.0.140","10.101.0.141"]
xpack.security.enabled: false

Node 1

cluster.name: choice-cluster
#node.name: corp-elk01
node.master: true
node.data: true
network.host: 10.101.0.140
discovery.seed_hosts: ["10.101.0.140","10.101.0.141"]
cluster.initial_master_nodes: ["10.101.0.140","10.101.0.141"]
xpack.security.enabled: false

Ok, not sure why the node.name lines are commented out, but otherwise this looks good. What about the logs? Again, the first couple of minutes after startup from each node please.
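For example, on node 1 (and likewise corp-elk02 on node 2):

node.name: corp-elk01

With names set, you could also list them in cluster.initial_master_nodes instead of the IP addresses:

cluster.initial_master_nodes: ["corp-elk01", "corp-elk02"]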

[2019-07-03T19:13:40,371][INFO ][o.e.c.r.a.AllocationService] [corp-elk02] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[logstash][0]] ...]).

Okay, the cluster is up. Now to join the other node to it.

Do you need more logs from the working one?

Yes, logs from both nodes please.
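Also, once these two nodes are stable, the third node can follow the same pattern. A sketch, where corp-elk03 and 10.101.0.142 are placeholders for whatever name and address your third machine actually uses:

cluster.name: choice-cluster
node.name: corp-elk03                 # placeholder
network.host: 10.101.0.142            # placeholder
discovery.seed_hosts: ["10.101.0.140", "10.101.0.141", "10.101.0.142"]
# Don't set cluster.initial_master_nodes on a node joining an existing
# cluster; it's only for bootstrapping a brand-new cluster.
xpack.security.enabled: false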
