Elasticsearch cluster problems

Hi there,
I want to create an Elasticsearch cluster, but I've encountered some problems...

my os version: ubuntu 18.04
my elasticsearch version: 7.7
my java version:
openjdk version "11.0.7" 2020-04-14
OpenJDK Runtime Environment (build 11.0.7+10-post-Ubuntu-2ubuntu218.04)
OpenJDK 64-Bit Server VM (build 11.0.7+10-post-Ubuntu-2ubuntu218.04, mixed mode, sharing)

I have three VMs; below are the three elasticsearch.yml configurations.

master ES:

cluster.name: elastiflow-cluster
node.name: cluster-1
node.master: true
node.data: false
node.ingest: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.250.31.43
http.port: 9200
discovery.seed_hosts:
- 10.250.31.43:9200
- 10.250.31.44:9401
- 10.250.31.45:9301
cluster.initial_master_nodes:
- cluster-1

data ES:

cluster.name: elastiflow-cluster
node.name: data-1
node.master: false
node.data: true
node.ingest: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.250.31.44
http.port: 9401
discovery.seed_hosts:
- 10.250.31.43:9200
- 10.250.31.44:9401
- 10.250.31.45:9301
cluster.initial_master_nodes:
- cluster-1

ingest ES:

cluster.name: elastiflow-cluster
node.name: ingest-1
node.master: false
node.data: false
node.ingest: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.250.31.45
http.port: 9301
discovery.seed_hosts:
- 10.250.31.43:9200
- 10.250.31.44:9401
- 10.250.31.45:9301
cluster.initial_master_nodes:
- cluster-1

All three VMs can start the Elasticsearch service successfully, but the nodes never join each other. Also, in the data ES log and the ingest ES log I see the following messages:

data ES log:
[2020-06-11T03:20:35,560][WARN ][o.e.c.c.ClusterFormationFailureHelper] [data-1] master not discovered yet: have discovered [{data-1}{XnHMvhVqSliWgW1kA17j6A}{U1Bha4kATvuFppFPlIasJQ}{10.250.31.44}{10.250.31.44:9300}{dlrt}{ml.machine_memory=16773595136, xpack.installed=true, transform.node=true, ml.max_open_jobs=20}]; discovery will continue using [10.250.31.43:9200, 10.250.31.44:9401, 10.250.31.45:9301] from hosts providers and [] from last-known cluster state; node term 0, last-accepted version 0 in term 0

ingest ES log:

[2020-06-11T03:30:56,462][WARN ][o.e.t.TcpTransport       ] [ingest-1] exception caught on transport layer [Netty4TcpChannel{localAddress=/10.250.31.45:59782, remoteAddress=/10.250.31.44:9401}], closing connection
io.netty.handler.codec.DecoderException: java.io.StreamCorruptedException: received HTTP response on transport port, ensure that transport port (not HTTP port) of a remote node is specified in the configuration
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:401) ~[netty-codec-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:368) ~[netty-codec-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:351) ~[netty-codec-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.handler.logging.LoggingHandler.channelInactive(LoggingHandler.java:153) [netty-handler-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:239) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:260) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:246) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:818) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) [netty-common-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) [netty-common-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:497) [netty-transport-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-common-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.45.Final.jar:4.1.45.Final]
	at java.lang.Thread.run(Thread.java:832) [?:?]
Caused by: java.io.StreamCorruptedException: received HTTP response on transport port, ensure that transport port (not HTTP port) of a remote node is specified in the configuration
	at org.elasticsearch.transport.TcpTransport.readHeaderBuffer(TcpTransport.java:758) ~[elasticsearch-7.7.1.jar:7.7.1]
	at org.elasticsearch.transport.TcpTransport.readMessageLength(TcpTransport.java:747) ~[elasticsearch-7.7.1.jar:7.7.1]
	at org.elasticsearch.transport.netty4.Netty4SizeHeaderFrameDecoder.decode(Netty4SizeHeaderFrameDecoder.java:43) ~[transport-netty4-client-7.7.1.jar:7.7.1]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:498) ~[netty-codec-4.1.45.Final.jar:4.1.45.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:437) ~[netty-codec-4.1.45.Final.jar:4.1.45.Final]
	... 21 more

Would you please help me to solve these problems?
Thanks
Kase

Some comments:

  • on a 3-node cluster, make all nodes master-eligible. Remove node.master: false.
  • avoid using 93xx as the REST port; that range is normally used by the transport layer. If you do, you also need to change transport.port to make sure it does not conflict.
  • try as much as possible to use the default settings. Why change the ports when you are running on different machines?
  • discovery.seed_hosts must use the transport port (default 9300), not the REST port (see the sketch below this list).
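
For example, here is a minimal sketch of the data node's elasticsearch.yml following the points above (reusing the IPs from your post): the node.master/node.data/node.ingest lines are removed so the node keeps the default roles, http.port is left at its default of 9200, and the seed hosts point at the transport port. The other two nodes would only differ in node.name and network.host.

cluster.name: elastiflow-cluster
node.name: data-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 10.250.31.44
discovery.seed_hosts:
- 10.250.31.43:9300
- 10.250.31.44:9300
- 10.250.31.45:9300
cluster.initial_master_nodes:
- cluster-1

Since 9300 is the default transport port, the :9300 suffixes could also be omitted entirely.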

HTH

Hi @dadoonet ,
thanks for your advice. I changed all the ports to 9200 and removed node.master: false,
but the problem is still not solved; the console output is the same.
May I ask why I have to do the first thing?
I used node.master: false because I want to test different node types.

Separate point: cluster.initial_master_nodes should not be set on master-ineligible nodes.

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-bootstrap-cluster.html
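
Following the configs above, that would mean only cluster-1 (the master-eligible node) keeps

cluster.initial_master_nodes:
- cluster-1

while data-1 and ingest-1 drop those two lines entirely, for as long as they stay master-ineligible.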

Thanks @dadoonet, and @Hubert_Pham

I have solved these problems. I checked all the ES nodes and found that only the master node had a cluster UUID; on the others the UUID was _na_.
So I went to /var/lib/elasticsearch, removed all the files there, and fixed every elasticsearch.yml, changing
discovery.seed_hosts:
- 10.250.31.43:9200
- 10.250.31.44:9200
- 10.250.31.45:9200

to

discovery.seed_hosts:
- 10.250.31.43
- 10.250.31.44
- 10.250.31.45

then restarted the data node and the ingest node, and it finally works!!
Thanks for your help!
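
In case it is useful to someone else, this is roughly how the result can be checked (a sketch, using the HTTP port 9200 from my final config): every node should now report the same cluster_uuid instead of _na_, and all three nodes should show up together.

curl http://10.250.31.43:9200
curl http://10.250.31.43:9200/_cat/nodes?v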

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.