Node discovery issue post upgrade to newer version of ELK stack

Vivek_Samaga · August 21, 2017, 9:56am

I have migrated all my ELK nodes to new versions. I have 2 ES nodes with the configs below:
10.***.235
elk_es_crt_00
cluster.name: csm_elk_es_crt_00
node.name: "elkescrt00.***.net"
bootstrap.memory_lock: false
network.host: 192.168.0.1
transport.host: 127.0.0.1
http.port: 9200
http.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["elkescrt00.***.net", "elkescrt01.***.net"]

10.***.234
elk_es_crt_01
cluster.name: csm_elk_es_crt_01
node.name: "elkescrt01.***.net"
bootstrap.memory_lock: false
network.host: 192.168.0.1
transport.host: 127.0.0.1
http.port: 9200
http.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["elkescrt00.***.net", "elkescrt01.***.net"]

and my kibana.yml has the config below:
kibana.yml

server.port: 5601
server.host: "10.***.232"
elasticsearch.url: "http://elkescrt01.***.net:9200"

After migration, the ES indices in elk_es_crt_01 have been hashed, but not those in elk_es_crt_00.

Am I missing something?

Christian_Dahlqvist · August 21, 2017, 10:07am

The configuration does not make sense to me. It looks like the nodes reside at different IP addresses, and based on discovery.zen.ping.unicast.hosts it also looks like you are expecting the nodes to form a cluster. This will not be possible, however, as the two nodes do not have the same cluster.name and transport.host only binds to localhost. What are you expecting the setup to look like? How did it work before the migration? Which version of Elasticsearch are you using?

Vivek_Samaga · August 21, 2017, 10:44am

Yes, both nodes reside at different IPs.
This was a cluster with 2 ES nodes before the migration. You are correct, I am trying to form a cluster as before.
My ES version is 5.5.1.

I did bind transport.host to localhost.

When I do http://elkcrt.***.com:9200/_cat/nodes

it shows only
127.0.0.1 21 97 7 0.11 0.17 0.12 mdi * elkescrt00.***.net
and GET /_cat/nodes shows

127.0.0.1 11 96 1 0.00 0.11 0.20 mdi * elkescrt01.***.net

Christian_Dahlqvist · August 21, 2017, 10:50am

In order to form a cluster the nodes need to:

Have the same cluster name
Be able to communicate via the transport protocol on port 9300

In your config the cluster names are different and the nodes cannot reach each other's transport port, as it binds to localhost. You will need to change both in order for the nodes to form a cluster.

You should also set discovery.zen.minimum_master_nodes to 2 once you establish a cluster if both your nodes are master eligible.
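
A minimal sketch of how the requirements above could look in elasticsearch.yml on the first node, assuming Elasticsearch 5.x zen discovery as used in this thread. The masked address and host names are kept exactly as posted and need to be replaced with the real values; the shared cluster name is a hypothetical choice, and transport.tcp.port is listed only for clarity since 9300 is the default.

```yaml
# elasticsearch.yml on elkescrt00 (sketch; the second node would differ
# only in node.name and network.host)

# Must be identical on every node that should join the cluster.
# Hypothetical name, any shared value works.
cluster.name: csm_elk_es_crt
node.name: "elkescrt00.***.net"

# Bind HTTP and transport to an address the other node can reach,
# not to 127.0.0.1. Placeholder: the masked IP from the post.
network.host: "10.***.235"
http.port: 9200
transport.tcp.port: 9300   # default, shown only to make it explicit

discovery.zen.ping.unicast.hosts: ["elkescrt00.***.net", "elkescrt01.***.net"]

# With two master-eligible nodes, require both before electing a master.
discovery.zen.minimum_master_nodes: 2
```

Note that the sketch has no transport.host: 127.0.0.1 line. Leaving that line in would override network.host for the transport layer and keep port 9300 bound to localhost.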

Vivek_Samaga · August 22, 2017, 10:10am

So should I mention the port number 9300 explicitly?
I changed the cluster names to be the same and I think it worked. In the logs I got the below:

[o.e.c.s.ClusterService   ] [elkescrt00.***.net] new_master {elkescrt00.***.net}{qgbCeUyrQtSkrL9ciZcnlQ}{6o3tCU2tRg6QV95dnJrwBA}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2017-08-22T04:55:17,735][INFO ][o.e.h.n.Netty4HttpServerTransport] [elkescrt00.***.net] publish_address {10.***.235:9200}, bound_addresses {10.***.235:9200}

But the problem is that when I check the nodes using

http://elkcrt.***.com:9200/_cat/nodes
it gives me
127.0.0.1 72 98 8 0.42 0.99 0.99 mdi * elkescrt01.***.net

Only 1 node.
I don't understand.

Christian_Dahlqvist · August 22, 2017, 10:17am

Your node is still binding to 127.0.0.1, which will not work. You need to bind to an IP address (10.***.234 and 10.***.235) that can be reached from the other node.

Vivek_Samaga · August 22, 2017, 10:54am

Thanks. So it should be

network.host: 10.***.235 ?

If so, I am getting the error

elasticsearch dead but subsys locked

Vivek_Samaga · August 23, 2017, 10:12am

I have been getting this error and am not able to work out what it is.

I changed my ES config so that the transport port binds to the host IP, but the ES service dies every time.

10.***.235
elk_es_crt_00
cluster.name: csm_elk_es_crt_01
node.name: "elkescrt00.***.net"
bootstrap.memory_lock: true
http.port: 9200
discovery.zen.ping.unicast.hosts: ["elkescrt00.***.net", "elkescrt01.***.net"]

10.***.234
elk_es_crt_01
cluster.name: csm_elk_es_crt_01
node.name: "elkescrt01.***.net"
bootstrap.memory_lock: true
http.port: 9200
discovery.zen.ping.unicast.hosts: ["elkescrt00.***.net", "elkescrt01.***.net"]

Error in logs

[2017-08-23T05:07:17,732][INFO ][o.e.t.TransportService   ] [elkescrt00.***.net] publish_address {10.162.119.235:9300}, bound_addresses {10.162.119.235:9300}
[2017-08-23T05:07:17,746][INFO ][o.e.b.BootstrapChecks    ] [elkescrt00.***.net] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-23T05:07:17,748][ERROR][o.e.b.Bootstrap          ] [elkescrt00.***.net] node validation exception
[2] bootstrap checks failed
[1]: max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
[2]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk
[2017-08-23T05:07:17,751][INFO ][o.e.n.Node               ] [elkescrt00.***.net] stopping ...
[2017-08-23T05:07:17,797][INFO ][o.e.n.Node               ] [elkescrt00.***.net] stopped
[2017-08-23T05:07:17,797][INFO ][o.e.n.Node               ] [elkescrt00.***.net] closing ...
[2017-08-23T05:07:17,809][INFO ][o.e.n.Node               ] [elkescrt00.***.net] closed

I really need to fix this soon.
Can somebody help?

Christian_Dahlqvist · August 23, 2017, 11:29am

In addition to what you have here, you need to set network.host to a non-loopback interface, e.g. the IP of the host. Once you do this, Elasticsearch will determine that you are no longer running a single instance locally for development, and will enforce additional checks to ensure the deployment is sound and follows best practices. There is system configuration you will need to pay attention to, and also a number of bootstrap checks that will need to pass. Information about any checks that fail will be written to the log.
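
The two failed checks in the log posted earlier are fixed in different places. The thread limit is an operating system limit for the elasticsearch user (for example an nproc entry in /etc/security/limits.conf), not something elasticsearch.yml controls. The system call filter check, as the log message itself says, can be disabled at your own risk. A hedged sketch for Elasticsearch 5.x, assuming the filters genuinely cannot be installed on this kernel; the masked IP is kept as posted:

```yaml
# elasticsearch.yml (sketch, Elasticsearch 5.x)

# A non-loopback address is what turns the bootstrap checks
# from warnings into hard startup errors.
network.host: "10.***.235"

# Only if the kernel cannot install system call filters.
# This removes a safety measure, so treat it as a last resort.
bootstrap.system_call_filter: false
```

The max number of threads check will still fail until the limit is raised for the user that actually runs Elasticsearch.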

Vivek_Samaga · August 23, 2017, 11:39am

Thanks for the quick reply.
I have the same indices on both nodes, and the same logs are being written to both nodes.
How do I know whether my 2-node ES setup has formed a cluster?
Is this the command to find the number of ES nodes in my cluster?

GET /_nodes

Christian_Dahlqvist · August 23, 2017, 11:42am

The cat nodes API will show the nodes in the cluster.

Vivek_Samaga · August 23, 2017, 11:54am

The problem is that as soon as I add a non-loopback interface, for example
network.host: 10.162.119.235

and restart the ES service, it gives me the error
elasticsearch dead but subsys locked

Christian_Dahlqvist · August 23, 2017, 12:07pm

What is in the Elasticsearch logs?

Vivek_Samaga · August 23, 2017, 12:20pm

Here are the logs from 1 of the 2 nodes [elkescrt01.***.net, elkescrt00.***.net]. The log is the same on both except for the host name elkescrt01.***.net.

[2017-08-23T06:19:06,719][INFO ][o.e.n.Node               ] [elkescrt01.***.net] initializing ...
[2017-08-23T06:19:06,988][INFO ][o.e.e.NodeEnvironment    ] [elkescrt01.***.net] using [1] data paths, mounts [[/var/lib/elasticsearch (/dev/mapper/elasticsearch-elasticsearch--data)]], net usable_space [1003gb], net total_space [1.1tb], spins? [possibly], types [ext4]
[2017-08-23T06:19:06,988][INFO ][o.e.e.NodeEnvironment    ] [elkescrt01.***.net] heap size [5.9gb], compressed ordinary object pointers [true]
[2017-08-23T06:19:17,348][INFO ][o.e.n.Node               ] [elkescrt01.***.net] node name [elkescrt01.***.net], node ID [Gn5tyVUAQKmetNeJGyjSFg]
[2017-08-23T06:19:17,348][INFO ][o.e.n.Node               ] [elkescrt01.***.net] version[5.5.1], pid[5210], build[19c13d0/2017-07-18T20:44:24.823Z], OS[Linux/2.6.32-573.18.1.el6.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_71/25.71-b15]
[2017-08-23T06:19:17,354][INFO ][o.e.n.Node               ] [elkescrt01.***.net] JVM arguments [-Xms6g, -Xmx6g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/usr/share/elasticsearch]
[2017-08-23T06:19:19,299][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [aggs-matrix-stats]
[2017-08-23T06:19:19,299][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [ingest-common]
[2017-08-23T06:19:19,299][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [lang-expression]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [lang-groovy]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [lang-mustache]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [lang-painless]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [parent-join]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [percolator]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [reindex]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [transport-netty3]
[2017-08-23T06:19:19,300][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] loaded module [transport-netty4]
[2017-08-23T06:19:19,301][INFO ][o.e.p.PluginsService     ] [elkescrt01.***.net] no plugins loaded
[2017-08-23T06:19:22,653][INFO ][o.e.d.DiscoveryModule    ] [elkescrt01.***.net] using discovery type [zen]
[2017-08-23T06:19:27,048][INFO ][o.e.n.Node               ] [elkescrt01.***.net] initialized
[2017-08-23T06:19:27,048][INFO ][o.e.n.Node               ] [elkescrt01.***.net] starting ...
[2017-08-23T07:16:43,879][INFO ][o.e.t.TransportService   ] [elkescrt01.***.net] publish_address {10.162.119.234:9300}, bound_addresses {10.162.119.234:9300}
[2017-08-23T07:16:43,908][INFO ][o.e.b.BootstrapChecks    ] [elkescrt01.***.net] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2017-08-23T07:16:43,911][ERROR][o.e.b.Bootstrap          ] [elkescrt01.***.net] node validation exception
[1] bootstrap checks failed
[1]: max number of threads [1024] for user [elasticsearch] is too low, increase to at least [2048]
[2017-08-23T07:16:43,915][INFO ][o.e.n.Node               ] [elkescrt01.***.net] stopping ...
[2017-08-23T07:16:43,974][INFO ][o.e.n.Node               ] [elkescrt01.***.net] stopped
[2017-08-23T07:16:43,974][INFO ][o.e.n.Node               ] [elkescrt01.***.net] closing ...
[2017-08-23T07:16:43,991][INFO ][o.e.n.Node               ] [elkescrt01.***.net] closed

Christian_Dahlqvist · August 23, 2017, 12:23pm

As you can see from the logs, the bootstrap checks are not passing, preventing the node from starting up. Go through the full list of bootstrap checks and make sure they are all correctly configured.

Vivek_Samaga · August 23, 2017, 12:28pm

I have done this, @Christian_Dahlqvist, and set ulimit -u to 2048 as the root user.
But this message keeps popping up.

Christian_Dahlqvist · August 23, 2017, 12:39pm

If you have installed from a package, Elasticsearch should be running as the elasticsearch user, not root.

Vivek_Samaga · August 23, 2017, 1:52pm

Can't thank you enough.

