ELK 5.4 data node is failing to join the cluster and is stuck at the discovery phase

(chakradhara) #1

I am using Elasticsearch 5.4 with 3 master nodes and 8 data nodes. On one of the data nodes, the Elasticsearch process stopped abruptly because supervisord went down.
I have 111 unassigned shards, and the allocation explanation for them is:
"can_allocate" : "no_valid_shard_copy",
"allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",

I think the stopped node holds the primary copies of these shards. That node is failing to join the cluster, and its log shows the following:

[2018-05-09T11:39:55,759][INFO ][o.e.n.Node ] [ah-1007168-003] initializing ...
[2018-05-09T11:39:55,812][INFO ][o.e.e.NodeEnvironment ] [ah-1007168-003] using [1] data paths, mounts [[/apps/cie_dashboard (/dev/mapper/volgrp02-appscie_dashboard)]], net usable_space [529.1gb], net total_space [899.5gb], spins? [possibly], types [xfs]
[2018-05-09T11:39:55,813][INFO ][o.e.e.NodeEnvironment ] [ah-1007168-003] heap size [23.9gb], compressed ordinary object pointers [true]
[2018-05-09T11:39:56,141][INFO ][o.e.n.Node ] [ah-1007168-003] node name [ah-1007168-003], node ID [iUAPJ0glTV2fP5_isJE4Jg]
[2018-05-09T11:39:56,141][INFO ][o.e.n.Node ] [ah-1007168-003] version[5.4.0], pid[6077], build[780f8c4/2017-04-28T17:43:27.229Z], OS[Linux/3.10.0-693.17.1.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_72/25.72-b15]
[2018-05-09T11:39:56,914][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [aggs-matrix-stats]
[2018-05-09T11:39:56,914][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [ingest-common]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [lang-expression]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [lang-groovy]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [lang-mustache]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [lang-painless]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [percolator]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [reindex]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [transport-netty3]
[2018-05-09T11:39:56,915][INFO ][o.e.p.PluginsService ] [ah-1007168-003] loaded module [transport-netty4]
[2018-05-09T11:39:56,916][INFO ][o.e.p.PluginsService ] [ah-1007168-003] no plugins loaded
[2018-05-09T11:39:58,062][INFO ][o.e.d.DiscoveryModule ] [ah-1007168-003] using discovery type [zen]
[2018-05-09T11:39:59,561][INFO ][o.e.n.Node ] [ah-1007168-003] initialized
[2018-05-09T11:39:59,561][INFO ][o.e.n.Node ] [ah-1007168-003] starting ...
[2018-05-09T11:39:59,677][INFO ][o.e.t.TransportService ] [ah-1007168-003] publish_address {}, bound_addresses {}
[2018-05-09T11:39:59,683][INFO ][o.e.b.BootstrapChecks ] [ah-1007168-003] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-05-09T11:39:59,687][ERROR][o.e.b.Bootstrap ] [ah-1007168-003] node validation exception
bootstrap checks failed
max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-05-09T11:39:59,690][INFO ][o.e.n.Node ] [ah-1007168-003] stopping ...
[2018-05-09T11:39:59,713][INFO ][o.e.n.Node ] [ah-1007168-003] stopped
[2018-05-09T11:39:59,713][INFO ][o.e.n.Node ] [ah-1007168-003] closing ...
[2018-05-09T11:39:59,721][INFO ][o.e.n.Node ] [ah-1007168-003] closed

My /etc/security/limits.conf already contains:
root - nofile 100000
ahselkpd - nofile 82919
ahselkpd - memlock unlimited # unlimited memory lock for elk_user

ulimit -n returns 82919 for ahselkpd and 100000 for root.
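Note that ulimit -n in a login shell may not match the limit a daemon actually receives: /etc/security/limits.conf is applied by pam_limits to PAM sessions, and a process started by supervisord inherits supervisord's limits instead. The sketch below is a minimal, hypothetical helper (the names parse_max_open_files and meets_requirement are illustrative, not part of any tool) that reads the "Max open files" row from Linux's /proc/<pid>/limits and compares the soft limit with the 65536 threshold from the bootstrap check error in the log. Because the Elasticsearch process exits right after the failed check, the assumption here is that you point it at the parent supervisord PID, whose limits the child inherits unless something raises them in between.

```python
import sys

# Threshold taken from the bootstrap check error in the log above.
REQUIRED_NOFILE = 65536
LABEL = "Max open files"


def _to_int(value):
    # /proc/<pid>/limits prints "unlimited" instead of a number when there is no cap.
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Return (soft, hard) for the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            # After the label the columns are: soft limit, hard limit, units.
            soft, hard = line[len(LABEL):].split()[:2]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' line found")


def meets_requirement(soft):
    # The soft limit is the one the kernel enforces on the running process.
    return soft is None or soft >= REQUIRED_NOFILE


def main():
    # Hypothetical usage: pass the supervisord PID; defaults to this script itself.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        with open(f"/proc/{pid}/limits") as fh:
            soft, hard = parse_max_open_files(fh.read())
    except OSError as exc:
        print(f"cannot read limits for pid {pid}: {exc}")
        return
    status = "OK" if meets_requirement(soft) else "too low"
    print(f"pid {pid}: soft={soft} hard={hard} ({status})")


if __name__ == "__main__":
    main()
```

If the supervisord process reports a soft limit of 4096, that would be consistent with the value in the bootstrap error, regardless of what limits.conf says for interactive logins.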

vm.max_map_count=262144 is also set globally in /etc/sysctl.conf.
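A value in /etc/sysctl.conf only takes effect at boot or after it is reloaded with sysctl, so the live kernel value in /proc/sys/vm/max_map_count is the one that matters to a starting node. The following is a minimal sketch, with hypothetical helper names, that reads that file and compares it with 262144, the minimum Elasticsearch expects for this setting.

```python
# Minimum value Elasticsearch expects for vm.max_map_count.
REQUIRED_MAX_MAP_COUNT = 262144
PROC_PATH = "/proc/sys/vm/max_map_count"


def parse_max_map_count(text):
    # The proc file holds a single integer followed by a newline.
    return int(text.strip())


def max_map_count_ok(value):
    return value >= REQUIRED_MAX_MAP_COUNT


if __name__ == "__main__":
    try:
        with open(PROC_PATH) as fh:
            live = parse_max_map_count(fh.read())
    except OSError as exc:
        print(f"cannot read {PROC_PATH}: {exc}")
    else:
        status = "OK" if max_map_count_ok(live) else "too low"
        print(f"vm.max_map_count={live} ({status})")
```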

How can I resolve this issue?
