Elasticsearch-5.0.2 -> SERVICE_UNAVAILABLE/1/state not recovered / initialized

Hi,

What can I do to bring this cluster of 5 nodes up and running again?

Issue:
Since I changed the heap settings in jvm.options from -Xms2g -Xmx2g to -Xms4g -Xmx4g, the cluster has stayed in a red state.

See details:
Version: elasticsearch-5.0.2

http://node-1:9200/_cluster/health?pretty=true
{
"cluster_name": "cluster-elasticsearch-ops",
"status": "red",
"timed_out": false,
"number_of_nodes": 5,
"number_of_data_nodes": 5,
"active_primary_shards": 0,
"active_shards": 0,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": "NaN"
}
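As a side note, the "NaN" in active_shards_percent_as_number is just 0/0: with the global state not yet recovered there are no shards at all, so the percentage is undefined. A small illustrative sketch of that interpretation, using the health output above (not Elasticsearch code):

```python
import json

# A trimmed copy of the _cluster/health response shown above.
health = json.loads("""
{
  "cluster_name": "cluster-elasticsearch-ops",
  "status": "red",
  "number_of_nodes": 5,
  "active_primary_shards": 0,
  "active_shards": 0,
  "unassigned_shards": 0
}
""")

total = health["active_shards"] + health["unassigned_shards"]
# 0 active shards out of 0 total: the percentage is 0/0, i.e. NaN.
percent = health["active_shards"] / total * 100 if total else float("nan")
print(health["status"], percent)  # red nan
```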

http://node-1:9200/_cat/nodes?v&h=id,ip,port,v,m,hm,hp,fdp,iif
id ip port v m hm hp fdp iif
6LPl 10.146.84.61 9300 5.0.2 - 3.9gb 8 0 0
7NJG 10.146.84.65 9300 5.0.2 * 3.9gb 22 0 0
zbr- 10.146.84.64 9300 5.0.2 - 3.9gb 10 0 0
_TWt 10.146.84.62 9300 5.0.2 - 3.9gb 8 0 0
OLmo 10.146.84.63 9300 5.0.2 - 3.9gb 15 1 0

elasticsearch.yml
#node-1 discovery
discovery.zen.ping.unicast.hosts: ["10.146.84.61", "10.146.84.62", "10.146.84.63", "10.146.84.64"]

discovery.zen.minimum_master_nodes: 3 
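For what it's worth, 3 is the correct quorum here, assuming all five nodes are master-eligible (N/2 + 1). A quick check:

```python
def quorum(master_eligible_nodes: int) -> int:
    # Zen discovery rule of thumb: a strict majority of
    # master-eligible nodes must be visible to elect a master.
    return master_eligible_nodes // 2 + 1

print(quorum(5))  # 3
```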

gateway.recover_after_nodes: 3 
gateway.expected_nodes: 5
gateway.recover_after_time: 5m
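These gateway settings mean: do nothing until at least 3 nodes have joined; once 3 have joined, wait up to 5 minutes for all 5 expected nodes before starting state recovery; if all 5 join earlier, start immediately. A rough sketch of that decision logic (illustrative only, not the actual GatewayService code):

```python
def should_recover(joined: int, recover_after: int = 3, expected: int = 5,
                   waited_minutes: float = 0.0,
                   recover_after_time_minutes: float = 5.0) -> bool:
    # No state recovery at all until the minimum number of nodes joins.
    if joined < recover_after:
        return False
    # With enough nodes present, recover once all expected nodes have
    # joined, or once the grace period has elapsed.
    return joined >= expected or waited_minutes >= recover_after_time_minutes

print(should_recover(joined=3, waited_minutes=0))  # False: still waiting for 5
print(should_recover(joined=5, waited_minutes=0))  # True: all nodes present
print(should_recover(joined=3, waited_minutes=5))  # True: grace period elapsed
```

This matches the "delaying initial state recovery for [5m]. expecting [5] nodes, but only have [3]" line in the master's log.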

Network connectivity across all nodes is fine: every node can reach the others on port 9200, and net.ipv4.tcp_keepalive_time = 300 is set on all nodes.

All nodes start fine, but the cluster stays in a red state. Below is a log snippet from the master node.
Any suggestions for fixing this without having to reinitialize the cluster would be appreciated.

[2018-03-13T01:22:25,143][INFO ][o.e.n.Node               ] [node-5] initializing ...
[2018-03-13T01:22:25,272][INFO ][o.e.e.NodeEnvironment    ] [node-5] using [1] data paths, mounts [[/usr/local (/dev/mapper/datavg-lv_app)]], net usable_space [993.7gb], net total_space [1004.5gb], spins? [possibly], types [xfs]
[2018-03-13T01:22:25,273][INFO ][o.e.e.NodeEnvironment    ] [node-5] heap size [3.9gb], compressed ordinary object pointers [true]
[2018-03-13T01:22:25,368][INFO ][o.e.n.Node               ] [node-5] version[5.0.2], pid[22375], build[f6b4951/2016-11-24T10:07:18.101Z], OS[Linux/3.10.0-229.20.1.el7.x86_64/amd64], JVM[Oracle Corporation/Java HotSpot(TM) 64-Bit Server VM/1.8.0_101/25.101-b13]
[2018-03-13T01:22:26,875][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [aggs-matrix-stats]
[2018-03-13T01:22:26,876][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [ingest-common]
[2018-03-13T01:22:26,876][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [lang-expression]
[2018-03-13T01:22:26,876][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [lang-groovy]
[2018-03-13T01:22:26,876][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [lang-mustache]
[2018-03-13T01:22:26,876][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [lang-painless]
[2018-03-13T01:22:26,877][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [percolator]
[2018-03-13T01:22:26,877][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [reindex]
[2018-03-13T01:22:26,877][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [transport-netty3]
[2018-03-13T01:22:26,877][INFO ][o.e.p.PluginsService     ] [node-5] loaded module [transport-netty4]
[2018-03-13T01:22:26,878][INFO ][o.e.p.PluginsService     ] [node-5] no plugins loaded
[2018-03-13T01:23:23,702][INFO ][o.e.n.Node               ] [node-5] initialized
[2018-03-13T01:23:23,703][INFO ][o.e.n.Node               ] [node-5] starting ...
[2018-03-13T01:23:23,939][INFO ][o.e.t.TransportService   ] [node-5] publish_address {10.146.84.65:9300}, bound_addresses {10.146.84.65:9300}
[2018-03-13T01:23:23,948][INFO ][o.e.b.BootstrapCheck     ] [node-5] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
[2018-03-13T01:23:30,868][INFO ][o.e.c.s.ClusterService   ] [node-5] new_master {node-5}{7NJGnmBDTD2AXkg1INFA4g}{f8iNd2k1S6yUxyFY7zWTmg}{10.146.84.65}{10.146.84.65:9300}, added {{node-4}{zbr-kmXkSWCigqeKeV3NVw}{iowDFglQSwyfRZataPU38Q}{10.146.84.64}{10.146.84.64:9300},{node-3}{OLmoxSQeSHq7Hnp-oCy4oQ}{nxx4L3vRR1Oiz5gZ30yCbg}{10.146.84.63}{10.146.84.63:9300},}, reason: zen-disco-elected-as-master ([2] nodes joined)[{node-4}{zbr-kmXkSWCigqeKeV3NVw}{iowDFglQSwyfRZataPU38Q}{10.146.84.64}{10.146.84.64:9300}, {node-3}{OLmoxSQeSHq7Hnp-oCy4oQ}{nxx4L3vRR1Oiz5gZ30yCbg}{10.146.84.63}{10.146.84.63:9300}]
[2018-03-13T01:23:30,924][INFO ][o.e.g.GatewayService     ] [node-5] delaying initial state recovery for [5m]. expecting [5] nodes, but only have [3]
[2018-03-13T01:23:30,944][INFO ][o.e.h.HttpServer         ] [node-5] publish_address {10.146.84.65:9200}, bound_addresses {10.146.84.65:9200}
[2018-03-13T01:23:30,945][INFO ][o.e.n.Node               ] [node-5] started
[2018-03-13T01:23:31,596][INFO ][o.e.c.s.ClusterService   ] [node-5] added {{node-2}{_TWtxmg1QWmkQLlcoJKbfA}{9exE8h4MRcGdz5IDsvgGnA}{10.146.84.62}{10.146.84.62:9300},}, reason: zen-disco-node-join[{node-2}{_TWtxmg1QWmkQLlcoJKbfA}{9exE8h4MRcGdz5IDsvgGnA}{10.146.84.62}{10.146.84.62:9300}]
[2018-03-13T01:23:38,771][INFO ][o.e.c.s.ClusterService   ] [node-5] added {{node-1}{6LPl_eO5TrGpoySO7xTYAA}{68M1OsbZQqGa4YTmdRiC8g}{10.146.84.61}{10.146.84.61:9300},}, reason: zen-disco-node-join[{node-1}{6LPl_eO5TrGpoySO7xTYAA}{68M1OsbZQqGa4YTmdRiC8g}{10.146.84.61}{10.146.84.61:9300}]
[2018-03-13T01:23:57,021][INFO ][o.e.m.j.JvmGcMonitorService] [node-5] [gc][33] overhead, spent [485ms] collecting in the last [1.2s]
[2018-03-13T01:23:58,025][INFO ][o.e.m.j.JvmGcMonitorService] [node-5] [gc][34] overhead, spent [406ms] collecting in the last [1s]
[2018-03-13T01:23:59,028][INFO ][o.e.m.j.JvmGcMonitorService] [node-5] [gc][35] overhead, spent [318ms] collecting in the last [1s]

## Log event issue:
[2018-03-13T01:24:31,300][DEBUG][o.e.a.a.i.c.TransportCreateIndexAction] [node-5] timed out while retrying [indices:admin/create] after failure (timeout [1m])
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
	at org.elasticsearch.cluster.block.ClusterBlocks.indexBlockedException(ClusterBlocks.java:178) ~[elasticsearch-5.0.2.jar:5.0.2]
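The ClusterBlockException above just means an index-creation request (from a client, or a template being applied) arrived before the gateway finished recovering the cluster state; such requests fail with SERVICE_UNAVAILABLE/1 until the block clears. One client-side option is to poll _cluster/health until the status leaves red. A minimal polling sketch, with the HTTP call injected as a callable so it is easy to stub; on a real cluster, fetch_health would wrap something like urllib.request.urlopen on http://node-1:9200/_cluster/health plus json.load:

```python
import time

def wait_until_recovered(fetch_health, attempts=60, delay_seconds=5.0,
                         sleep=time.sleep):
    """Poll until the cluster status is no longer red.

    fetch_health is any callable returning a dict shaped like the
    _cluster/health response, so it can be backed by urllib,
    requests, or a stub in tests.
    """
    for _ in range(attempts):
        status = fetch_health().get("status")
        if status in ("yellow", "green"):
            return status
        sleep(delay_seconds)
    raise TimeoutError("cluster state still not recovered")
```

Server-side, `GET /_cluster/health?wait_for_status=yellow&timeout=5m` achieves the same thing without a client-side loop.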

Regards,
Lp

Update: the cluster eventually recovered, but it took 3 hours.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.