3-node ES 2.3.2 cluster with 2 replicas goes red after a full cluster shutdown when only a single node is restarted


(SK) #1

ES version: 2.3.2
JRE version: 1.8.0_112
OS: Windows Server 2012

I have a cluster with 3 nodes and an index with 2 replicas. I brought down all the ES nodes. However, when I restart a single ES node, the shards stay unassigned and the index/cluster is in red state. If I bring up one more node, the cluster becomes yellow and operational.

The issue is that with 2 replicas on a 3-node cluster, every node holds a copy of each shard, so after a full cluster shutdown and a single-node startup I would expect the cluster/index to be yellow rather than red. Am I doing anything wrong here?

A similar use case with 2 nodes and 1 replica works fine.

_cat/indices
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open twitter 5 2 37 0 199.7kb 66.5kb

_cat/nodes
host ip heap.percent ram.percent load node.role master name
10.196.18.1 10.196.18.1 13 24 -1.00 d m Alpha Ray
10.196.18.1 10.196.18.1 5 24 -1.00 d * Zebediah Killgrave
10.196.18.1 10.196.18.1 14 24 -1.00 d m Chance

_cat/shards
index shard prirep state docs store ip node
twitter 3 p STARTED 9 20kb 10.196.18.1 Zebediah Killgrave
twitter 3 r STARTED 9 17.2kb 10.196.18.1 Chance
twitter 3 r STARTED 9 20kb 10.196.18.1 Alpha Ray
twitter 1 r STARTED 11 14.6kb 10.196.18.1 Zebediah Killgrave
twitter 1 p STARTED 11 14.6kb 10.196.18.1 Chance
twitter 1 r STARTED 11 17.4kb 10.196.18.1 Alpha Ray
twitter 2 r STARTED 8 14.4kb 10.196.18.1 Zebediah Killgrave
twitter 2 r STARTED 8 14.4kb 10.196.18.1 Chance
twitter 2 p STARTED 8 14.4kb 10.196.18.1 Alpha Ray
twitter 4 r STARTED 7 11.6kb 10.196.18.1 Zebediah Killgrave
twitter 4 p STARTED 7 11.6kb 10.196.18.1 Chance
twitter 4 r STARTED 7 11.6kb 10.196.18.1 Alpha Ray
twitter 0 p STARTED 2 5.7kb 10.196.18.1 Zebediah Killgrave
twitter 0 r STARTED 2 5.7kb 10.196.18.1 Chance
twitter 0 r STARTED 2 5.7kb 10.196.18.1 Alpha Ray

Now all the nodes are shut down and I bring up just one node; its log output is shown below. In that output I don't see the index being recovered or the status changing from red to yellow, for example a line like:
"cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[twitter]"

[2017-05-25 04:21:21,767][INFO ][plugins ] [Leo] modules [reindex], plugins [analysis-phonetic, delete-by-query], sites []
[2017-05-25 04:21:21,812][INFO ][env ] [Leo] using [1] data paths, mounts [[(C:)]], net usable_space [150.4gb], net total_space [239.6gb], spins? [unknown], types [NTFS]
[2017-05-25 04:21:21,813][INFO ][env ] [Leo] heap size [910.5mb], compressed ordinary object pointers [true]
[2017-05-25 04:21:25,449][INFO ][node ] [Leo] initialized
[2017-05-25 04:21:25,449][INFO ][node ] [Leo] starting ...
[2017-05-25 04:21:25,696][INFO ][transport ] [Leo] publish_address {slc12acw.us.oracle.com/10.196.18.1:9300}, bound_addresses {10.196.18.1:9300}, {[fe80::9813:c9a8:3ce6:a365]:9300}
[2017-05-25 04:21:25,709][INFO ][discovery ] [Leo] sigcluster1/GQNpeJo2QlyuI75sZDvMPQ
[2017-05-25 04:21:28,784][INFO ][cluster.service ] [Leo] new_master {Leo}{GQNpeJo2QlyuI75sZDvMPQ}{10.196.18.1}{slc12acw.us.oracle.com/10.196.18.1:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
[2017-05-25 04:21:28,817][INFO ][http ] [Leo] publish_address {slc12acw.us.oracle.com/10.196.18.1:9200}, bound_addresses {10.196.18.1:9200}, {[fe80::9813:c9a8:3ce6:a365]:9200}
[2017-05-25 04:21:28,818][INFO ][node ] [Leo] started
[2017-05-25 04:21:28,947][INFO ][gateway ] [Leo] recovered [1] indices into cluster_state

_cat/shards after restarting only a single node following the complete cluster shutdown
index shard prirep state docs store ip node
twitter 3 p UNASSIGNED
twitter 3 r UNASSIGNED
twitter 3 r UNASSIGNED
twitter 1 p UNASSIGNED
twitter 1 r UNASSIGNED
twitter 1 r UNASSIGNED
twitter 2 p UNASSIGNED
twitter 2 r UNASSIGNED
twitter 2 r UNASSIGNED
twitter 4 p UNASSIGNED
twitter 4 r UNASSIGNED
twitter 4 r UNASSIGNED
twitter 0 p UNASSIGNED
twitter 0 r UNASSIGNED
twitter 0 r UNASSIGNED
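
For reference, the red status follows directly from the listing above: health is red when any primary is not active, yellow when all primaries are active but some replica is not, and green otherwise. A minimal Python sketch of that rule applied to `_cat/shards` text is shown here; the function name `health_from_cat_shards` and the sample string are hypothetical and only illustrate the rule, they are not part of Elasticsearch.

```python
def health_from_cat_shards(cat_shards_text):
    """Derive red/yellow/green from the plain-text output of _cat/shards.

    Assumes the default column order: index shard prirep state ...
    """
    active_states = {"STARTED", "RELOCATING"}
    primaries_active = True
    replicas_active = True
    for line in cat_shards_text.strip().splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 4 or fields[2] == "prirep":
            continue
        prirep, state = fields[2], fields[3]
        if state in active_states:
            continue
        if prirep == "p":
            primaries_active = False
        else:
            replicas_active = False
    if not primaries_active:
        return "red"
    if not replicas_active:
        return "yellow"
    return "green"


# Hypothetical sample taken from the single-node listing above.
sample = """index shard prirep state docs store ip node
twitter 3 p UNASSIGNED
twitter 3 r UNASSIGNED
twitter 3 r UNASSIGNED
"""
print(health_from_cat_shards(sample))
```

Because every primary in the single-node listing is UNASSIGNED, the rule reports red; yellow would require all five primaries to be started on the one running node.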


(Colin Goodheart-Smithe) #2

Can you use the Allocation Explain API on one of the primary shards and paste the response in a gist and link it here? It should give some indication of why the shards remain unassigned. The request would be something like:

GET /_cluster/allocation/explain
{
  "index": "twitter",
  "shard": 0,
  "primary": true
}

(SK) #3

The Allocation Explain API is only available in 5.x, right? I am using 2.3.2.


(Colin Goodheart-Smithe) #4

Yes, you are right, that API is not available in 2.x, sorry. However, it looks like this behaviour is expected in 2.x, as per https://github.com/elastic/elasticsearch/issues/24887#issuecomment-303992081
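
A likely explanation, assuming the 2.x default of `index.recovery.initial_shards: quorum` applies here: after a full cluster restart, a primary is only allocated once enough copies of that shard have been found on the nodes that are up. The sketch below mirrors that rule as I understand it for the `quorum` setting only; the function name `required_copies` is hypothetical, and the special case for a single replica is an assumption based on the 2.x behaviour rather than something stated in this thread.

```python
def required_copies(number_of_replicas):
    """Copies of a shard that must be found before its primary is allocated.

    Sketch of the 2.x `index.recovery.initial_shards: quorum` rule
    (assumption: a majority is only enforced with more than one replica).
    """
    if number_of_replicas > 1:
        total_copies = number_of_replicas + 1
        return total_copies // 2 + 1
    return 1


# The two setups described in this thread.
for replicas in (1, 2):
    print(replicas, "replica(s) ->", required_copies(replicas), "copy/copies needed")
```

Under these assumptions, 2 replicas need two copies to be found (so two of the three nodes must be up), while 1 replica needs only one copy, which matches the observation that the 2-node, 1-replica setup recovers with a single node. In 2.x the requirement can reportedly be changed through the `index.recovery.initial_shards` setting.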


(SK) #5

Thanks for helping me with this.

