Cluster health goes from green to yellow


(Gaurav) #1

My cluster state was green, but after a restart of the service one shard remains UNASSIGNED and the status goes to yellow. I have 2 machines, configured with 5 shards and 1 replica. I am using the default config with multicast off and unicast enabled. On both machines I left the master/data settings unchanged, so by default both are master-eligible and both store data. For the unicast hosts I gave [IP,IP:port].

I tried rerouting the unassigned shards with:

for shard in $(curl -XGET http://localhost:9201/_cat/shards | grep UNASSIGNED | awk '{print $2}'); do
    echo "processing $shard"
    curl -XPOST 'localhost:9201/_cluster/reroute' -d '{
        "commands" : [ {
            "allocate" : {
                "index" : "wall",
                "shard" : '$shard',
                "node" : "node1",
                "allow_primary" : false
            }
        } ]
    }'
    sleep 5
done

This gives the following output:

{"acknowledged":true,"state":{"version":48,"master_node":"Ar7UpWUQSpSlYcje-u6bgA","blocks":{},"nodes":{"EtQ9mOrLQbiUbHGqeQgMvQ":{"name":"node2","transport_address":"inet[/XXX.XXX.XX.XXX:9300]","attributes":{}},"Ar7UpWUQSpSlYcje-u6bgA":{"name":"node1","transport_address":"inet[/XXX.XXX.XX.XXX:9301]","attributes":{}}},"routing_table":{"indices":{"wall":{"shards":{"2":[{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":2,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":2,"index":"wall"}],"0":[{"state":"STARTED","primary":true,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":0,"index":"wall"},{"state":"INITIALIZING","primary":false,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":0,"index":"wall"}],"3":[{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":3,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":3,"index":"wall"}],"1":[{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":1,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":1,"index":"wall"}],"4":[{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":4,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":4,"index":"wall"}]}}}},"routing_nodes":{"unassigned":[],"nodes":{"EtQ9mOrLQbiUbHGqeQgMvQ":[{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":2,"index":"wall"},{"state":"STARTED","primary":true,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":0,"index":"wall"},{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":3,"index":"wall"},{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":1,"index":"wall"},{"state":"STARTED","primary":false,"node":"EtQ9mOrLQbiUbHGqeQgMvQ","relocating_node":null,"shard":4,"index":"wall"}],"Ar7UpWUQSpSlYcje-u6bgA":[{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":2,"index":"wall"},{"state":"INITIALIZING","primary":false,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":0,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":3,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":1,"index":"wall"},{"state":"STARTED","primary":true,"node":"Ar7UpWUQSpSlYcje-u6bgA","relocating_node":null,"shard":4,"index":"wall"}]}},"allocations":[]}}

But shard 0 is still unassigned and the status remains yellow.

Any help or suggestions would be greatly appreciated.

Thanks


(Mark Walkom) #2

The quick fix would be to drop replicas for that index and then add them back.
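
A rough sketch of what that could look like, assuming the index is "wall" and the node answers on port 9201 as in the original post. The helper names replica_body and set_replicas are made up for illustration, and the Content-Type header is only there because newer releases insist on it.

```shell
#!/bin/sh
# Assumed values taken from the original post; override via environment.
ES_URL="${ES_URL:-http://localhost:9201}"
INDEX="${INDEX:-wall}"

# Build the update-settings body for a given replica count.
replica_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Send the replica count to the index through the update-settings API.
set_replicas() {
  curl -s -XPUT "$ES_URL/$INDEX/_settings" \
       -H 'Content-Type: application/json' \
       -d "$(replica_body "$1")"
}
```

Running set_replicas 0, waiting a few seconds, and then running set_replicas 1 drops the replicas and lets the cluster rebuild them from the primaries. The cluster will report green while replicas are at 0, so check the health again after adding them back.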


(Dong Hyun Kim) #3

Hello!
I had a similar unassigned shard problem last night.
I run two virtual CentOS machines, and one of them lost its disk when it was unmounted. That caused a persistently yellow or red index status, because free disk space was too low and Elasticsearch could not allocate any shards or replicas.
This page may help:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/index-modules-allocation.html
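
To see whether disk space is what blocks allocation, something like the following might help. It is only a sketch: it assumes the node is reachable on port 9201 (taken from the first post), the 85 default is my assumption for the low disk watermark and should be checked against your own settings, and the function names are invented for this example.

```shell
#!/bin/sh
# Assumed endpoint from the first post; override via environment.
ES_URL="${ES_URL:-http://localhost:9201}"

# Read "disk.percent node" lines on stdin and print the names of nodes
# whose disk usage is at or above the given percentage.
# Rows without a numeric percentage (e.g. UNASSIGNED) are skipped.
nodes_over_watermark() {
  awk -v limit="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 >= limit {
    $1 = ""; sub(/^ +/, ""); print
  }'
}

# Ask the cat allocation API for disk usage per node and filter it.
# The threshold defaults to 85, an assumed low watermark.
check_disk() {
  curl -s "$ES_URL/_cat/allocation?h=disk.percent,node" |
    nodes_over_watermark "${1:-85}"
}
```

Calling check_disk prints every node that is over the threshold. If a node shows up there, freeing disk space or adjusting the disk allocation settings described on that page is probably the first thing to try.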

