I have a cluster with two Graylog nodes and three Elasticsearch nodes. Each index has 4 primary shards and 1 replica.
I have recently been testing resilience by running 'service elasticsearch stop' on one of the ES nodes. When I then check for unassigned shards using: curl -s 'http://localhost:9200/_cat/shards?pretty' | grep UNASSIGNED
it shows a third of the shards as unassigned.
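It may help to know whether the unassigned shards are primaries, replicas, or both. A minimal sketch of how that could be counted, assuming the output of _cat/shards is requested with the columns index, shard, prirep, state and unassigned.reason (the function name summarise_unassigned is my own, not part of any tool):

```shell
# Summarise unassigned shards by type (p = primary, r = replica).
# Expects `_cat/shards` rows on stdin with the columns:
#   index shard prirep state [unassigned.reason]
summarise_unassigned() {
  awk '$4 == "UNASSIGNED" { n[$3]++ }
       END { printf "primaries=%d replicas=%d\n", n["p"], n["r"] }'
}

# Example against the cluster (not run here; same host/port as above):
# curl -s 'http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | summarise_unassigned
```

If only replicas were unassigned, the cluster would be yellow rather than red, so the split between the two counts says a lot about what happened.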
I chose one at random and ran the curl command:
curl 'http://localhost:9200/_cluster/allocation/explain?include_yes_decisions=true' -d '{
"index": "graylog_0",
"shard": 3,
"primary": true
}'
I got the following errors:
{
  "index" : "graylog_0",
  "shard" : 3,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2018-11-23T13:52:36.979Z",
    "details" : "node_left[x9ohTd5uRyyK9kHhzIlIzg]",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "D3wazOlhQ3-C36kepICG7w",
      "node_name" : "es-1",
      "transport_address" : "10.19.0.6:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "LUbM7G9RQL2L-lGxfO7SIQ",
      "node_name" : "es-2",
      "transport_address" : "10.19.0.8:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}
Surely if a node leaves, either the primary shards are moved onto another node or the replica of that shard is promoted to primary?
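Since the response says no valid shard copy was found on es-1 or es-2, one thing I could rule out is that some indices do not actually have a replica configured. A rough sketch of that check, assuming _cat/indices is requested with the columns index, pri and rep (the function name indices_without_replicas is made up for illustration):

```shell
# Print the names of indices whose configured replica count is 0.
# Expects `_cat/indices` rows on stdin with the columns: index pri rep
indices_without_replicas() {
  awk '$3 == 0 { print $1 }'
}

# Example against the cluster (not run here; same host/port as above):
# curl -s 'http://localhost:9200/_cat/indices?h=index,pri,rep' | indices_without_replicas
```

An empty result would mean every index has at least one replica configured, which is what I expect from my setup.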
Any ideas?
Cheers,
G