Elasticsearch cluster is in Red state. How to recover it?

Hi,
I am using Elasticsearch 6.6.1 in a k8s environment. My cluster used to be green, but it is now red because of UNASSIGNED shards. Many of the unassigned shards show PRIMARY_FAILED as the reason.

You can find the details below.
Response from primary shard:
curl -X GET "http://xx.xx.xx.xx:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "log-2019-10-08",
  "shard": 9,
  "primary": true
}
'
{
  "index" : "log-2019-10-08",
  "shard" : 9,
  "primary" : true,
  "current_state" : "initializing",
  "unassigned_info" : {
    "reason" : "NODE_LEFT",
    "at" : "2019-10-22T09:26:26.488Z",
    "details" : "node_left[UVNJBTB8SuC3OiVnaB4Tfw]",
    "last_allocation_status" : "awaiting_info"
  },
  "current_node" : {
    "id" : "UVNJBTB8SuC3OiVnaB4Tfw",
    "name" : "elasticsearch-data-4",
    "transport_address" : "xx.xx.xx.xx:9300"
  },
  "explanation" : "the shard is in the process of initializing on node [elasticsearch-data-4], wait until initialization has completed"
}

The replica response is below.
curl -X GET "http://xx.xx.xx.xx:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
  "index": "log-2019-10-08",
  "shard": 9,
  "primary": false
}
'
{
  "index" : "log-2019-10-08",
  "shard" : 9,
  "primary" : false,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "PRIMARY_FAILED",
    "at" : "2019-10-21T20:06:43.919Z",
    "details" : "primary failed while replica initializing",
    "last_allocation_status" : "no_attempt"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions" : [
    {
      "node_id" : "Ljab6IuXQTOUNxk8RkcuGg",
      "node_name" : "elasticsearch-data-1",
      "transport_address" : "xx.xx.xx.xx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "replica_after_primary_active",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        },
        {
          "decider" : "throttling",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        }
      ]
    },
    {
      "node_id" : "UVNJBTB8SuC3OiVnaB4Tfw",
      "node_name" : "elasticsearch-data-4",
      "transport_address" : "xx.xx.xx.xx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "replica_after_primary_active",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        },
        {
          "decider" : "same_shard",
          "decision" : "NO",
          "explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[log-2019-10-08][9], node[UVNJBTB8SuC3OiVnaB4Tfw], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=clqmzyGgSQC4HBKolutV-Q], unassigned_info[[reason=NODE_LEFT], at[2019-10-22T09:26:26.488Z], delayed=false, details[node_left[UVNJBTB8SuC3OiVnaB4Tfw]], allocation_status[fetching_shard_data]]]"
        },
        {
          "decider" : "throttling",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        }
      ]
    },
    {
      "node_id" : "V4qtWbtLRqyDqW9f6T0mog",
      "node_name" : "elasticsearch-data-2",
      "transport_address" : "xx.xx.xx.xx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "replica_after_primary_active",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        },
        {
          "decider" : "throttling",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        }
      ]
    },
    {
      "node_id" : "r6UCUEPzR6aY0Kz8NiauDg",
      "node_name" : "elasticsearch-data-0",
      "transport_address" : "xx.xx.xx.xx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "replica_after_primary_active",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        },
        {
          "decider" : "throttling",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        }
      ]
    },
    {
      "node_id" : "sNmtl-VvQMqS2bcXEycB-g",
      "node_name" : "elasticsearch-data-3",
      "transport_address" : "xx.xx.xx.xx:9300",
      "node_decision" : "no",
      "deciders" : [
        {
          "decider" : "replica_after_primary_active",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        },
        {
          "decider" : "throttling",
          "decision" : "NO",
          "explanation" : "primary shard for this replica is not yet active"
        }
      ]
    }
  ]
}
How can I bring my cluster back to a healthy state?

What is the full output of the cluster health API?

Response from cluster health API is below:
curl -X GET "http://xx.xx.xx.xx:9200/_cluster/health?pretty"
{
  "cluster_name" : "my-cluster-1",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 11,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 494,
  "active_shards" : 716,
  "relocating_shards" : 0,
  "initializing_shards" : 449,
  "unassigned_shards" : 401,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 20,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 6691643,
  "active_shards_percent_as_number" : 45.721583652618136
}
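
As a quick sanity check on the numbers above, the reported percentage is consistent with active shards divided by the sum of active, initializing and unassigned shards. The sketch below only redoes that arithmetic; the function name active_percent is made up for illustration.

```shell
# Hypothetical helper: percentage of active shards out of
# active + initializing + unassigned, rounded to two decimals.
active_percent() {
  awk -v a="$1" -v i="$2" -v u="$3" \
    'BEGIN { printf "%.2f\n", 100 * a / (a + i + u) }'
}

# Counts taken from the health output above:
# active_shards=716, initializing_shards=449, unassigned_shards=401
active_percent 716 449 401
```

This prints 45.72, which agrees with active_shards_percent_as_number, so fewer than half of the shard copies are currently active.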

How did you end up in this state? Are all original data nodes now part of the cluster?

Hi,
Two of the worker nodes went down within a few hours of each other. After we recovered the worker nodes, the issue started appearing in the Elasticsearch cluster.
All the original data nodes are now part of the cluster.

The primary shard you looked at above is recovering: its current_state is "initializing".

This means you should just wait and eventually it will recover.

The 449 initializing shards in your cluster health output suggest you have increased a setting such as cluster.routing.allocation.node_concurrent_recoveries far too high. Your cluster may be deadlocked. Could you set it (and any other related settings) back to the default and perform a full cluster restart?
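
A minimal sketch of how the setting could be inspected and reset through the cluster settings API, assuming it was changed dynamically rather than in elasticsearch.yml (in that case it would have to be removed from the file instead). ES_URL and the function names are placeholders for illustration; sending null for a setting resets it to its default, which for node_concurrent_recoveries is 2.

```shell
# Assumption: ES_URL is a placeholder for the HTTP address of any node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build a request body that resets one setting in both the persistent
# and the transient scope (null means "back to the default").
reset_body() {
  printf '{"persistent":{"%s":null},"transient":{"%s":null}}' "$1" "$1"
}

# Show the settings that are explicitly configured at cluster level.
show_cluster_settings() {
  curl -s -X GET "$ES_URL/_cluster/settings?pretty"
}

# Reset a single cluster setting to its default value.
reset_setting() {
  curl -s -X PUT "$ES_URL/_cluster/settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(reset_body "$1")"
}

# Usage:
#   show_cluster_settings
#   reset_setting cluster.routing.allocation.node_concurrent_recoveries
```

Running show_cluster_settings first makes it visible whether any other recovery-related settings were changed as well.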

But the primary shards have been in the initializing state for 4-5 days.

Hi,
I am using a k8s environment. What do you mean by a full cluster restart? How can I restart my ES cluster?

I don't know about Kubernetes specifically, but a full cluster restart is where you shut all of the nodes down and then start them all up again.

Hi,
Under what conditions or scenarios should cluster.routing.allocation.node_concurrent_recoveries be set to something other than its default value?

I would only use this parameter for experiments in a test environment. I would not recommend adjusting it from the default in a production environment.

Have you tried the reroute API?

POST /_cluster/reroute?retry_failed=true

The cluster will attempt to allocate a shard a maximum of index.allocation.max_retries times in a row (defaults to 5), before giving up and leaving the shard unallocated. This scenario can be caused by structural problems such as having an analyzer which refers to a stopwords file which doesn’t exist on all nodes.

Once the problem has been corrected, allocation can be manually retried by calling the reroute API with the ?retry_failed URI query parameter, which will attempt a single retry round for these shards.

https://www.elastic.co/guide/en/elasticsearch/reference/6.6/cluster-reroute.html#_retrying_failed_allocations
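
For anyone following along with curl, as used earlier in this thread, the retry request could be sent roughly as below. This is only a sketch: ES_URL and the function names are placeholders, and the output is reduced to the HTTP status code because the full reroute response can be very large on a cluster with many shards.

```shell
# Assumption: ES_URL is a placeholder for the HTTP address of any node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the reroute URL that requests one more retry round
# for shards whose allocation has failed too many times.
reroute_retry_url() {
  printf '%s/_cluster/reroute?retry_failed=true' "$1"
}

# Send the retry request and print only the HTTP status code.
retry_failed_allocations() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -X POST "$(reroute_retry_url "$ES_URL")"
}

# Usage:
#   retry_failed_allocations
```

A 200 status only means the request was accepted; whether the shards were actually allocated still has to be checked with the cluster health or allocation explain APIs.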

Hi,
I have set cluster.routing.allocation.node_concurrent_recoveries back to its default value and performed a full cluster restart. I have also called the reroute API with ?retry_failed.
This reduced the number of red shards: I now have 66 unassigned shards.
Many indices still have some shards in red state. Since some of the shards of those indices have recovered, I do not want to delete a whole red index just to bring the cluster back to a healthy state.
So how can I delete a particular red shard of an index without deleting the whole index?
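
Before deciding what to do with individual shards, it may help to list exactly which shard copies are still unassigned and why. The sketch below uses the _cat/shards API with an explicit column list; ES_URL and the function names are placeholders, and the filter assumes the state is the fourth column, which holds for the column list chosen here.

```shell
# Assumption: ES_URL is a placeholder for the HTTP address of any node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Keep only the rows whose fourth column (state) is UNASSIGNED.
only_unassigned() {
  awk '$4 == "UNASSIGNED"'
}

# List unassigned shard copies as: index shard prirep state reason
list_unassigned_shards() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | only_unassigned
}

# Usage:
#   list_unassigned_shards
```

In the prirep column, p marks a primary and r a replica, so the output shows which indices are red (unassigned primaries) and which are only missing replicas.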