Hi,
I am running Elasticsearch 6.6.1 in a Kubernetes environment. The cluster was previously green, but it is now red because of unassigned shards. Many shards show PRIMARY_FAILED as the unassigned reason.
The details are below.
Allocation explain response for the primary shard:
curl -X GET "http://xx.xx.xx.xx:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
"index": "log-2019-10-08",
"shard": 9,
"primary": true
}
'
{
"index" : "log-2019-10-08",
"shard" : 9,
"primary" : true,
"current_state" : "initializing",
"unassigned_info" : {
"reason" : "NODE_LEFT",
"at" : "2019-10-22T09:26:26.488Z",
"details" : "node_left[UVNJBTB8SuC3OiVnaB4Tfw]",
"last_allocation_status" : "awaiting_info"
},
"current_node" : {
"id" : "UVNJBTB8SuC3OiVnaB4Tfw",
"name" : "elasticsearch-data-4",
"transport_address" : "xx.xx.xx.xx:9300"
},
"explanation" : "the shard is in the process of initializing on node [elasticsearch-data-4], wait until initialization has completed"
}
Allocation explain response for the replica shard:
curl -X GET "http://xx.xx.xx.xx:9200/_cluster/allocation/explain?pretty" -H 'Content-Type: application/json' -d'
{
"index": "log-2019-10-08",
"shard": 9,
"primary": false
}
'
{
"index" : "log-2019-10-08",
"shard" : 9,
"primary" : false,
"current_state" : "unassigned",
"unassigned_info" : {
"reason" : "PRIMARY_FAILED",
"at" : "2019-10-21T20:06:43.919Z",
"details" : "primary failed while replica initializing",
"last_allocation_status" : "no_attempt"
},
"can_allocate" : "no",
"allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
"node_allocation_decisions" : [
{
"node_id" : "Ljab6IuXQTOUNxk8RkcuGg",
"node_name" : "elasticsearch-data-1",
"transport_address" : "xx.xx.xx.xx:9300",
"node_decision" : "no",
"deciders" : [
{
"decider" : "replica_after_primary_active",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
},
{
"decider" : "throttling",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
}
]
},
{
"node_id" : "UVNJBTB8SuC3OiVnaB4Tfw",
"node_name" : "elasticsearch-data-4",
"transport_address" : "xx.xx.xx.xx:9300",
"node_decision" : "no",
"deciders" : [
{
"decider" : "replica_after_primary_active",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
},
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[log-2019-10-08][9], node[UVNJBTB8SuC3OiVnaB4Tfw], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[INITIALIZING], a[id=clqmzyGgSQC4HBKolutV-Q], unassigned_info[[reason=NODE_LEFT], at[2019-10-22T09:26:26.488Z], delayed=false, details[node_left[UVNJBTB8SuC3OiVnaB4Tfw]], allocation_status[fetching_shard_data]]]"
},
{
"decider" : "throttling",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
}
]
},
{
"node_id" : "V4qtWbtLRqyDqW9f6T0mog",
"node_name" : "elasticsearch-data-2",
"transport_address" : "xx.xx.xx.xx:9300",
"node_decision" : "no",
"deciders" : [
{
"decider" : "replica_after_primary_active",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
},
{
"decider" : "throttling",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
}
]
},
{
"node_id" : "r6UCUEPzR6aY0Kz8NiauDg",
"node_name" : "elasticsearch-data-0",
"transport_address" : "xx.xx.xx.xx:9300",
"node_decision" : "no",
"deciders" : [
{
"decider" : "replica_after_primary_active",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
},
{
"decider" : "throttling",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
}
]
},
{
"node_id" : "sNmtl-VvQMqS2bcXEycB-g",
"node_name" : "elasticsearch-data-3",
"transport_address" : "xx.xx.xx.xx:9300",
"node_decision" : "no",
"deciders" : [
{
"decider" : "replica_after_primary_active",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
},
{
"decider" : "throttling",
"decision" : "NO",
"explanation" : "primary shard for this replica is not yet active"
}
]
}
]
}
How can I bring my cluster back to a healthy state?