Hi, we are running Elasticsearch 5.2 on a 4-node cluster. Some replica shards have been stuck in the UNASSIGNED state for the last couple of hours. Below is the information about these shards from the _cluster/state API.
In addition, a lot of bulk write operations are timing out, and the _cat/shards API hangs and never returns a response when called from curl. Can someone please take a look? How can this be debugged further? Thanks in advance.
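For reference, the shard entries below were pulled with a call along these lines (host and port are placeholders for our setup):

```shell
# Dump the routing table from cluster state; the UNASSIGNED entries below
# are taken from this output (localhost:9200 stands in for our node address)
curl -s 'localhost:9200/_cluster/state/routing_table?pretty'
```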
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : "cfileindex",
"recovery_source" : {
"type" : "PEER"
},
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2017-07-15T23:23:28.426Z",
"failed_attempts" : 5,
"delayed" : false,
"details" : "master {130593345248}{4diFAMWcTL6N214ezq8yXA}{8nfDzbB_QMKZG41fd_-S6Q}{10.2.34.149}{10.2.34.149:25800} has not removed previously failed shard. resending shard failure",
"allocation_status" : "no_attempt"
}
{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 8,
"index" : "cfileindex",
"recovery_source" : {
"type" : "PEER"
},
"unassigned_info" : {
"reason" : "ALLOCATION_FAILED",
"at" : "2017-07-16T11:05:53.747Z",
"failed_attempts" : 5,
"delayed" : false,
"details" : "failed recovery, failure RecoveryFailedException[[cfileindex][8]: Recovery failed from {130593347324}{JAvDtnPwSXuNl7AYLjbgsw}{7loDHDZJQta-Ws1ZmM4WBA}{10.2.34.165}{10.2.34.165:25800} into {130593342308}{TxkUgGT_QrmyaoM6x5U__g}{rnaZ_IV_TR-6ns48jskCjw}{10.2.34.155}{10.2.34.155:25800}]; nested: RemoteTransportException[[130593347324][10.2.34.165:25800][internal:index/shard/recovery/start_recovery]]; nested: ReceiveTimeoutTransportException[[130593342308][10.2.34.155:25800][internal:index/shard/recovery/finalize] request_id [178425] timed out after [1800000ms]]; ",
"allocation_status" : "no_attempt"
}
}
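From the docs I understand that once failed_attempts reaches index.allocation.max_retries (default 5), the cluster stops retrying allocation until a manual reroute. Would something like the following be a reasonable next step, or should I dig into the recovery timeout first? (Host, port, and the shard number in the body are placeholders based on the output above.)

```shell
# Ask the cluster why it is not assigning one of the stuck replicas
# (the allocation explain API is available in 5.x)
curl -s -XGET 'localhost:9200/_cluster/allocation/explain?pretty' \
  -d '{"index": "cfileindex", "shard": 8, "primary": false}'

# Retry allocation of shards that exhausted index.allocation.max_retries
curl -s -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true&pretty'
```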