Replica shard stuck in UNASSIGNED state

dipathak · July 18, 2017, 5:55pm

Hi, we are using ES 5.2 on a 4-node cluster. Some replica shards have been stuck in the UNASSIGNED state for the last couple of hours. Below is the information about these shards from the _cluster/state API.
Also, a lot of bulk write operations are timing out, and the _cat/shards API hangs without returning a response when called from curl. Can someone please take a look? How can this be debugged further? Thanks in advance.

    {
      "state" : "UNASSIGNED",
      "primary" : false,
      "node" : null,
      "relocating_node" : null,
      "shard" : 3,
      "index" : "cfileindex",
      "recovery_source" : {
        "type" : "PEER"
      },
      "unassigned_info" : {
        "reason" : "ALLOCATION_FAILED",
        "at" : "2017-07-15T23:23:28.426Z",
        "failed_attempts" : 5,
        "delayed" : false,
        "details" : "master {130593345248}{4diFAMWcTL6N214ezq8yXA}{8nfDzbB_QMKZG41fd_-S6Q}{10.2.34.149}{10.2.34.149:25800} has not removed previously failed shard. resending shard failure",
        "allocation_status" : "no_attempt"
      }
    }

    {
      "state" : "UNASSIGNED",
      "primary" : false,
      "node" : null,
      "relocating_node" : null,
      "shard" : 8,
      "index" : "cfileindex",
      "recovery_source" : {
        "type" : "PEER"
      },
      "unassigned_info" : {
        "reason" : "ALLOCATION_FAILED",
        "at" : "2017-07-16T11:05:53.747Z",
        "failed_attempts" : 5,
        "delayed" : false,
        "details" : "failed recovery, failure RecoveryFailedException[[cfileindex][8]: Recovery failed from {130593347324}{JAvDtnPwSXuNl7AYLjbgsw}{7loDHDZJQta-Ws1ZmM4WBA}{10.2.34.165}{10.2.34.165:25800} into {130593342308}{TxkUgGT_QrmyaoM6x5U__g}{rnaZ_IV_TR-6ns48jskCjw}{10.2.34.155}{10.2.34.155:25800}]; nested: RemoteTransportException[[130593347324][10.2.34.165:25800][internal:index/shard/recovery/start_recovery]]; nested: ReceiveTimeoutTransportException[[130593342308][10.2.34.155:25800][internal:index/shard/recovery/finalize] request_id [178425] timed out after [1800000ms]]; ",
        "allocation_status" : "no_attempt"
      }
    }
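
For readers who want to scan many such entries at once, here is a minimal Python sketch that pulls the key fields out of shard routing entries shaped like the ones above. The function name `summarize_unassigned` and the `sample` list are illustrative assumptions, not part of Elasticsearch; the sample values are copied from the two entries in this post, with the long `details` strings omitted.

```python
def summarize_unassigned(shards):
    """Return one summary dict per UNASSIGNED entry in a list of shard routing dicts.

    `shards` is assumed to be a list of dicts shaped like the entries
    shown in this post (hypothetical helper, not an Elasticsearch API).
    """
    rows = []
    for entry in shards:
        if entry.get("state") != "UNASSIGNED":
            continue
        info = entry.get("unassigned_info", {})
        rows.append({
            "index": entry.get("index"),
            "shard": entry.get("shard"),
            "primary": entry.get("primary"),
            "reason": info.get("reason"),
            "failed_attempts": info.get("failed_attempts", 0),
            "at": info.get("at"),
        })
    return rows


# Sample data: the two entries from this post, trimmed to the fields used here.
sample = [
    {
        "state": "UNASSIGNED", "primary": False, "shard": 3, "index": "cfileindex",
        "unassigned_info": {"reason": "ALLOCATION_FAILED",
                            "at": "2017-07-15T23:23:28.426Z",
                            "failed_attempts": 5},
    },
    {
        "state": "UNASSIGNED", "primary": False, "shard": 8, "index": "cfileindex",
        "unassigned_info": {"reason": "ALLOCATION_FAILED",
                            "at": "2017-07-16T11:05:53.747Z",
                            "failed_attempts": 5},
    },
]

for row in summarize_unassigned(sample):
    print(row["index"], row["shard"], row["reason"], row["failed_attempts"], row["at"])
```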

bleskes · July 21, 2017, 6:33am

You might have had a network hiccup that caused a shard to be unassigned and then quickly reassigned to the same node. The error you got suggests that by the time of reassignment the node hadn't yet cleared its old shard copy (we had some bugs in this area, so I suggest you upgrade to 5.5). The master tried this 5 times and then gave up in order to avoid poisonous situations and flooding the logs. You can try to assign the shard again using POST /_cluster/reroute?retry_failed - see https://www.elastic.co/guide/en/elasticsearch/reference/5.5/cluster-reroute.html#_retry_failed_shards
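
As a rough illustration of the retry call mentioned above, the following Python sketch builds and sends the `POST /_cluster/reroute?retry_failed` request using only the standard library. The base URL `http://localhost:9200` and the function names are assumptions for the example; substitute the HTTP address of one of your own nodes.

```python
import json
import urllib.request

# Assumption: HTTP endpoint of one node in the cluster; replace with your own.
ES_URL = "http://localhost:9200"


def build_retry_failed_request(base_url=ES_URL):
    """Build (but do not send) the POST /_cluster/reroute?retry_failed request."""
    url = base_url.rstrip("/") + "/_cluster/reroute?retry_failed"
    # No request body is needed; only the query parameter matters here.
    return urllib.request.Request(url, method="POST")


def retry_failed_shards(base_url=ES_URL, timeout=30):
    """Send the retry request and return the decoded JSON response."""
    request = build_retry_failed_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `retry_failed_shards()` against a reachable cluster asks the master to retry allocation of shards that were given up on after repeated failures. If the underlying problem is still present, the shards may fail again and return to UNASSIGNED.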

system · August 18, 2017, 6:33am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
