Unable to assign shards - "structure needs cleaning"


(Aleej) #1

Hi All!

Wondering if anyone has suggestions. We've got a cluster that currently keeps running into problems with shards being unassigned. Manually allocating a shard fails with the error "structure needs cleaning" (see logs below). Does anyone know what causes this and what works best to recover?

I saw this prior post - unassigned-shards-in-10-node-cluster - which seems to lead to a dead end. I don't know whether OP @ivten has any updates since then.

Error message:
{
  "index": "logstash-2018.02.05",
  "shard": 6,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "at": "2018-02-05T01:44:10.312Z",
    "failed_allocation_attempts": 5,
    "details": "failed recovery, failure RecoveryFailedException[[logstash-2018.02.05][6]: Recovery failed from {elasticsearch_data_54}{cB904jlPS3WltGsM885a0g}{pxFRvAldTKGpohmz3LzCuQ}{redacted_ip}{redacted_ip:9300} into {elasticsearch_data_45}{GVKKtQsiQ6a2MCqUt7HFXw}{6HztnYV3QV2AC6q-cLCR8A}{redacted_ip}{redacted_ip:9300}]; nested: RemoteTransportException[[elasticsearch_data_54][redacted_ip:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [131] files with total size of [434.7mb]]; nested: RemoteTransportException[[elasticsearch_data_45][redacted_ip:9300][internal:index/shard/recovery/file_chunk]]; nested: IOException[Structure needs cleaning]; ",
    "last_allocation_status": "no_attempt"
  }
}
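For context: Elasticsearch stops retrying allocation after `index.allocation.max_retries` failures (5 by default), which matches the `failed_allocation_attempts: 5` above; once the underlying storage problem is fixed, `POST /_cluster/reroute?retry_failed=true` asks it to try again. A minimal sketch (just parsing the allocation-explain output above, not a fix in itself) that checks whether a shard is in that gave-up state:

```python
import json

# Default value of index.allocation.max_retries in Elasticsearch.
MAX_RETRIES = 5

def needs_manual_retry(explain: dict) -> bool:
    """Return True if allocation failed repeatedly and Elasticsearch has
    stopped retrying, so POST /_cluster/reroute?retry_failed=true
    (after fixing the disk) is the next step."""
    info = explain.get("unassigned_info", {})
    return (
        info.get("reason") == "ALLOCATION_FAILED"
        and info.get("failed_allocation_attempts", 0) >= MAX_RETRIES
    )

# Trimmed-down version of the allocation-explain output above.
explain = json.loads("""{
  "index": "logstash-2018.02.05",
  "shard": 6,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "ALLOCATION_FAILED",
    "failed_allocation_attempts": 5
  }
}""")
print(needs_manual_retry(explain))  # -> True
```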


(Steven) #2

Also getting this issue; we are using SAN/iSCSI-attached storage.


#3

I am the OP of the topic referenced in the first post.

In my case, when I ran Elasticsearch on SAN storage over iSCSI, it kept encountering corrupted files from time to time, which led to the shard containing the file in question ending up unassigned. My solution was to delete the shard and move on, since occasionally losing a shard was acceptable for my use case.
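In the log above the failing copy is a replica, which Elasticsearch can usually just rebuild once allocation is retried. If a primary is what's lost and you can accept the data loss, the cluster reroute API's `allocate_empty_primary` command assigns a fresh, empty primary. A minimal sketch that builds that request body (the index/shard/node values here are illustrative; pick your own target data node):

```python
import json

def allocate_empty_primary(index: str, shard: int, node: str) -> str:
    """Build the POST /_cluster/reroute body that replaces a lost
    primary with a fresh, empty one. Irreversible: the shard's previous
    data is discarded, hence the explicit accept_data_loss flag."""
    return json.dumps({
        "commands": [{
            "allocate_empty_primary": {
                "index": index,
                "shard": shard,
                "node": node,  # target data node name (placeholder)
                "accept_data_loss": True,
            }
        }]
    })

# Would be sent with e.g.:
#   curl -XPOST -H 'Content-Type: application/json' \
#        localhost:9200/_cluster/reroute -d "$BODY"
body = allocate_empty_primary("logstash-2018.02.05", 6, "elasticsearch_data_45")
```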

In my case the corruption was most likely caused by the plugin I was using to make the Docker volumes available across all nodes in the Docker swarm cluster: https://github.com/hpe-storage/python-hpedockerplugin

You should provide more details about your setup; otherwise it's hard for others to troubleshoot.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.