We are running a single-node ES 2.1 cluster on Windows 7 / NTFS. After ES crashed due to a faulty disk (replacement is under way), the indices that were being written to at the time of the crash are stuck in the translog recovery phase. I will use the .marvel index as an example here.
Output from curl -s "http://$ES_HOST:9200/_cat/recovery?v=h"
index shard time type stage source_host target_host repository snapshot files files_percent bytes bytes_percent total_files total_bytes translog translog_percent total_translog
.marvel-es-2015.11.26 0 1209658 store translog 10.2.159.84 10.2.159.84 n/a n/a 0 100.0% 0 100.0% 1 130 32660 -1.0% -1
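Since the _cat output above is hard to read as one flat row, here is a small sketch that pairs each column name with its value. The header and row strings are copied from the output above; the variable names are just for illustration.

```python
# Header and data row copied verbatim from the _cat/recovery output above.
header = ("index shard time type stage source_host target_host repository "
          "snapshot files files_percent bytes bytes_percent total_files "
          "total_bytes translog translog_percent total_translog")
row = (".marvel-es-2015.11.26 0 1209658 store translog 10.2.159.84 "
       "10.2.159.84 n/a n/a 0 100.0% 0 100.0% 1 130 32660 -1.0% -1")

columns = header.split()
values = row.split()
# Every column must have exactly one value, otherwise the pairing is wrong.
assert len(columns) == len(values)
recovery = dict(zip(columns, values))

# Print the fields that matter for this report.
for name in ("stage", "time", "total_files", "total_bytes",
             "translog", "translog_percent", "total_translog"):
    print(f"{name:>17}: {recovery[name]}")
```

Read this way, the shard is in stage "translog" with 32660 translog operations replayed so far, while total_translog is -1, i.e. the total is not reported.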
Output from curl -s "http://$ES_HOST:9200/.marvel-es-2015.11.26/_recovery?v=h"
{
  ".marvel-es-2015.11.26": {
    "shards": [
      {
        "verify_index": {
          "total_time_in_millis": 0,
          "check_index_time_in_millis": 0
        },
        "translog": {
          "total_time_in_millis": 3010971,
          "total_on_start": -1,
          "percent": "-1.0%",
          "total": -1,
          "recovered": 79466
        },
        "index": {
          "target_throttle_time_in_millis": 0,
          "source_throttle_time_in_millis": 0,
          "total_time_in_millis": 24,
          "files": {
            "percent": "100.0%",
            "recovered": 0,
            "reused": 1,
            "total": 1
          },
          "size": {
            "percent": "100.0%",
            "recovered_in_bytes": 0,
            "reused_in_bytes": 130,
            "total_in_bytes": 130
          }
        },
        "id": 0,
        "type": "STORE",
        "stage": "TRANSLOG",
        "primary": true,
        "start_time_in_millis": 1448608132857,
        "total_time_in_millis": 3010997,
        "source": {
          "name": "ELK0",
          "ip": "10.2.x.x",
          "transport_address": "10.2.x.x:9300",
          "host": "10.2.x.x",
          "id": "zuwWitsGS0uT9DVm9R4Fdw"
        },
        "target": {
          "name": "ELK0",
          "ip": "10.2.x.x",
          "transport_address": "10.2.x.x:9300",
          "host": "10.2.x.x",
          "id": "zuwWitsGS0uT9DVm9R4Fdw"
        }
      }
    ]
  }
}
The machine now sits at 100% CPU usage, and index recovery shows no progress apart from the "time" and "translog" numbers rising slowly.
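To put a number on "slowly", here is a rough estimate of the replay rate from the two samples posted above. It assumes the "time" column of the _cat output is the same counter as the shard-level total_time_in_millis in the JSON output, and that both samples belong to the same recovery attempt; the function name is made up for this sketch.

```python
# (elapsed_ms, translog_ops_recovered), taken from the two outputs above.
cat_sample = (1209658, 32660)    # _cat/recovery: "time" and "translog" columns
json_sample = (3010997, 79466)   # _recovery: total_time_in_millis, translog.recovered


def ops_per_second(first, second):
    """Average translog operations replayed per second between two samples."""
    elapsed_ms = second[0] - first[0]
    ops = second[1] - first[1]
    return ops / (elapsed_ms / 1000.0)


rate = ops_per_second(cat_sample, json_sample)
print(f"{rate:.1f} ops/s")  # prints "26.0 ops/s"
```

That is roughly 26 translog operations per second over about 30 minutes. Because the API reports the translog total as -1, the remaining time cannot be estimated from these numbers.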
The translog folder looks like this:
Directory of F:\XXX\nodes\0\indices\.marvel-es-2015.11.26\0\translog
27.11.2015 08:08 <DIR> .
27.11.2015 08:08 <DIR> ..
26.11.2015 19:23 20 translog-1.ckp
26.11.2015 19:23 80.677.779 translog-1.tlog
27.11.2015 08:49 37.717.508 translog-2.tlog
27.11.2015 08:49 20 translog.ckp
4 File(s), 118.395.327 bytes
The translog-2.tlog file seems to be growing slowly.
While we believe the root cause has already been identified as the faulty disk, index recovery still seems far too slow, especially considering that, according to the API, the .marvel index has only 1 file and 130 bytes of data.