Shard data is missing without any reason or log

I was running a huge indexing job (about 400 billion records) when one of the nodes went down because of a power failure. After the power was restored, one shard was missing, so I used elasticsearch-translog and everything was fine for days (more than 100 billion new records were indexed). Then I suddenly hit a new problem:

failed shard, IndexShardRecoveryException, codec header mismatch, actual header=-6546052509 vs expected header=1071082519 resource=mmapindexinput(path=.../index_3jm.cfs))

The shard's data is not there!
I relocated that shard and then another shard failed, so I was forced to relocate the second shard too.
Can anyone give me a hint?

Relates to Failed shard recovery after hard shutdown.

It sounds like you have other lurking corruptions after your power outage. Unfortunately, with zero replicas and storage that isn't resilient to power loss, there's no way out of this situation without losing yet more data. If it were me, I would start again.

Setting index.shard.check_on_startup: true will at least check the whole index for corruption at startup rather than waiting for a corruption to be detected during a merge. This will take a while, and you should set it back to null afterwards, otherwise the check will run every time the shard starts.
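For illustration, here is a rough sketch of the REST calls that toggling this setting might involve. It assumes index.shard.check_on_startup is a static setting, so the index has to be closed before the update and reopened afterwards. The index name my-index and the helper function are hypothetical, and the sketch only builds the requests without sending anything to a cluster.

```python
import json


def check_on_startup_requests(index, value):
    """Build the (method, path, body) calls for changing the setting.

    Assumption: the setting is static, so the index is closed first,
    updated, and then reopened. `index` is a hypothetical index name.
    Pass value=True to enable the full check, None to reset to default.
    """
    settings = {"index": {"shard": {"check_on_startup": value}}}
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", json.dumps(settings)),
        ("POST", f"/{index}/_open", None),
    ]


if __name__ == "__main__":
    # Enable the check, then (after the shards have started) reset it.
    for value in (True, None):
        for method, path, body in check_on_startup_requests("my-index", value):
            print(method, path, body or "")
```

Passing None serializes to null in the request body, which is the "set it back to null" step: it removes the override so the check no longer runs at each startup.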

If you are using version ≥ 6.5 then the elasticsearch-shard tool will delete any corrupted segments, but as with elasticsearch-translog this entails arbitrary data loss.
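As a hedged sketch of what invoking the tool could look like, the helper below assembles a command line for its remove-corrupted-data mode. The index name, shard number, and installation path are hypothetical, and the --index and --shard-id options are quoted from memory of the tool's documentation, so check them against your version. The node should be stopped before running the tool, and the command is only built here, not executed.

```python
def shard_tool_command(index, shard_id, es_home="."):
    """Build an argv list for elasticsearch-shard (not executed here).

    Assumptions: `es_home` is a hypothetical installation directory, and
    the tool accepts --index / --shard-id to pick the shard to repair.
    """
    return [
        f"{es_home}/bin/elasticsearch-shard",
        "remove-corrupted-data",
        "--index", index,
        "--shard-id", str(shard_id),
    ]


if __name__ == "__main__":
    # Print the command for review; running it discards corrupted data.
    print(" ".join(shard_tool_command("my-index", 0)))
```

Printing the command first rather than running it is deliberate: the operation drops data, so it is worth reviewing exactly which shard it targets.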

