Hi
I was running a huge indexing job (about 400 billion records) when one of the nodes suddenly went down because of a power failure. After the power was fixed, one shard was missing, so I used elasticsearch-translog and everything was fine for days (more than 100 billion new records were indexed). Now I have suddenly hit a new problem:
The data of the shard is not there!
I relocated that shard, then another shard failed, and I was forced to relocate the second shard too.
Can anyone give me a hint?
It sounds like you have other lurking corruptions after your power outage. Unfortunately, with zero replicas and storage that isn't resilient to power loss, there's no way to get out of this situation without losing yet more data. If it were me, I would start again.
Setting index.shard.check_on_startup: true will at least check the whole index for corruption at startup rather than waiting for a corruption to be detected during a merge. This'll take a while, and you should set it back to null afterwards otherwise that check will happen every time.
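For what it's worth, here is a rough sketch of how that setting could be toggled over the REST API. It assumes the setting is static, so the index is closed first and reopened afterwards; the base URL and index name are placeholders, and the helper names `check_on_startup_requests` and `send` are made up for illustration. Passing `None` sends `null`, which puts the setting back to its default.

```python
import json
import urllib.request


def check_on_startup_requests(base_url, index, value):
    """Build the (method, url, body) steps to change index.shard.check_on_startup.

    Assumption: the setting is static, so the index is closed before the
    update and reopened afterwards.
    """
    settings = {"index": {"shard": {"check_on_startup": value}}}
    root = f"{base_url.rstrip('/')}/{index}"
    return [
        ("POST", f"{root}/_close", None),
        ("PUT", f"{root}/_settings", settings),
        ("POST", f"{root}/_open", None),
    ]


def send(method, url, body=None):
    """Send one step to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (placeholder host and index name):
#   for step in check_on_startup_requests("http://localhost:9200", "my-index", True):
#       print(send(*step))
```

Once the check has run, calling the same helper with `None` instead of `True` should stop it from running on every startup.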
If you are using version ≥ 6.5 then the elasticsearch-shard tool will delete any corrupted segments, but as with elasticsearch-translog this entails arbitrary data loss.
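As a sketch only, the tool invocation might be wrapped like this. It assumes the `remove-corrupted-data` subcommand with its `--index` and `--shard-id` options, and that the node holding the shard has been stopped first; the install path, index name and shard number are placeholders, and the helper names are made up. The tool asks for confirmation interactively, so run it from a terminal.

```python
import subprocess


def remove_corrupted_data_cmd(es_home, index, shard_id):
    """Build the elasticsearch-shard command line for one shard.

    Assumption: es_home is the Elasticsearch install directory and the
    node holding this shard is stopped before the command runs.
    """
    return [
        f"{es_home.rstrip('/')}/bin/elasticsearch-shard",
        "remove-corrupted-data",
        "--index", index,
        "--shard-id", str(shard_id),
    ]


def run(cmd):
    """Run the command, inheriting the terminal so the tool can prompt."""
    return subprocess.run(cmd, check=True)


# Example (placeholder path, index name and shard number):
#   run(remove_corrupted_data_cmd("/usr/share/elasticsearch", "my-index", 0))
```

Whatever the tool drops is gone for good, so it is worth copying the shard directory somewhere safe before running it.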