Corrupted elastic index

This usually indicates either a hardware error (check dmesg) or that the node ran out of disk space.
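To rule out the disk-space cause, a minimal Python sketch like the following reports how full the data volume is. DATA_PATH is a hypothetical placeholder for the path.data setting in your elasticsearch.yml. The 95% threshold assumes Elasticsearch's default flood-stage disk watermark, so adjust both values for your setup.

```python
import shutil

# Hypothetical data directory; replace with the path.data value of your node.
DATA_PATH = "/var/lib/elasticsearch"


def disk_used_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def near_full(path, threshold=95.0):
    """True if usage is at or above `threshold` percent (assumed flood-stage default)."""
    return disk_used_percent(path) >= threshold
```

For example, near_full(DATA_PATH) returning True on a node that held the corrupted shard makes a full disk a likely cause.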

At this point segments are corrupted and data has been lost, so you can no longer recover the whole index unless you have a snapshot (which is recommended for production).

There are a couple of options to try to partially recover this index:

  1. Try to partially recover the corrupted shard:
    1. Close the index.
    2. Set index.shard.check_on_startup: fix for this index.
    3. Open the index. The index will now be verified, which may take a long time.
    4. If it recovers, repeat steps 1 to 3 but set index.shard.check_on_startup: false; otherwise it will try to fix the index every time it is opened.
  2. If the shard can't be partially recovered, the only option is to drop it completely so that at least the rest of the index can be recovered from the other, healthy shards. To do that, you can try the allocate_empty_primary command of the Cluster Reroute API.
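Both options above come down to a handful of REST calls. The Python sketch below uses only the standard library to build and send them. ES_URL, the index name and the node name are hypothetical placeholders, and it assumes your Elasticsearch version still accepts the fix value for index.shard.check_on_startup. Note that allocate_empty_primary requires accept_data_loss set to true, because the shard's data is replaced by an empty shard.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust to your deployment.
ES_URL = "http://localhost:9200"


def check_on_startup_steps(index, value):
    """Close the index, set index.shard.check_on_startup to `value`, reopen it.

    The setting is static, which is why the index must be closed to change it.
    Returns (method, path, body) tuples; a body of None means no request body.
    """
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"index.shard.check_on_startup": value}),
        ("POST", f"/{index}/_open", None),
    ]


def allocate_empty_primary(index, shard, node):
    """Build a Cluster Reroute request that replaces a shard with an empty primary.

    Every document that lived in that shard is lost, hence accept_data_loss.
    """
    body = {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }
    return ("POST", "/_cluster/reroute", body)


def send(method, path, body=None, base_url=ES_URL):
    """Send one request to the cluster and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For option 1, loop over check_on_startup_steps("my-index", "fix") and pass each tuple to send(); once the index has recovered, do the same with "false" to reset the setting. For option 2, pass the tuple returned by allocate_empty_primary() to send().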

None of these is guaranteed to work; it depends heavily on the type of damage.