Index corruption with .tim file checksum mismatch

I have an Elasticsearch cluster of 20+ nodes. Last week we hit an index corruption issue, with the following exception in the logs:

org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=8kz594 actual=d7xu05 (resource=name [_5y0_Lucene50_0.tim], length [370692269], checksum [8kz594], writtenBy [6.6.1])

There was plenty of free disk space: the disk is 1 TB and only 200 GB was in use.

Elasticsearch Version: 5.6.6

Any help regarding this is appreciated.

Thanks in advance.

Normally it's this: a hardware problem. The bytes that Elasticsearch wrote to disk weren't the bytes it read back again.

Thanks for the info, David. Is there any known scenario in which this might occur? We are running it in a virtual environment. Any input would be appreciated.

Thanks in advance.

Yes, the most common scenario is a hardware problem. Sometimes it's the disk, sometimes a failing RAID controller or a flaky SAN. Occasionally it's a bug in a filesystem implementation, particularly if you're not using local disks. Since you're using virtualisation, it could also be an issue in the host OS or the hypervisor. There are a lot of places between Elasticsearch and the disk where things can go wrong, and unfortunately Elasticsearch can't tell you any more than this: the bytes it wrote weren't the bytes it read back.
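To make "the bytes it wrote weren't the bytes it read back" concrete, here is a minimal sketch of how a write-time checksum catches silent corruption. Lucene records a CRC32 checksum in each file's footer when the file is written and verifies it when reading. The sketch below only simulates that idea in Python with `zlib.crc32`. The helper names (`write_with_checksum`, `verify`, `flip_bit`) are hypothetical, and the hex formatting is illustrative, not how Elasticsearch formats the checksums in its log message.

```python
import zlib


def write_with_checksum(data):
    """Simulate writing a file: keep the bytes plus the CRC32 computed at write time."""
    return data, zlib.crc32(data)


def verify(read_back, expected_crc):
    """Simulate reading the file back: recompute the CRC32 and compare."""
    actual = zlib.crc32(read_back)
    if actual != expected_crc:
        raise ValueError(
            f"checksum failed (hardware problem?) : "
            f"expected={expected_crc:x} actual={actual:x}"
        )


def flip_bit(data, index, bit=0):
    """Simulate a storage fault somewhere below the application: flip one bit."""
    corrupted = bytearray(data)
    corrupted[index] ^= 1 << bit
    return bytes(corrupted)


data = b"term dictionary block" * 1000   # 21,000 bytes of stand-in file content
stored, crc = write_with_checksum(data)

verify(stored, crc)  # intact bytes: no error

bad = flip_bit(stored, 12345)  # a single flipped bit on the way back from "disk"
try:
    verify(bad, crc)
except ValueError as err:
    print(err)
```

The checksum only proves that the bytes changed. It says nothing about which layer changed them (disk, controller, SAN, filesystem, hypervisor), which is why the error message can only suggest "hardware problem?".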
