CheckIndex dies with bus error ... how else can I fix shard?


(Ecweaver) #1

I have an index shard that seems to have been damaged by a server reboot (it appears I neglected to stop indexing and flush first). It won't finish replicating, and the master chatters madly on the network unless the index is closed.

I ran CheckIndex on that shard (with the index closed), and it consistently errors out on the file whose name shows up in the ES logs. It says "Bus Error", which I think means dereferencing a pointer with not enough zeroes on the right. Is there any way to just manually delete that particular file out of the shard's consciousness?

I have made a snapshot that ended in Partial status. Might that be restorable, so I can at least recover the rest of the shards?

Thanks.


(Mark Walkom) #2

I'd say your shard is lost. Did you have replicas?


(Ecweaver) #3

Unfortunately not. That was the primary; it would hang when trying to replicate. I took a snapshot and am now trying a restore with partial:true... the damaged shard is coming up empty, as expected.
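
For anyone landing here later, the restore with partial:true can be sketched roughly as below. This is only an illustration of the request shape for the snapshot restore REST endpoint, not the exact call used above: the host, the repository name `my_backup`, the snapshot name `snapshot_1`, and the index name `my_index` are all placeholders, and the helper `build_restore_request` is hypothetical. It builds the request without sending it.

```python
import json
import urllib.request


def build_restore_request(base_url, repo, snapshot, indices):
    """Build (but do not send) a snapshot restore request with partial=true.

    All arguments are placeholders for illustration; substitute your own
    cluster URL, repository, snapshot, and index names.
    """
    url = f"{base_url}/_snapshot/{repo}/{snapshot}/_restore"
    body = {
        # Comma-separated list of indices to restore from the snapshot.
        "indices": ",".join(indices),
        # Allow restoring indices whose snapshot is missing some shards;
        # the missing shards come back empty.
        "partial": True,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_restore_request(
        "http://localhost:9200", "my_backup", "snapshot_1", ["my_index"]
    )
    # Inspect the request before deciding to send it with urllib.request.urlopen(req).
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

As far as I know, an existing index with the same name has to be closed or deleted before the restore will go through, so check that first.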

