Single shard failing with snapshot

I wish I knew why my backup system is so brittle :(

A single shard has been failing to snapshot for the last couple of days. From Kibana:

INTERNAL_SERVER_ERROR: UncategorizedExecutionException[Failed execution]; 
nested: ExecutionException[java.io.IOException: Input/output error: 
NIOFSIndexInput(path="/data/elasticsearch/security/nodes/0/indices/3dqdQFyJRyuUO_vVLon_WQ/2/index/_1bny.fdt")];
 nested: IOException[Input/output error: 
NIOFSIndexInput(path="/data/elasticsearch/security/nodes/0/indices/3dqdQFyJRyuUO_vVLon_WQ/2/index/_1bny.fdt")];
 nested: IOException[Input/output error]

Unfortunately it does not say which node the file is supposed to be on.

Should I delete the index and restore it from a good snapshot?

> it does not say which node the file is supposed to be on.

I'm not familiar with this exact error message format, but assuming it means there was an Input/output error when reading /data/elasticsearch/security/nodes/0/indices/3dqdQFyJRyuUO_vVLon_WQ/2/index/_1bny.fdt, you're looking for a node that holds a copy of shard 2 of the index with UUID 3dqdQFyJRyuUO_vVLon_WQ. Look for that UUID in the output of GET /_cat/indices?h=uuid,index to get the index name, then find the node in the output of GET /_cat/shards/$INDEX?s=s,p. If the failure is snapshot-related then it's probably the node holding the primary shard, because snapshots are taken from primary shards.
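As a rough illustration of the lookup described above, here is a minimal Python sketch that pulls the index UUID and shard number out of the path in the error and matches the UUID against the output of GET /_cat/indices?h=uuid,index. The function names and the regular expression are assumptions made for this example. The path layout (.../indices/<UUID>/<shard>/index/<file>) is inferred from the error message in this thread, not taken from any official specification.

```python
import re

# Assumed layout, inferred from the path in the error message:
#   .../indices/<index UUID>/<shard number>/index/<file name>
SHARD_PATH = re.compile(
    r'/indices/(?P<uuid>[^/]+)/(?P<shard>\d+)/index/(?P<file>[^/"]+)'
)


def locate_shard(error_text):
    """Return (index_uuid, shard_number, file_name), or None if no path matches."""
    match = SHARD_PATH.search(error_text)
    if match is None:
        return None
    return match.group("uuid"), int(match.group("shard")), match.group("file")


def index_name_for(uuid, cat_indices_output):
    """Find the index name for a UUID in the text returned by
    GET /_cat/indices?h=uuid,index (two whitespace-separated columns per line)."""
    for line in cat_indices_output.splitlines():
        columns = line.split()
        if len(columns) == 2 and columns[0] == uuid:
            return columns[1]
    return None


def cat_shards_request(index_name):
    """Build the follow-up request that lists the shards of that index,
    sorted by shard number and then primary/replica."""
    return f"GET /_cat/shards/{index_name}?s=s,p"
```

Applied to the error in this thread, locate_shard should give the UUID 3dqdQFyJRyuUO_vVLon_WQ and shard 2. The row for shard 2 marked p in the _cat/shards output is the primary, and its node column names the machine to inspect.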

Again assuming there's a problem reading _1bny.fdt, the error message "Input/output error" comes from the OS and normally means there's some problem with that node's storage. I'd suggest looking at its kernel logs with dmesg to confirm, and replacing any faulty or suspect hardware before doing much more.
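To make the dmesg check a little less manual, something like the following Python sketch could filter kernel log text for lines that look like storage errors. The two patterns are assumptions chosen for illustration and are far from exhaustive; the function name is hypothetical. Reading the full dmesg output by eye is still the more reliable check.

```python
import re

# Assumed patterns for illustration only; real kernel messages vary by
# driver and kernel version, so this list is not exhaustive.
IO_ERROR_PATTERN = re.compile(r"i/o error|medium error", re.IGNORECASE)


def suspect_kernel_lines(log_text):
    """Return the lines of kernel log text that mention an I/O-style error."""
    return [
        line
        for line in log_text.splitlines()
        if IO_ERROR_PATTERN.search(line)
    ]


# Possible usage on the affected node (dmesg may need elevated privileges):
#   import subprocess
#   output = subprocess.run(
#       ["dmesg"], capture_output=True, text=True, check=True
#   ).stdout
#   print("\n".join(suspect_kernel_lines(output)))
```

Any matching line that names the device backing /data/elasticsearch would support the faulty-storage explanation.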

Thanks David! I will track down which node is affected and see if I can figure out what is wrong with the disk.


Confirmed disk errors. The hardware is old, so this is not too surprising. I'm virtualising the server, as it isn't worth replacing the disk.
