Single shard failing with snapshot

Russell_Fulton · October 9, 2023, 6:41pm

I wish I knew why my backup system is so brittle :(

A single shard has been failing to snapshot for the last couple of days. From Kibana:

INTERNAL_SERVER_ERROR: UncategorizedExecutionException[Failed execution]; 
nested: ExecutionException[java.io.IOException: Input/output error: 
NIOFSIndexInput(path="/data/elasticsearch/security/nodes/0/indices/3dqdQFyJRyuUO_vVLon_WQ/2/index/_1bny.fdt")];
 nested: IOException[Input/output error: 
NIOFSIndexInput(path="/data/elasticsearch/security/nodes/0/indices/3dqdQFyJRyuUO_vVLon_WQ/2/index/_1bny.fdt")];
 nested: IOException[Input/output error]

Unfortunately it does not say which node the file is supposed to be on.

Should I delete the index and restore it from a good snapshot?

DavidTurner · October 10, 2023, 4:40am

Russell_Fulton wrote: "it does not say which node the file is supposed to be on."

I'm not familiar with this exact error message format, but assuming it means there was an Input/output error when reading /data/elasticsearch/security/nodes/0/indices/3dqdQFyJRyuUO_vVLon_WQ/2/index/_1bny.fdt, you're looking for a node holding a copy of shard 2 of the index with UUID 3dqdQFyJRyuUO_vVLon_WQ. Look for that UUID in GET /_cat/indices?h=uuid,index to get the index name, then look for the node in GET /_cat/shards/$INDEX?s=s,p. If it's snapshot-related then it's probably the node with the primary shard.
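
As a rough illustration of the lookup described above, here is a minimal Python sketch. It assumes the path in the error follows the .../indices/<index UUID>/<shard number>/index/<file> layout seen in this thread, and that you have saved the plain-text output of GET /_cat/indices?h=uuid,index. The function names parse_shard_path and index_name_for_uuid are hypothetical helpers, not part of any Elasticsearch client.

```python
import re

# Assumed layout, taken from the path in the error message above:
#   .../indices/<index UUID>/<shard number>/index/<file name>
SHARD_PATH = re.compile(
    r'/indices/(?P<uuid>[^/]+)/(?P<shard>\d+)/index/(?P<file>[^/"]+)'
)


def parse_shard_path(message):
    """Return (index_uuid, shard_number, file_name) from an error message, or None."""
    match = SHARD_PATH.search(message)
    if match is None:
        return None
    return match.group("uuid"), int(match.group("shard")), match.group("file")


def index_name_for_uuid(cat_indices_output, uuid):
    """Find the index name in the text output of GET /_cat/indices?h=uuid,index.

    Each line is expected to hold two whitespace-separated columns: uuid and index.
    """
    for line in cat_indices_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == uuid:
            return fields[1]
    return None
```

With the index name in hand, the shard number returned by parse_shard_path tells you which row of GET /_cat/shards/$INDEX?s=s,p to read for the node name.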

Again assuming there's a problem reading _1bny.fdt, the error message Input/output error comes from the OS and normally means there's some problem with that node's storage. I'd suggest looking at its kernel logs with dmesg to confirm, and replacing any faulty or suspect hardware before doing much more.
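
If the kernel log is long, a small filter can help with the dmesg check suggested above. This is only a sketch: it assumes dmesg is on the PATH and that the current user is allowed to read the kernel log (on some systems that needs root), and it simply looks for the phrase "I/O error", which is how Linux storage errors are commonly worded. The helper names read_kernel_log and find_io_errors are hypothetical.

```python
import subprocess


def read_kernel_log():
    """Return the kernel log as a list of lines by running dmesg.

    Assumes dmesg is installed and readable by the current user.
    """
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def find_io_errors(kernel_log_lines):
    """Return only the lines that mention an I/O error (case-insensitive)."""
    return [line for line in kernel_log_lines if "i/o error" in line.lower()]
```

Calling find_io_errors(read_kernel_log()) on the affected node should list any matching lines. An empty result does not prove the disk is healthy, since errors may be worded differently or may already have rotated out of the kernel ring buffer.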

Russell_Fulton · October 10, 2023, 5:34am

Thanks David! I'll track down which node is affected and see if I can figure out what is wrong with the disk.

Russell_Fulton · October 10, 2023, 9:53pm

Confirmed disk errors. The hardware is old, so this is not too surprising. I'm virtualising the server, as it isn't worth replacing the disk.

