Elasticsearch service won't start, nested elasticsearch node state exception

penguinairlines · November 13, 2020, 8:02pm

Hello,

I noticed one of my Elasticsearch nodes was down and started digging into the issue. The service won't start, and the log shows a couple of key errors:

ElasticsearchException[failed to bind service]; nested: CorruptIndexException[Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/var/lib/elasticsearch/nodes/0/_state/segments_2ewxw")))]; nested: NoSuchFileException[/var/lib/elasticsearch/nodes/0/_state/_25kuo.si];
Likely root cause: java.nio.file.NoSuchFileException: /var/lib/elasticsearch/nodes/0/_state/_25kuo.si

Indeed, that file is not present. What are my options for recovering this node?

penguinairlines · November 13, 2020, 9:10pm

Okay, after many more searches I was able to find the topic "Recovering from missing state .si file".

DavidTurner, helping once again, suggests there that the contents of /var/lib/elasticsearch can be cleared without worry, since any relevant data will be stored on other nodes of the cluster. I did this and re-ran my configuration management agent, and the node was able to rejoin the cluster.

Is there anything else I should look out for while I'm here? Otherwise I'm guessing this post will just go stale.

I'm also wondering if there's any way to discover why this file went missing. I said I cleared out the contents, but actually I ran mv /var/lib/elasticsearch/nodes /var/lib/elasticsearch/nodes.old, so I should be able to analyze any of the files there. I'm not aware of any tools that would help me find the root cause, though. I'm going to check with my backups guy to see if he has any record of the file. Assuming the file names iterate in alphabetical order, I could probably assume that it hadn't been created yet, as there were no o's, but several p's, n's, and m's.

Anything that could help me discover the root cause would be really helpful, as I should be able to build some monitoring or CM to properly care for the directory contents. Thanks!
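
On the question of what else to look out for after a rejoin like this, one option is to look at the response of the cluster health API (GET _cluster/health). Below is a minimal sketch, not a definitive check: the function name health_concerns, the expected_nodes parameter, and the sample response are all hypothetical, and only the three response fields it reads (status, number_of_nodes, unassigned_shards) are assumed to be present.

```python
import json


def health_concerns(health: dict, expected_nodes: int) -> list:
    """Return human-readable concerns found in a cluster health response."""
    concerns = []
    status = health.get("status")
    if status != "green":
        concerns.append(f"cluster status is {status}, not green")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        concerns.append(f"only {nodes} of {expected_nodes} expected nodes have joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        concerns.append(f"{unassigned} shards are still unassigned")
    return concerns


# Hypothetical response body, for illustration only (not taken from this cluster).
sample = json.loads('{"status": "yellow", "number_of_nodes": 3, "unassigned_shards": 2}')
for concern in health_concerns(sample, expected_nodes=3):
    print(concern)
```

An empty list from this function only means these three fields look fine; it says nothing about why the file went missing in the first place.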

DavidTurner · November 13, 2020, 9:50pm

o comes before p, so I don't think that follows.

The two likely explanations are (a) something other than Elasticsearch removed this file or (b) you had a power outage while Elasticsearch was writing the node state and your storage system performed some operations in the wrong order just before the outage. In either case you should be worried.

The solution is not to try and monitor the contents of the data path: it's best to consider it as being entirely under Elasticsearch's control. But it's definitely worth getting to the bottom of this if you can.

penguinairlines · November 13, 2020, 10:01pm

Ah, my bad. I'm spending too much time on the terminal today with no breaks.

_25kuj.cfe
_25kuj.cfs
_25kuj.si
_25kuk.cfe
_25kuk.cfs
_25kuk.si
_25kul.cfe
_25kul.cfs
_25kul.si
_25kum.cfe
_25kum.cfs
_25kum.si
_25kun.cfe
_25kun.cfs
_25kun.si

Yes, I'm wondering whether the POST I had the CM run to create a new user somehow interrupted cluster operations, but that just seems so unlikely. I'm glad everything seems to be working, though; I can breathe a bit easier going into the weekend.
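
For anyone comparing file names like the ones in the listing above, here is a rough sketch of how the ordering can be checked numerically rather than by eye. It assumes that Lucene names each segment with an underscore followed by a base-36 counter, so a larger decoded value means a later segment. The helper names segment_generation and group_by_segment are made up for this example.

```python
import re
from collections import defaultdict

# Assumption: segment files look like "_<base-36 counter>.<extension>".
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)\.(\w+)$")


def segment_generation(name: str) -> int:
    """Decode the base-36 counter in a segment file name such as '_25kun.si'."""
    match = SEGMENT_FILE.match(name)
    if match is None:
        raise ValueError(f"not a segment file name: {name!r}")
    return int(match.group(1), 36)


def group_by_segment(names):
    """Map each segment counter string to the set of extensions present for it."""
    groups = defaultdict(set)
    for name in names:
        match = SEGMENT_FILE.match(name)
        if match:
            groups[match.group(1)].add(match.group(2))
    return dict(groups)


# File names copied from the listing in this thread.
listing = [
    "_25kuj.cfe", "_25kuj.cfs", "_25kuj.si",
    "_25kuk.cfe", "_25kuk.cfs", "_25kuk.si",
    "_25kul.cfe", "_25kul.cfs", "_25kul.si",
    "_25kum.cfe", "_25kum.cfs", "_25kum.si",
    "_25kun.cfe", "_25kun.cfs", "_25kun.si",
]

newest_present = max(segment_generation(name) for name in listing)
missing = segment_generation("_25kuo.si")
print(missing - newest_present)  # 1: the missing segment directly follows _25kun

# Report any listed segment that lacks one of the three extensions seen above.
for segment, extensions in sorted(group_by_segment(listing).items()):
    absent = {"cfe", "cfs", "si"} - extensions
    if absent:
        print(f"_{segment} is missing: {sorted(absent)}")
```

Under that naming assumption, _25kuo is the immediate successor of the newest segment in the listing. That on its own does not tell the two explanations given earlier apart.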

