Recovering from missing state .si file

The Elasticsearch service was terminated and now fails to restart. The error returned is:

Job for elasticsearch.service failed because the control process exited with error code.
See "systemctl status elasticsearch.service" and "journalctl -xe" for details.

The log says
org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: CorruptIndexException[Unexpected file read error while reading index. (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/var/lib/elasticsearch/nodes/0/_state/segments_11mz")))]; nested: NoSuchFileException[/var/lib/elasticsearch/nodes/0/_state/_kfz.si];

I can confirm that the file /var/lib/elasticsearch/nodes/0/_state/_kfz.si is indeed missing.

How can I recover Elasticsearch to a state where I can restart it?

Unfortunately that file is essential to Elasticsearch. How did you get it into this state?

Assuming you don't have a copy of this file elsewhere, your best bet is to wipe this node and start again, allowing Elasticsearch to recover any missing shards from the other nodes in the cluster. Alternatively you can restore from a recent snapshot.

Thanks for the response. I haven't been able to identify the root cause leading to the missing file. Elasticsearch has been running on a server without me doing anything.

The cluster only has one node but I have a snapshot from a few days back. What's the recommended procedure to "wipe this node"? Uninstall Elasticsearch, remove /var/lib/elasticsearch, reinstall Elasticsearch and then restore from snapshot? (I'm running Debian and using apt-get to install ES if it matters.)

According to the log message your data path is /var/lib/elasticsearch, which means it should be enough to delete the contents of that directory and start Elasticsearch up again. No need to uninstall/reinstall anything AFAIK.
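As a rough sketch of that step, assuming a systemd-managed install with the data path /var/lib/elasticsearch shown in the log: the helper below moves the old contents into a backup directory instead of deleting them, which is a cautious variation on the advice above rather than a requirement. The function name and the /var/backups location are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): move everything inside a
# data directory into a timestamped backup directory instead of deleting it.
# Usage: move_data_aside DATA_DIR BACKUP_PARENT
move_data_aside() {
    data_dir=$1
    backup_dir="$2/es-data-$(date +%Y%m%d%H%M%S)"
    mkdir -p "$backup_dir" || return 1
    # Loop over the direct children of the data directory, including
    # dotfiles such as .foo; unmatched glob patterns are skipped.
    for entry in "$data_dir"/* "$data_dir"/.[!.]*; do
        [ -e "$entry" ] || continue
        mv "$entry" "$backup_dir"/ || return 1
    done
    # Print where the old contents ended up.
    printf '%s\n' "$backup_dir"
}

# Intended use on the affected node (run as root; not executed here):
#   systemctl stop elasticsearch.service
#   move_data_aside /var/lib/elasticsearch /var/backups
#   systemctl start elasticsearch.service
```

The data directory itself stays in place with its existing ownership, so the service should be able to start into it as an empty node. Once the snapshot has been restored and checked, the backup copy can be removed.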

Thanks a lot!

After wiping /var/lib/elasticsearch I had to reset the password for the elastic user but I managed.

The next issue is that I either don't know the name of my snapshot or there's still something missing. When I try to run

curl -XPOST -u elastic localhost:9200/_snapshot/$MY_BACKUP/$MY_SNAPSHOT/_restore

with various versions of $MY_BACKUP and $MY_SNAPSHOT I always get the response repository_missing_exception. I can access my snapshot folder but don't know how to fetch the backup and snapshot names.

Did you register the repository again? If not, you'll need to do that. You can list the currently-registered repositories with GET _snapshot/_all, and list the snapshots within a repository called $REPOSITORY_NAME using GET /_snapshot/$REPOSITORY_NAME/_all.
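Put together, the sequence might look something like the sketch below. It assumes the snapshot folder is a shared-filesystem ("fs") repository, which the thread does not actually confirm, and that its directory is listed under path.repo in elasticsearch.yml. The ES_URL and CURL variables and the function names are placeholders of mine, not anything Elasticsearch defines. As far as I know the name you register the repository under does not have to match the one used when the snapshots were taken.

```shell
#!/bin/sh
# Sketch only. ES_URL, CURL and the function names are illustrative
# placeholders; set CURL=echo to print the requests instead of sending them.
ES_URL=${ES_URL:-http://localhost:9200}
CURL=${CURL:-curl}

# Register an "fs" repository under the name $1, pointing at directory $2.
# Assumption: $2 is listed under path.repo in elasticsearch.yml.
register_fs_repo() {
    "$CURL" -u elastic -X PUT "$ES_URL/_snapshot/$1" \
        -H 'Content-Type: application/json' \
        -d "{\"type\": \"fs\", \"settings\": {\"location\": \"$2\"}}"
}

# List all currently registered repositories.
list_repos() {
    "$CURL" -u elastic -X GET "$ES_URL/_snapshot/_all"
}

# List the snapshots inside repository $1.
list_snapshots() {
    "$CURL" -u elastic -X GET "$ES_URL/_snapshot/$1/_all"
}

# Restore snapshot $2 from repository $1.
restore_snapshot() {
    "$CURL" -u elastic -X POST "$ES_URL/_snapshot/$1/$2/_restore"
}

# Example session (names and path are hypothetical):
#   register_fs_repo my_backup /mnt/backups/my_backup
#   list_snapshots my_backup
#   restore_snapshot my_backup SNAPSHOT_NAME_FROM_THE_LISTING
```

With -u elastic and no password on the command line, curl prompts for it on each call, same as in the command you were already running.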

Thanks again!

I was able to restore a snapshot.

(I also added another node to my cluster and I'm in the process of adding a third one.)

By the way, am I supposed to find documentation of _snapshot under https://www.elastic.co/guide/en/elasticsearch/reference/7.6/rest-apis.html ?

(I can find https://www.elastic.co/guide/en/elasticsearch/reference/7.6/snapshot-restore.html, https://www.elastic.co/guide/en/elasticsearch/reference/7.6/snapshots-register-repository.html and https://www.elastic.co/guide/en/elasticsearch/reference/7.6/snapshots-take-snapshot.html but under the REST APIs I can only find snapshot lifecycle management documentation...)

That's probably not deliberate. The structure of the reference manual is undergoing some big improvements at the moment, so there are some inconsistencies in exactly how and where things are documented. I opened https://github.com/elastic/elasticsearch/issues/56069 in case that omission isn't tracked elsewhere.
