ES Data Node crash and now have a heap of file errors

jamesl · September 17, 2018, 10:09pm

Hi. One of my data nodes crashed and I had to run fsck. The server is up again, but now I have a heap of these errors...
In the past I only had a few and was able to delete them, but this time there are too many. Is there any way around this?

Is there a 'quick and dirty' way to get this server up? Thanks.

Caused by: org.elasticsearch.ElasticsearchException: java.io.IOException: failed to read [id:72, file:/var/lib/elasticsearch/nodes/0/indices/cHx1fblKRsGBcNEs3eBEBQ/_state/state-72.st]
    at org.elasticsearch.ExceptionsHelper.maybeThrowRuntimeAndSuppress(ExceptionsHelper.java:199) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:304) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgrade(IndexFolderUpgrader.java:89) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgradeIndicesIfNeeded(IndexFolderUpgrader.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:447) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
    ... 6 more
Caused by: java.io.IOException: failed to read [id:72, file:/var/lib/elasticsearch/nodes/0/indices/cHx1fblKRsGBcNEs3eBEBQ/_state/state-72.st]
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:298) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgrade(IndexFolderUpgrader.java:89) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgradeIndicesIfNeeded(IndexFolderUpgrader.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:447) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
    ... 6 more
Caused by: org.elasticsearch.gateway.CorruptStateException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=807545890 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/var/lib/elasticsearch/nodes/0/indices/cHx1fblKRsGBcNEs3eBEBQ/_state/state-72.st")))
    at org.elasticsearch.gateway.MetaDataStateFormat.read(MetaDataStateFormat.java:201) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:294) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgrade(IndexFolderUpgrader.java:89) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgradeIndicesIfNeeded(IndexFolderUpgrader.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:447) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
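The root cause in the trace above is a Lucene "codec footer mismatch (file truncated?)": the last 16 bytes of the state file do not begin with the expected footer magic (-1071082520). To get a feel for how many state files are affected before deciding what to do, a read-only scan like the one below could list them. This is a minimal sketch, not an Elasticsearch tool, and it assumes the footer layout of Lucene's CodecUtil in the Lucene 7.x line used by Elasticsearch 6.x: a big-endian int footer magic, an int algorithm ID (0 for CRC32), and a long CRC32 over every preceding byte. The function names are hypothetical, and the script only reports problems; it does not modify or repair anything.

```python
import struct
import zlib
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed Lucene 7.x CodecUtil footer layout (big-endian):
#   int FOOTER_MAGIC, int algorithm ID (0 = CRC32), long checksum.
CODEC_MAGIC = 0x3FD76C17
FOOTER_MAGIC = ~CODEC_MAGIC  # -1071082520, the "expected footer" in the log
FOOTER_LENGTH = 16


def check_footer(data: bytes) -> Optional[str]:
    """Return a description of the problem, or None if the footer looks valid."""
    if len(data) < FOOTER_LENGTH:
        return f"file too short for a footer: {len(data)} bytes"
    magic, algorithm_id, stored = struct.unpack(">iiq", data[-FOOTER_LENGTH:])
    if magic != FOOTER_MAGIC:
        return (f"codec footer mismatch: actual footer={magic} "
                f"vs expected footer={FOOTER_MAGIC}")
    if algorithm_id != 0:
        return f"unexpected checksum algorithm id {algorithm_id}"
    # The CRC32 covers every byte except the trailing 8-byte checksum itself.
    computed = zlib.crc32(data[:-8]) & 0xFFFFFFFF
    if stored != computed:
        return f"checksum mismatch: stored={stored} computed={computed}"
    return None


def scan_state_files(data_path: Path) -> List[Tuple[Path, str]]:
    """Check every _state/*.st file below data_path without modifying anything."""
    problems = []
    for path in sorted(data_path.rglob("_state/*.st")):
        problem = check_footer(path.read_bytes())
        if problem is not None:
            problems.append((path, problem))
    return problems
```

For the node in this thread, calling scan_state_files(Path("/var/lib/elasticsearch/nodes/0")) and printing each (path, problem) pair should list state-72.st among the damaged files, if the footer assumptions above hold for this version.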

DavidTurner · September 18, 2018, 2:39pm

When you say "one of" your data nodes, does that mean there are others, that they are healthy, and that your cluster health is YELLOW, indicating that all primaries are allocated? If so, the best thing to do is to start again with a blank slate on this node and let Elasticsearch restore what it needs from the rest of your cluster.
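The advice above hinges on the cluster health status: GREEN or YELLOW means every primary shard is allocated on some node, while RED means at least one primary is unassigned, so wiping the broken node in that state could destroy the only remaining copy of that shard. A minimal sketch of that check, run against a node that is still up, might look like this. The helper names are hypothetical; the GET _cluster/health endpoint and its status field are part of the Elasticsearch REST API, and basic authentication is assumed because X-Pack security is enabled in this cluster.

```python
import base64
import json
import urllib.request
from typing import Optional


def fetch_cluster_health(base_url: str, user: Optional[str] = None,
                         password: Optional[str] = None) -> dict:
    """GET /_cluster/health from any node that is still running."""
    request = urllib.request.Request(f"{base_url}/_cluster/health")
    if user is not None and password is not None:
        # Basic auth, assuming the security realm accepts username/password.
        token = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def safe_to_rebuild_node(health: dict) -> bool:
    """True only if every primary shard is allocated (status green or yellow).

    RED means at least one primary is unassigned; the broken node's disk
    might then hold the only copy of that shard.
    """
    return health.get("status") in ("green", "yellow")
```

Only if safe_to_rebuild_node(fetch_cluster_health("http://localhost:9200", "elastic", "...")) returns True would it make sense to stop the broken node, move its data directory aside (rather than deleting it outright, as a cautious choice), and restart it empty so Elasticsearch can re-replicate onto it.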

This kind of corruption suggests that something may be wrong with how your disks are configured: perhaps you have an unsafe write-back cache (i.e. one without battery backup), or something else is preventing fsync() from doing its job.
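To see why an unsafe write-back cache matters, consider the generic crash-safe write pattern that applications use to keep files consistent: write a temporary file, fsync it, atomically rename it over the target, then fsync the directory. The sketch below shows that pattern. It is a generic POSIX illustration with a hypothetical function name, not Elasticsearch's own code. The whole pattern relies on fsync() returning only once the data is on stable storage; if a cache acknowledges the fsync and then loses the data on power failure, the result can be exactly the kind of truncated file reported in the trace, even though the application did everything right.

```python
import os
import tempfile
from pathlib import Path


def durable_atomic_write(path: Path, data: bytes) -> None:
    """Write data so that a crash leaves either the old or the new file intact.

    Generic POSIX pattern (assumes a filesystem with atomic rename and a
    storage stack that honours fsync). Not Elasticsearch's implementation.
    """
    directory = path.parent
    fd, tmp_name = tempfile.mkstemp(dir=directory, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS (and disk) to persist the bytes
        os.replace(tmp_name, path)  # atomic rename over the old file
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry created by the rename
    finally:
        os.close(dir_fd)
```

Each fsync() in this function is only as trustworthy as the hardware and any NAS or RAID layer beneath it, which is why the storage configuration is the first thing to check.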

jamesl · September 19, 2018, 12:55pm

Yes, both my data nodes hang off a NAS. It crashed and rebooted, and now I have these errors.
It looks like my other data node won't start because it can't authenticate. I vaguely remember there is a setting that lets the nodes come up with no security, but I can't find it. Do you happen to have the link handy? Thanks.

error message:
[2018-09-19T22:54:40,914][INFO ][o.e.x.s.a.AuthenticationService] [els04] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]

DavidTurner · September 21, 2018, 3:05pm

You can disable security by setting xpack.security.enabled: false in elasticsearch.yml.

It's not a great idea to have two nodes share storage, precisely because of the situation you describe: if the shared storage breaks, you may be left with no good copy of your data. A NAS is a good place to store snapshots (using the shared filesystem repository, for example), but it's preferable for each node to use local disks so that failures like this are isolated.
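Using the NAS for snapshots rather than live data could look roughly like the sketch below, which builds the two REST requests involved: registering a shared filesystem ("fs") repository with PUT _snapshot/<repo>, and taking a snapshot with PUT _snapshot/<repo>/<snapshot>. The repository name nas_backup, the mount point /mnt/nas/es-snapshots and the snapshot name snap-1 are hypothetical. For an fs repository, the location must also be listed under path.repo in elasticsearch.yml on every master and data node, and the NAS must be mounted at that same path on all of them.

```python
import json
import urllib.request


def fs_repository_request(base_url: str, repo: str, location: str) -> urllib.request.Request:
    """Build PUT _snapshot/<repo> registering a shared filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def snapshot_request(base_url: str, repo: str, snapshot: str) -> urllib.request.Request:
    """Build PUT _snapshot/<repo>/<snapshot>; with no body, all indices are included."""
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )
```

Each request would then be sent with urllib.request.urlopen, adding an Authorization header as in any other call to a secured cluster. Keeping snapshots on the NAS and live data on local disks means a NAS failure costs you backups rather than the only copy of the data.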

system · October 19, 2018, 3:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
