ES data node crashed and now has a heap of file errors


#1

Hi. One of my data nodes crashed and I had to run fsck. The server is up again, but Elasticsearch now logs a heap of these errors.
In the past I only had a few and was able to delete them, but this time there are too many. Is there any way around this?

Is there a 'quick and dirty' way to get this server up? Thanks.

Caused by: org.elasticsearch.ElasticsearchException: java.io.IOException: failed to read [id:72, file:/var/lib/elasticsearch/nodes/0/indices/cHx1fblKRsGBcNEs3eBEBQ/_state/state-72.st]
    at org.elasticsearch.ExceptionsHelper.maybeThrowRuntimeAndSuppress(ExceptionsHelper.java:199) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:304) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgrade(IndexFolderUpgrader.java:89) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgradeIndicesIfNeeded(IndexFolderUpgrader.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:447) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
    ... 6 more
Caused by: java.io.IOException: failed to read [id:72, file:/var/lib/elasticsearch/nodes/0/indices/cHx1fblKRsGBcNEs3eBEBQ/_state/state-72.st]
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:298) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgrade(IndexFolderUpgrader.java:89) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgradeIndicesIfNeeded(IndexFolderUpgrader.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:447) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]
    ... 6 more
Caused by: org.elasticsearch.gateway.CorruptStateException: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=807545890 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(SimpleFSIndexInput(path="/var/lib/elasticsearch/nodes/0/indices/cHx1fblKRsGBcNEs3eBEBQ/_state/state-72.st")))
    at org.elasticsearch.gateway.MetaDataStateFormat.read(MetaDataStateFormat.java:201) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.MetaDataStateFormat.loadLatestState(MetaDataStateFormat.java:294) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgrade(IndexFolderUpgrader.java:89) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.common.util.IndexFolderUpgrader.upgradeIndicesIfNeeded(IndexFolderUpgrader.java:127) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.gateway.GatewayMetaState.<init>(GatewayMetaState.java:87) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:447) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.node.Node.<init>(Node.java:256) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:213) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:326) ~[elasticsearch-6.4.0.jar:6.4.0]
    at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:136) ~[elasticsearch-6.4.0.jar:6.4.0]

(David Turner) #2

When you say "one of" your data nodes, does this mean that there are others, that they are healthy, and that your cluster health is YELLOW, indicating that all primaries are allocated? If so, the best thing to do is to start again with a blank slate on this node and allow Elasticsearch to restore what it needs from the rest of your cluster.
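
To check that before wiping anything, something like the following could query the cluster health API on one of the healthy nodes. It is only a sketch: the URL `http://localhost:9200` and the helper names are assumptions, and with security enabled the request would also need credentials.

```python
import json
import urllib.request


def all_primaries_allocated(health):
    """GREEN or YELLOW means every primary shard is allocated; RED means some are not."""
    return health.get("status") in ("green", "yellow")


def summarize(health):
    """Build a one-line summary from a cluster health response."""
    return "status={} data_nodes={} unassigned_shards={}".format(
        health.get("status"),
        health.get("number_of_data_nodes"),
        health.get("unassigned_shards"),
    )


def fetch_cluster_health(base_url):
    """GET /_cluster/health from a running node (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    health = fetch_cluster_health("http://localhost:9200")
    print(summarize(health))
    if all_primaries_allocated(health):
        print("All primaries are allocated on the remaining nodes.")
    else:
        print("Cluster is RED: some primaries are missing, do not wipe this node yet.")
```

If the status comes back RED, the broken node may hold the only copy of some shards, so the blank-slate approach would lose data.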

This kind of corruption suggests there might be something wrong with how your disks are configured - perhaps you have an unsafe writeback cache (i.e. without battery backup) or are otherwise preventing fsync() from doing its thing.
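
To get an idea of how many state files are affected, a rough scan along these lines might help. It is a sketch and not Elasticsearch's own check: it assumes the Lucene codec footer layout (a 16-byte big-endian footer made of the magic value, an algorithm ID of 0, and a CRC32 of everything before the checksum), where the magic value is the unsigned form of the `expected footer=-1071082520` in the error above. The data path is taken from the stack trace and the function names are made up for illustration.

```python
import struct
import zlib
from pathlib import Path

FOOTER_MAGIC = 0xC02893E8  # unsigned form of -1071082520 from the error message
FOOTER_LENGTH = 16         # magic (4) + algorithm ID (4) + checksum (8), big-endian


def footer_problem(path):
    """Return a description of the footer problem, or None if the footer looks fine."""
    data = Path(path).read_bytes()
    if len(data) < FOOTER_LENGTH:
        return "file shorter than a footer (truncated?)"
    magic, algorithm, checksum = struct.unpack(">IIQ", data[-FOOTER_LENGTH:])
    if magic != FOOTER_MAGIC:
        return f"footer magic mismatch: {magic:#010x}"
    if algorithm != 0:
        return f"unknown checksum algorithm: {algorithm}"
    # Assumed layout: the CRC32 covers every byte before the 8-byte checksum field.
    actual = zlib.crc32(data[:-8]) & 0xFFFFFFFF
    if checksum != actual:
        return f"checksum mismatch: stored {checksum:#x}, computed {actual:#x}"
    return None


def scan_state_files(data_path):
    """Yield (path, problem) for every state-*.st file with a suspicious footer."""
    for st_file in sorted(Path(data_path).rglob("state-*.st")):
        problem = footer_problem(st_file)
        if problem is not None:
            yield st_file, problem


if __name__ == "__main__":
    # Path taken from the stack trace; adjust it to your own path.data.
    for st_file, problem in scan_state_files("/var/lib/elasticsearch/nodes/0/indices"):
        print(f"{st_file}: {problem}")
```

The scan only reads files, so it can run while the node is stopped without changing anything on disk.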


#3

Yes, both my data nodes hang off a NAS. It crashed and rebooted, and now I have these errors.
It looks like my other data node won't start because it can't authenticate. I vaguely remember a setting that lets the nodes come up without security, but I can't find it. Do you happen to have the link handy? Thanks.

error message:
[2018-09-19T22:54:40,914][INFO ][o.e.x.s.a.AuthenticationService] [els04] Authentication of [elastic] was terminated by realm [reserved] - failed to authenticate user [elastic]


(David Turner) #4

You can disable security with xpack.security.enabled: false.
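
As a minimal sketch, the setting goes into `elasticsearch.yml` on each node, followed by a restart. The file location in the comment is an assumption based on a package install.

```yaml
# elasticsearch.yml (assumed location for package installs: /etc/elasticsearch/elasticsearch.yml)
# Turns off X-Pack security on this node; set it on every node in the cluster.
xpack.security.enabled: false
```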

It's not a great idea to have two nodes share storage, because of the situation you describe: if the shared storage breaks then you might be left with no good copy of your data. A NAS is a good place to store snapshots (using the shared filesystem repository, for example) but it's preferable for each node to use local disks to isolate failures like this.
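
For the snapshot side, registering a shared filesystem repository might look roughly like this. It is a sketch under assumptions: the repository name `nas_backup`, the mount point `/mnt/nas/es_snapshots` and the node URL are placeholders, and the location has to be listed under `path.repo` in `elasticsearch.yml` on every master and data node first.

```python
import json
import urllib.request


def build_fs_repository_request(base_url, repo_name, location):
    """Build a PUT /_snapshot/<repo_name> request for a shared filesystem repository."""
    body = json.dumps({"type": "fs", "settings": {"location": location}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_snapshot/{repo_name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    # All three values are placeholders for illustration.
    request = build_fs_repository_request(
        "http://localhost:9200", "nas_backup", "/mnt/nas/es_snapshots"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        print(json.load(response))
```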

