{"type": "server", "timestamp": "2023-11-21T09:52:52,412Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "elasticsearch", "node.name": "elasticsearch-es-master-1", "message": "uncaught exception in thread [main]",
"stacktrace": ["org.elasticsearch.bootstrap.StartupException: ElasticsearchException[failed to bind service]; nested: CorruptIndexException[codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/_state/_1g1h.si")))];",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:173) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:160) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-7.17.8.jar:7.17.8]",
"at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:125) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-7.17.8.jar:7.17.8]",
"Caused by: org.elasticsearch.ElasticsearchException: failed to bind service",
"at org.elasticsearch.node.Node.(Node.java:1088) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.node.Node.(Node.java:309) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Bootstrap$5.(Bootstrap.java:234) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) ~[elasticsearch-7.17.8.jar:7.17.8]",
"at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:169) ~[elasticsearch-7.17.8.jar:7.17.8]",
"... 6 more",
"Caused by: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/_state/_1g1h.si")))",
"at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:523) ~[lucene-cor
Hi @DavidTurner, we have 3 master nodes and 9 worker nodes. A power outage caused 2 of the master nodes to report the above error.
master0 error:
elasticsearch Likely root cause: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/_state/_1j1o.si")))
master-1 error:
elasticsearch ElasticsearchException[failed to bind service]; nested: CorruptIndexException[codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/_state/_1g1h.si")))];
elasticsearch Likely root cause: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/usr/share/elasticsearch/data/nodes/0/_state/_1g1h.si")))
I think that's covered by the docs I linked above, particularly:
If a file is needed to recover an index after a restart then your storage system previously confirmed to Elasticsearch that this file was durably synced to disk. On Linux this means that the fsync() system call returned successfully. Elasticsearch sometimes reports that an index is corrupt because a file needed for recovery has been truncated or is missing its footer. This indicates that your storage system acknowledges durable writes incorrectly.
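As an aside, the seemingly arbitrary "expected footer=-1071082520" in these errors is not random: every Lucene file ends with a footer magic number, defined as the bitwise complement of Lucene's codec header magic (0x3FD76C17 in CodecUtil). A quick check of that arithmetic:

```python
# Lucene's codec header magic number (CodecUtil.CODEC_MAGIC)
CODEC_MAGIC = 0x3FD76C17

# Every Lucene file ends with the bitwise complement of the header magic,
# interpreted as a signed 32-bit integer.
FOOTER_MAGIC = ~CODEC_MAGIC

print(FOOTER_MAGIC)  # -1071082520, the "expected footer" from the log
```

So "actual footer=0" means the tail of `_1g1h.si` contained zeroes instead of this magic value, consistent with writes that were acknowledged but never reached disk before the outage.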
Thank you. Is there any way to resolve the failed footer checksum verification and restore the ES cluster?
The master nodes use PVs provided by Longhorn. The power outage in the server room restarted both the Longhorn distributed storage system and Elasticsearch itself.
You'll need to ask the Longhorn folks if there's any way to restore the data it lost. If they say no then you'll need to restore the cluster from a recent snapshot.
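For anyone following along, restoring from a snapshot looks roughly like this via the snapshot API. The repository name `my_backup` and snapshot name `snapshot_1` are placeholders for whatever exists in your own environment; these commands assume an Elasticsearch node reachable on localhost:9200:

```shell
# Placeholder names: "my_backup" and "snapshot_1" must match your own setup.

# List the snapshot repositories registered on the (new) cluster
curl -X GET "localhost:9200/_snapshot?pretty"

# List the snapshots available in a repository
curl -X GET "localhost:9200/_snapshot/my_backup/_all?pretty"

# Restore all indices from the chosen snapshot
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" \
  -H 'Content-Type: application/json' \
  -d '{"indices": "*", "include_global_state": true}'
```

Note that a restore only helps if the repository was registered and snapshots were being taken before the outage; a snapshot repository lives outside the cluster's own data path, so it survives the loss of the master nodes' state.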
Thanks, @DavidTurner.
Hi David, is it possible to repair the master nodes using the metadata stored on the data nodes?
No, the cluster metadata is only stored on a majority (i.e. 2 of 3) of the master-eligible nodes.
OK, thanks very much.
FWIW here are the relevant docs:
If the logs or the health report indicate that Elasticsearch can’t discover enough nodes to form a quorum, you must address the reasons preventing Elasticsearch from discovering the missing nodes. The missing nodes are needed to reconstruct the cluster metadata. Without the cluster metadata, the data in your cluster is meaningless. The cluster metadata is stored on a subset of the master-eligible nodes in the cluster. If a quorum can’t be discovered, the missing nodes were the ones holding the cluster metadata.
Ensure there are enough nodes running to form a quorum and that every node can communicate with every other node over the network. Elasticsearch will report additional details about network connectivity if the election problems persist for more than a few minutes. If you can’t start enough nodes to form a quorum, start a new cluster and restore data from a recent snapshot. Refer to Quorum-based decision making for more information.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.