Corruption errors after upgrading to zlib 1.2.12

Hi all

I filed a bug several days ago but have had no response. Has no one else had issues after upgrading zlib to 1.2.12?

I tried this on a clean data directory with a single node. If I downgrade to zlib 1.2.11, the service starts fine and I can create indexes and index documents. With zlib 1.2.12, the service fails to start with the errors below.

/usr/share/Elasticsearch/logs/Elasticsearch.log

[2022-04-05T05:06:08,301][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [gxdev1] uncaught exception in thread [main]
org.elasticsearch.bootstrap.StartupException: org.elasticsearch.ElasticsearchException: failed to load metadata
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:170) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:157) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.common.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:81) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) ~[elasticsearch-cli-8.1.0.jar:8.1.0]
	at org.elasticsearch.cli.Command.main(Command.java:77) ~[elasticsearch-cli-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:122) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) ~[elasticsearch-8.1.0.jar:8.1.0]
Caused by: org.elasticsearch.ElasticsearchException: failed to load metadata
	at org.elasticsearch.gateway.GatewayMetaState.start(GatewayMetaState.java:162) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.node.Node.start(Node.java:1142) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:272) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:367) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) ~[elasticsearch-8.1.0.jar:8.1.0]
	... 6 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=226868ae actual=fcd3484d (resource=BufferedChecksumIndexInput(NIOFSIndexInput(path="/mq_cluster/data/elasticsearch/_state/_9c.fdt"))
	at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:440) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat.writeCompoundFile(Lucene90CompoundFormat.java:123) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.codecs.lucene90.Lucene90CompoundFormat.write(Lucene90CompoundFormat.java:98) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:5563) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:537) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:468) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:497) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:676) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:4014) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3988) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3967) ~[lucene-core-9.0.0.jar:9.0.0 0b18b3b965cedaf5eb129aa41243a44c83ca826d - jpountz - 2021-12-01 14:23:49]
	at org.elasticsearch.gateway.PersistedClusterStateService$MetadataIndexWriter.flush(PersistedClusterStateService.java:692) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.gateway.PersistedClusterStateService$Writer.addMetadata(PersistedClusterStateService.java:991) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.gateway.PersistedClusterStateService$Writer.overwriteMetadata(PersistedClusterStateService.java:975) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.gateway.PersistedClusterStateService$Writer.writeFullStateAndCommit(PersistedClusterStateService.java:788) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.gateway.GatewayMetaState$LucenePersistedState.<init>(GatewayMetaState.java:450) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.gateway.GatewayMetaState.start(GatewayMetaState.java:131) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.node.Node.start(Node.java:1142) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Bootstrap.start(Bootstrap.java:272) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:367) ~[elasticsearch-8.1.0.jar:8.1.0]
	at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:166) ~[elasticsearch-8.1.0.jar:8.1.0]
	... 6 more
[2022-04-05T05:06:08,309][INFO ][o.e.n.Node               ] [gxdev1] stopping ...
[2022-04-05T05:06:08,353][INFO ][o.e.n.Node               ] [gxdev1] stopped
[2022-04-05T05:06:08,354][INFO ][o.e.n.Node               ] [gxdev1] closing ...
[2022-04-05T05:06:08,369][INFO ][o.e.n.Node               ] [gxdev1] closed
[2022-04-05T05:06:08,371][INFO ][o.e.x.m.p.NativeController] [gxdev1] Native controller process has stopped - no new native processes can be started

7.1 is EOL and no longer supported. Your issue will likely be closed with a request to upgrade to a supported version (7.10.x or above).

It's true that they should definitely move away from 7.1.2, and Arch isn't a supported platform, but the OP indicates that this is a problem in 8.1.0 too. The latest zlib does include some changes to how CRCs are calculated, which could be having an impact here, although I haven't been able to reproduce the failure myself.
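For anyone who wants to sanity-check CRC-32 on an affected machine, here is a minimal sketch. It compares `zlib.crc32` against a slow bitwise reference implementation. The helper names are made up for illustration, and it only exercises whichever zlib your Python build is linked against, which may not be the code path Elasticsearch uses, so a pass here does not rule anything out.

```python
import os
import zlib


def crc32_reference(data: bytes) -> int:
    """Bitwise CRC-32 (reflected polynomial 0xEDB88320), independent of zlib."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def zlib_crc32_matches(data: bytes) -> bool:
    """Return True if zlib's CRC-32 agrees with the reference implementation."""
    return (zlib.crc32(data) & 0xFFFFFFFF) == crc32_reference(data)


def check_random_buffers(rounds: int = 20, size: int = 4096) -> bool:
    """Compare both implementations on a few random buffers (hypothetical helper)."""
    return all(zlib_crc32_matches(os.urandom(size)) for _ in range(rounds))
```

If `check_random_buffers()` returns `False` with zlib 1.2.12 but `True` with 1.2.11 on the same host, that would point at the CRC code path on that CPU rather than at the data on disk.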

Ahh I missed that in the logs.

This was resolved with help from the bug report on GitHub. DaveCTurner pointed out that it is a CPU-specific issue. In Proxmox, changing the VM's CPU type from kvm64 to Haswell fixed it.
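To see what the guest actually exposes before and after changing the CPU type, a small sketch like the following can diff the feature flags reported in `/proc/cpuinfo`. The function names are hypothetical, and the flags in the usage note are only examples: the thread does not say which CPU feature was involved.

```python
def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Extract the feature flags from /proc/cpuinfo-style text.

    Assumes the x86 Linux layout, where each processor entry has a
    'flags : ...' line; the first such line is used.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, wanted) -> list:
    """Return the wanted flags that the CPU does not report, sorted."""
    return sorted(set(wanted) - parse_cpu_flags(cpuinfo_text))
```

Inside the VM, reading the file with `open("/proc/cpuinfo").read()` and passing it to `missing_flags(text, ["sse4_2", "pclmulqdq", "avx2"])` once under kvm64 and once under Haswell shows which of those example features the emulated CPU model hides.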