Netty LEAK logs on Elasticsearch clusters, probably after version upgrade

Hello,

We have two Kubernetes clusters running in two DCs, and on each of them we run an Elasticsearch cluster. These two Elasticsearch clusters are used by a service on our website to answer queries.

Cluster details per DC (old):
ECK-operator version: 1.12.1

Elasticsearch version: 8.12.0

Data/master combined node count: 3
CPU: 8 cores
Memory: 8 GB
JVM heap: 4 GB

Service details:
Elastic Java API client version: 8.15.5

We have two types of indices:

The first type is reindexed every hour; these indices are tiny, with around 20k documents each.
The second type is reindexed every 15 minutes, and around 10 million documents are indexed within about 3 minutes (the indexer works roughly like the bulk sketch below).
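
To give an idea of the indexing side, the bulk loading is done with the Java API client's BulkIngester helper, roughly as in the sketch below. This is a simplified sketch, not our exact code: the index name, the document class and the batching parameters are placeholders.

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import co.elastic.clients.elasticsearch._helpers.bulk.BulkIngester;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class BigIndexLoader {

    // Sketch only: "big_index_v2", MyDoc and the batching numbers are placeholders.
    static void load(ElasticsearchClient client, List<MyDoc> docs) {
        BulkIngester<Void> ingester = BulkIngester.of(b -> b
                .client(client)
                .maxOperations(5_000)                 // flush after 5k queued operations
                .maxConcurrentRequests(4)             // bulk requests allowed in flight
                .flushInterval(1, TimeUnit.SECONDS)); // or flush at least once per second

        for (MyDoc doc : docs) {
            ingester.add(op -> op.index(idx -> idx
                    .index("big_index_v2")
                    .id(doc.id())
                    .document(doc)));
        }

        ingester.close(); // flushes pending operations and waits for the responses
    }

    record MyDoc(String id, String payload) {}
}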

The reindex procedure is as follows: we create the new indices, populate the documents, move the aliases, and remove the old indices; a simplified sketch of this cycle follows below. Since the data is ephemeral, our development team chose this approach.
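
In code the cycle looks roughly like this with the Java API client (again only a sketch; "products", "products_v1" and "products_v2" are placeholder alias/index names, and mappings/settings are omitted):

import co.elastic.clients.elasticsearch.ElasticsearchClient;
import java.io.IOException;

public class ReindexRotation {

    static void rotate(ElasticsearchClient client) throws IOException {
        // 1. Create the new index (mappings and settings omitted in this sketch).
        client.indices().create(c -> c.index("products_v2"));

        // 2. Populate it (bulk indexing as in the previous sketch).

        // 3. Move the alias from the old index to the new one in a single atomic update.
        client.indices().updateAliases(u -> u
                .actions(a -> a.remove(r -> r.index("products_v1").alias("products")))
                .actions(a -> a.add(ad -> ad.index("products_v2").alias("products"))));

        // 4. Remove the old index now that the alias points at the new one.
        client.indices().delete(d -> d.index("products_v1"));
    }
}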

On the old cluster we had some problems when indexing and heavy searching were going on at the same time, but we never had issues like this (as far as we know).

CLUSTER UPGRADE DETAILS

We recently upgraded our ECK operator and Elasticsearch versions.

Cluster details per DC (new):
ECK-operator version: 1.16.1

Elasticsearch version: 8.13.4

Data/master combined node count: 3
CPU: 8 cores
Memory: 8 GB
JVM heap: 4 GB

After the upgrade we started to see LEAK logs on the data nodes, and the second type of indices could no longer be indexed at all, so the indexer for the big indices was shut down. Unfortunately, since we only discovered this now, we don't have logs from the time of the old cluster setup to check whether LEAK logs were present there as well.

Log sample:

{"@timestamp":"2025-03-12T16:00:13.346Z", "log.level":"ERROR", "message":"LEAK: ByteBuf.release() was not called before it's garbage-collected. See https://netty.io/wiki/reference-counted-objects.html for more information.\nRecent access records: \nCreated at:\n\tio.netty.buffer@4.1.94.Final/io.netty.buffer.PooledByteBufAllocator.newHeapBuffer(PooledByteBufAllocator.java:390)\n\tio.netty.buffer@4.1.94.Final/io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:169)\n\tio.netty.buffer@4.1.94.Final/io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:160)\n\torg.elasticsearch.transport.netty4@8.13.4/org.elasticsearch.transport.netty4.NettyAllocator$NoDirectBuffers.heapBuffer(NettyAllocator.java:282)\n\torg.elasticsearch.transport.netty4@8.13.4/org.elasticsearch.transport.netty4.NettyAllocator$NoDirectBuffers.buffer(NettyAllocator.java:252)\n\tio.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:2267)\n\tio.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1348)\n\tio.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1246)\n\tio.netty.handler@4.1.94.Final/io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1295)\n\tio.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529)\n\tio.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468)\n\tio.netty.codec@4.1.94.Final/io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)\n\tio.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)\n\tio.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tio.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tjava.base/java.lang.Thread.run(Thread.java:1583)", "ecs.version": 
"1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[***][transport_worker][T#2]","log.logger":"io.netty.util.ResourceLeakDetector","elasticsearch.cluster.uuid":"***","elasticsearch.node.id":"W_xLbnr4SKSe4SZeSoLLsQ","elasticsearch.node.name":"***","elasticsearch.cluster.name":"***"}

Along with this ERROR we also started to see GC spikes and OOMs on the data nodes. We caught this because Elasticsearch deployed with the ECK operator writes heap dumps into the same data directory that the nodes use, which caused disk space issues.

Has anyone experienced this problem after upgrading from 8.12.0?

Regards.