Elasticsearch nodes go down with OutOfMemoryError as GC kicks in

I am facing the same problem as described in this discussion.

ES version 5.6.4.

With ES 2.2.0 everything was working fine. The bulk index queue is full and the index requests are spread across many indices.
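
For reference, the bulk queue backlog can be checked with the cat thread pool API, roughly like this (host and port are placeholders for one of our nodes):

    curl -s 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected'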

We are getting the following error message in the es logs.

cause [auto(bulk api)], templates [logs_indices_template, logs_ela_indices_template], shards [1]/[0], mappings [_default_]
[2019-07-05T13:25:36,221][WARN ][o.e.m.j.JvmGcMonitorService] [node] [gc][147] overhead, spent [3.3s] collecting in the last [3.4s]
[2019-07-05T13:25:37,565][WARN ][o.e.m.j.JvmGcMonitorService] [node] [gc][148] overhead, spent [1.2s] collecting in the last [1.3s]
[2019-07-05T13:25:47,781][WARN ][o.e.m.j.JvmGcMonitorService] [node] [gc][149] overhead, spent [8.6s] collecting in the last [10.2s]
[2019-07-05T13:25:47,782][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
	at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:185)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
	at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1273)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.lang.Thread.run(Thread.java:745)
[2019-07-05T13:25:47,781][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node] fatal error in thread [elasticsearch[node][refresh][T#6]], exiting
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.elasticsearch.common.util.concurrent.ThreadContext$$Lambda$1340/300859499.get$Lambda(Unknown Source) ~[?:?]

Please respond if any further information is required.

Additional info: we are using Search Guard with the netty4 module.

@Prashant_Rana what java version and heap size are you using?

When ES hits a heap OOM, it should write a heap dump; it could be worthwhile looking into that.
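
For reference, the default config/jvm.options in ES 5.x should already enable the dump on OOM; roughly like this (the dump path is just an example, pick a location with enough free disk):

    # config/jvm.options (defaults vary by packaging)
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/var/lib/elasticsearch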

what java version and heap size are you using?

java version "1.8.0_51"

Heap size: 700 MB

That looks like too little heap allocated to ES. I would recommend at least 2 GB of heap for Elasticsearch. The GC overhead warnings in your logs only state how long GC spent collecting; by themselves they don't say Java is out of memory.
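
If you want to try that, the heap is set in config/jvm.options; a minimal sketch (2g is just the suggested starting point, and Xms should equal Xmx):

    # config/jvm.options
    -Xms2g
    -Xmx2g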

java version "1.8.0_51"

This is a very old version; I would recommend upgrading to a more recent Java 8 release. Also try a larger heap size, as suggested by Abhilash_B.
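
You can also confirm which JVM each node is actually running via the nodes info API, something like this (host and port are placeholders):

    curl -s 'localhost:9200/_nodes/jvm?filter_path=nodes.*.name,nodes.*.jvm.version&pretty'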

The ES server shuts down as soon as this happens.

This used to work just fine with ES 2.2.0 with the same heap size.

That looks like too little heap allocated to ES.

Does that mean ES 5.6.4 needs more heap under load than ES 2.2.0 did?

I have the heap dump. I can share it with you personally; can you have a look?

Sure, I can take a quick look at it. I think it might be too large to upload to the forums, but you should be able to put it somewhere (Google Drive or similar) and share how to obtain it in a private message.

Sure, I will share the heap dump with you personally.

Heap dump shared, please have a look.
