Elasticsearch nodes go down with OutOfMemoryError as GC kicks in

I am facing the same problem as described in this discussion.

ES version 5.6.4.

With ES 2.2.0 everything was working fine. The bulk index queue is full and the index requests are spread across many indices.
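
For reference, the bulk queue backlog can be checked with the cat thread pool API, roughly like this (host and port are placeholders for one of our nodes):

    curl -s 'localhost:9200/_cat/thread_pool/bulk?v&h=node_name,name,active,queue,rejected'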

We are getting the following error message in the es logs.

cause [auto(bulk api)], templates [logs_indices_template, logs_ela_indices_template], shards [1]/[0], mappings [_default_]
[2019-07-05T13:25:36,221][WARN ][o.e.m.j.JvmGcMonitorService] [node] [gc][147] overhead, spent [3.3s] collecting in the last [3.4s]
[2019-07-05T13:25:37,565][WARN ][o.e.m.j.JvmGcMonitorService] [node] [gc][148] overhead, spent [1.2s] collecting in the last [1.3s]
[2019-07-05T13:25:47,781][WARN ][o.e.m.j.JvmGcMonitorService] [node] [gc][149] overhead, spent [8.6s] collecting in the last [10.2s]
[2019-07-05T13:25:47,782][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
	at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:185)
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:83)
	at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
	at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1273)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1084)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:644)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:544)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at java.lang.Thread.run(Thread.java:745)
[2019-07-05T13:25:47,781][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node] fatal error in thread [elasticsearch[node][refresh][T#6]], exiting
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.elasticsearch.common.util.concurrent.ThreadContext$$Lambda$1340/300859499.get$Lambda(Unknown Source) ~[?:?]

Please respond if any further information is required.

Additional info: we are using Search Guard with the netty4 module.

@Prashant_Rana what java version and heap size are you using?

When ES hits a heap OOM, it should write a heap dump; it could be worthwhile looking into that.
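
For reference, the default config/jvm.options in ES 5.x should already enable the dump on OOM; roughly like this (the dump path is just an example, pick a location with enough free disk):

    # config/jvm.options (defaults vary by packaging)
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/var/lib/elasticsearch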

what java version and heap size are you using?

java version "1.8.0_51"

Heap size: 700 MB

That looks like too little heap allocated to ES. I would recommend at least 2 GB of heap for Elasticsearch. The GC overhead warnings in your logs only state how long GC spent collecting; by themselves they don't say Java is out of memory.
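
If you want to try that, the heap is set in config/jvm.options; a minimal sketch (2g is just the suggested starting point, and Xms should equal Xmx):

    # config/jvm.options
    -Xms2g
    -Xmx2g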

java version "1.8.0_51"

This is a very old version; I would recommend upgrading to a more recent Java 8 release. Also try a larger heap size, as suggested by Abhilash_B.
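
You can also confirm which JVM each node is actually running via the nodes info API, something like this (host and port are placeholders):

    curl -s 'localhost:9200/_nodes/jvm?filter_path=nodes.*.name,nodes.*.jvm.version&pretty'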

The ES server shuts down as soon as this happens.

This used to work just fine with ES 2.2.0 with the same heap size.

That looks like too little heap allocated to ES.

Does that mean ES 5.6.4 needs more heap under load than ES 2.2.0 did?

I have the heap dump. I can share it with you personally; can you have a look?

Sure, I can take a quick look at it. I think it might be too large to upload to the forums, but you should be able to put it somewhere (Google Drive or similar) and share how to obtain it in a private message.

Sure, I will share the heap dump with you personally.

Heap dump shared, please have a look.
