ES 6.2.4 Coordinator node sudden OutOfMemory

Hi Team,

I'm trying to figure out the reason for the sudden death of one of my coordinator nodes.

I did not change any of the default JVM settings apart from setting -Xms4g -Xmx4g.
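
For reference, the only lines I touched in jvm.options are the heap size; everything else (GC flags, error handling, etc.) is whatever ships with the distribution:

        # config/jvm.options - only change from stock is the heap size
        -Xms4g
        -Xmx4g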

The coordinator node had been working fine for months, but we did upgrade the cluster from 6.2.1 to 6.2.4 about two weeks ago - not sure if that's related.

The logs don't say much:

[2018-05-10T15:26:18,255][ERROR][o.e.t.n.Netty4Utils      ] fatal error on the network layer
        at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:184)
        at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.exceptionCaught(Netty4HttpRequestHandler.java:89)
        at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)

        ......

        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
        at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
        at java.lang.Thread.run(Thread.java:745)

Then:

[2018-05-10T15:26:18,256][WARN ][o.e.t.n.Netty4Transport  ] [dummy_coordinator_node_name] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:9304, remoteAddress=/171.134.101.123:49306}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-05-10T15:26:18,259][WARN ][o.e.t.n.Netty4Transport  ] [dummy_coordinator_node_name] send message failed [channel: NettyTcpChannel{localAddress=0.0.0.0/0.0.0.0:9304, remoteAddress=/171.134.101.123:49306}]
java.nio.channels.ClosedChannelException: null
        at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source) ~[?:?]
[2018-05-10T15:26:18,260][INFO ][o.e.x.w.WatcherService   ] [dummy_coordinator_node_name] stopping watch service, reason [no master node]
[2018-05-10T15:26:18,258][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [dummy_coordinator_node_name] fatal error in thread [Thread-4], exiting
java.lang.OutOfMemoryError: Java heap space
        at com.fasterxml.jackson.core.util.BufferRecycler.balloc(BufferRecycler.java:155) ~[jackson-core-2.8.10.jar:2.8.10]
        at com.fasterxml.jackson.core.util.BufferRecycler.allocByteBuffer(BufferRecycler.java:96) ~
        
....

        at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
        at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) ~[?:?]
        at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111) ~[?:?]

As you can see from our monitoring, the memory spikes happen very abruptly: the first spike did not result in an OOM as far as I know, but the second one did.

I was going to post some heap dump analysis screenshots, because the node was supposed to create a heap dump on OOM, but unfortunately I cannot find any heap dumps - maybe the JVM failed to create one.
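
Once the node is back up I plan to double-check that the heap dump flags (e.g. -XX:+HeapDumpOnOutOfMemoryError, which I believe is in the stock jvm.options) are actually among the JVM arguments, and what the heap dump path is, via the nodes info API - something along these lines (host/port illustrative, filter_path just trims the output):

        # node HTTP address is illustrative - adjust host/port as needed
        curl -s 'http://localhost:9200/_nodes/jvm?filter_path=nodes.*.name,nodes.*.jvm.input_arguments&pretty'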

Cheers,
