Hi,
We have an 11-node Elasticsearch cluster (version 6.6.2) on AWS that performs intensive bulk ingestion and serves very few search queries. The node configuration is as follows:
| Node | Type | Instance type | JVM heap configured |
|------|------|---------------|---------------------|
| 1 | Master, Data, Ingest | i3.2xlarge | 30g |
| 2 | Master, Data, Ingest | i3.2xlarge | 30g |
| 3 | Master, Data, Ingest | i3.2xlarge | 30g |
| 4 | Data, Ingest | i3.2xlarge | 30g |
| 5 | Data, Ingest | i3.2xlarge | 30g |
| 6 | Data, Ingest | i3.2xlarge | 30g |
| 7 | Data, Ingest | i3.2xlarge | 30g |
| 8 | Data, Ingest | i3.2xlarge | 30g |
| 9 | Data, Ingest | i3.2xlarge | 30g |
| 10 | Data, Ingest | i3.2xlarge | 30g |
| 11 | Coordinator | m5.large | 5g |
The Elasticsearch service is failing frequently, and only on the coordinator node, with a Java heap space out-of-memory error, as shown in the logs below:
[2019-12-23T23:56:01,923][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [ip-10-13-39-98.ec2.internal] fatal error in thread [Thread-5], exiting
java.lang.OutOfMemoryError: Java heap space
at io.netty.buffer.UnpooledHeapByteBuf.allocateArray(UnpooledHeapByteBuf.java:88) ~[?:?]
at io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf.allocateArray(UnpooledByteBufAllocator.java:164) ~[?:?]
at io.netty.buffer.UnpooledHeapByteBuf.<init>(UnpooledHeapByteBuf.java:61) ~[?:?]
at io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledHeapByteBuf.<init>(UnpooledByteBufAllocator.java:159) ~[?:?]
at io.netty.buffer.UnpooledByteBufAllocator.newHeapBuffer(UnpooledByteBufAllocator.java:82) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:166) ~[?:?]
at io.netty.buffer.AbstractByteBufAllocator.heapBuffer(AbstractByteBufAllocator.java:157) ~[?:?]
at io.netty.buffer.Unpooled.buffer(Unpooled.java:116) ~[?:?]
at io.netty.buffer.Unpooled.copiedBuffer(Unpooled.java:409) ~[?:?]
at org.elasticsearch.http.netty4.Netty4HttpRequestHandler.channelRead0(Netty4HttpRequestHandler.java:73) ~[?:?]
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
at org.elasticsearch.http.netty4.pipelining.HttpPipeliningHandler.channelRead(HttpPipeliningHandler.java:68) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) ~[?:?]
at io.netty.handler.codec.MessageToMessageCodec.channelRead(MessageToMessageCodec.java:111) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[?:?]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[?:?]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) ~[?:?]
Please find the heap dump analysis below.
Could you help us identify the root cause of the issue and how it can be mitigated? We cannot compromise on the ingestion rate. The JVM heap on the coordinator node was initially configured at 4g; we have increased it to 5g, but that has not solved the problem.
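For context, the trace above fails in Netty4HttpRequestHandler.channelRead0 while the HTTP request body is being copied into a heap buffer, so one thing we are trying to reason about is how much heap the in-flight bulk request bodies alone could occupy on the coordinator. Below is a rough back-of-the-envelope sketch. The client counts and the 50 MiB body size are hypothetical placeholders, not measurements from our cluster, and the function names are made up for illustration; only the 5g heap comes from our setup.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def inflight_body_bytes(concurrent_requests: int, body_bytes: int, copies: int = 1) -> int:
    """Heap held by request bodies that are buffered at the same time.

    `copies` models how many on-heap copies of each body exist at once
    (assumption: at least one, since the trace shows a copy being made).
    """
    return concurrent_requests * body_bytes * copies


def heap_fraction(used_bytes: int, heap_bytes: int) -> float:
    """Share of the configured heap taken by `used_bytes`."""
    return used_bytes / heap_bytes


if __name__ == "__main__":
    heap = 5 * GIB  # coordinator heap from the table above
    # Hypothetical load levels: (concurrent bulk requests, body size in MiB)
    for clients, body_mib in [(20, 50), (50, 50), (100, 50)]:
        used = inflight_body_bytes(clients, body_mib * MIB)
        print(f"{clients} requests x {body_mib} MiB -> "
              f"{heap_fraction(used, heap):.0%} of a 5g heap")
```

If the real concurrency and bulk sizes are anywhere near the upper rows, a 4g to 5g increase would make little difference, which would match what we observe. We have not yet confirmed the actual figures.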
Please let me know if you need more details.