For a couple of months I have been using ES 6.4.2, and my client nodes get reaped/OOM-killed by the Linux kernel, so I do not get a heap dump. Taking a forced heap dump gives the dominator tree below.
After reducing the heap size to 7GB, I have not seen the OOM killing yet.
Can I get help understanding why the killing occurred and why it stopped? I would like to be better informed, since I have seen suggestions to increase the heap size when load increases.
Can you share the entirety of the kernel messages that were emitted when the OOM killer did its thing? They're normally accessible using dmesg as long as it wasn't too far in the past.
Do you have any nonstandard kernel settings regarding memory allocation?
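For example, something along these lines usually digs them out, assuming the event is still in the kernel ring buffer or the journal:

    # recent OOM-killer activity, with human-readable timestamps
    dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
    # or search the persistent journal on systemd machines
    journalctl -k | grep -iE 'out of memory|oom-killer|killed process'
    # current memory-overcommit settings
    sysctl vm.overcommit_memory vm.overcommit_ratio vm.swappiness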
One possibility is that there is something else on the same machine consuming memory. The OOM killer picks one process to terminate, and is often going to pick Elasticsearch even when Elasticsearch is behaving properly.
However, in the manual page on setting the heap size it's recommended not to exceed 50% of the physical RAM in the machine. The reason given there is to allow space for the page cache, but another reason is that I think with a 9GB heap some JVMs may allocate up to 9GB of direct (i.e. off-heap) buffers in addition to the heap, and the total would exceed your available physical RAM. Reducing the heap size to 7GB will also reduce the default limit on the direct buffers to 7GB, keeping the total under 15GB.
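Concretely, the heap is configured via the Xms/Xmx lines in config/jvm.options, and you can also cap the direct buffers explicitly rather than relying on the JDK default of "up to the heap size". A sketch, with illustrative values rather than a recommendation for your exact workload:

    # config/jvm.options
    -Xms7g
    -Xmx7g
    # optional: cap direct (off-heap) buffer allocation explicitly;
    # by default the JDK allows roughly up to the heap size
    -XX:MaxDirectMemorySize=4g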
Thanks; I'd expect a few more messages before this too, but this looks like it implicates the memory usage of direct buffers:
The JVM had over 13GB resident; there's a handful of other processes consuming memory too, so I think it all got too close to 15GB. Since the machine can't support Elasticsearch taking 13GB of memory, you might even want to consider setting the heap size lower than 7GB.
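As a rough back-of-envelope, assuming the defaults described above:

    7 GB heap
    + up to ~7 GB direct (off-heap) buffers
    + JVM overhead (metaspace, thread stacks, GC structures)
    + whatever the other processes on the box need
    = uncomfortably close to the 15 GB of physical RAM available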
I have attached the other half as well. Reducing the heap size would affect query speed etc., right? The same configuration worked very well in ES 2, so does something take up more memory in ES 6?
Thanks, I think that also points towards Elasticsearch just using more memory than you've got.
It might slow things down but it might also speed things up. You can only really tell by experiment.
Seems so. I count over 25,000 changes between versions 2.4.6 and 6.4.2, some of which are quite dramatic. I wouldn't want to speculate on which of them are a factor here.
So I turned the heap size down to 1GB, tried it out for a week, and the clients crashed today with the following stack trace:
[2019-05-03T16:58:12,238][ERROR][o.e.t.n.Netty4Utils ] fatal error on the network layer
at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:182)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:73)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
[2019-05-03T16:58:12,276][ERROR][o.e.t.n.Netty4Utils ] fatal error on the network layer
at org.elasticsearch.transport.netty4.Netty4Utils.maybeDie(Netty4Utils.java:182)
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:73)
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at java.lang.Thread.run(Thread.java:748)
[2019-05-03T16:58:12,291][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [metabase-es6-client-us1-6] fatal error in thread [Thread-4], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:99) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:96) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:53) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.AbstractRecycler.obtain(AbstractRecycler.java:33) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:28) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.Recyclers$3.obtain(Recyclers.java:119) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.PageCacheRecycler.bytePage(PageCacheRecycler.java:147) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.AbstractBigArray.newBytePage(AbstractBigArray.java:117) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigByteArray.resize(BigByteArray.java:143) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigArrays.resizeInPlace(BigArrays.java:448) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigArrays.resize(BigArrays.java:495) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigArrays.grow(BigArrays.java:512) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.BytesStreamOutput.ensureCapacity(BytesStreamOutput.java:157) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.ReleasableBytesStreamOutput.ensureCapacity(ReleasableBytesStreamOutput.java:69) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.BytesStreamOutput.writeBytes(BytesStreamOutput.java:89) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.CompressibleBytesOutputStream.writeBytes(CompressibleBytesOutputStream.java:85) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.StreamOutput.write(StreamOutput.java:459) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.bytes.BytesReference.writeTo(BytesReference.java:86) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.StreamOutput.writeBytesReference(StreamOutput.java:203) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.search.SearchHit.writeTo(SearchHit.java:801) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.search.SearchHits.writeTo(SearchHits.java:213) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.search.internal.InternalSearchResponse.writeTo(InternalSearchResponse.java:63) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.search.SearchResponse.writeTo(SearchResponse.java:385) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.buildMessage(TcpTransport.java:1280) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1233) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1207) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:66) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:60) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.HandledTransportAction$TransportHandler$1.onResponse(HandledTransportAction.java:83) ~[elasticsearch-6.4.2.jar:6.4.2]
[2019-05-03T16:58:12,291][WARN ][o.e.x.s.t.n.SecurityNetty4ServerTransport] [metabase-es6-client-us1-6] exception caught on transport layer [NettyTcpChannel{localAddress=/172.18.42.2:45568, remoteAddress=metabase-es-184--hhac.int.us1.signalfx.com/10.7.66.196:9300}], closing connection
org.elasticsearch.ElasticsearchException: java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.transport.netty4.Netty4Transport.exceptionCaught(Netty4Transport.java:237) [transport-netty4-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.exceptionCaught(Netty4MessageChannelHandler.java:74) [transport-netty4-6.4.2.jar:6.4.2]
at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:413) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:99) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:96) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:53) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.AbstractRecycler.obtain(AbstractRecycler.java:33) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:28) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.Recyclers$3.obtain(Recyclers.java:119) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.PageCacheRecycler.bytePage(PageCacheRecycler.java:147) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.AbstractBigArray.newBytePage(AbstractBigArray.java:117) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigByteArray.resize(BigByteArray.java:143) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigArrays.resizeInPlace(BigArrays.java:448) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigArrays.resize(BigArrays.java:495) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.BigArrays.grow(BigArrays.java:512) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.BytesStreamOutput.ensureCapacity(BytesStreamOutput.java:157) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.ReleasableBytesStreamOutput.ensureCapacity(ReleasableBytesStreamOutput.java:69) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.BytesStreamOutput.writeBytes(BytesStreamOutput.java:89) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.CompressibleBytesOutputStream.writeBytes(CompressibleBytesOutputStream.java:85) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.StreamOutput.write(StreamOutput.java:459) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.bytes.BytesReference.writeTo(BytesReference.java:86) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.io.stream.StreamOutput.writeBytesReference(StreamOutput.java:203) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.search.SearchHit.writeTo(SearchHit.java:801) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.search.SearchHits.writeTo(SearchHits.java:213) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.search.internal.InternalSearchResponse.writeTo(InternalSearchResponse.java:63) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.search.SearchResponse.writeTo(SearchResponse.java:385) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.buildMessage(TcpTransport.java:1280) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1233) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransport.sendResponse(TcpTransport.java:1207) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:66) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:60) ~[elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:54)
Also, in the heap dumps we see incoming references as below. Since we do not use any of the X-Pack features, could that also explain the kernel OOM killer reaping the process?
I think you are hitting an OutOfMemoryError because you have dramatically reduced the heap space available to this node. At the start of this thread you were comparing the behaviour with 7GB vs 9GB of heap, but you now say you have reduced it all the way to 1GB. It is not clear why you have done this, but I am not surprised that such a small node can no longer keep up with the workload.
Note that this is different behaviour from the original problem. It's no longer the kernel OOM killer that's shutting this node down, which means that you are not running out of physical RAM here.
@DavidTurner thanks for the quick responses. Turning the heap down to 1GB so dramatically is totally my fault; I was trying to understand the workload. The faster GC and the heap trend seemed okay, until they weren't.
I am trying to find the sweet spot for this setting for our workload. A total noob question: is there a way/metric I can use from Elasticsearch to monitor the workload?
Since the client/coordinating node both distributes bulk indexing and does the scatter-gather for searches, what metrics would help me gain more insight quickly?
Interesting. Was there a sudden shift in workload? When you hit OOM you should get a heap dump which might be worth investigating.
It might also be worth upgrading to 7.0 since it's quite a bit more resilient to heap pressure, preferring to reject requests instead of just crashing like this.
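If no heap dump appears after a crash like this, it may be worth checking that the default jvm.options flags for heap dumps are still in place and point somewhere with enough free space, for example:

    -XX:+HeapDumpOnOutOfMemoryError
    # optional: pick a directory that exists and has sufficient space
    # (the path below is just an example)
    -XX:HeapDumpPath=/var/lib/elasticsearch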
I am still investigating. The only explanation I have right now is that the queries appear to have started asking for more results around that time, compared to historical data.
Hence I need guidance on which metrics to look at, as stated above.