Clarification on Elasticsearch OutOfMemory warning

Hi,

I got an OutOfMemory exception in my ES cluster, which has 3 nodes behind a load balancer. The error occurred on NODE 0 at 11:24 UTC and looks like this:

    [2017-06-20 11:24:59,217][WARN ][cluster.action.shard     ] [NODE-0] [data-jun-2017][1] received shard failed for target shard [[data-jun-2017][1], node[Maxx2y7qR7CX8JzoOtTCgA], [R], v[18], s[STARTED], a[id=sVU6Jew3TZuAX7zD2qVzfA]], indexUUID [YRBQDO1zTqCFMlfMW5UnFg], message [failed to perform indices:data/write/bulk[s] on replica on node {NODE-0}{Maxx2y7qR7CX8JzoOtTCgA}{10.0.0.X}{10.0.0.X:9300}{max_local_storage_nodes=1, master=true}], failure [RemoteTransportException[[NODE-0][10.0.0.X:9300][indices:data/write/bulk[s][r]]]; nested: OutOfMemoryError[Java heap space]; ]
    RemoteTransportException[[NODE-0][10.0.0.X:9300][indices:data/write/bulk[s][r]]]; nested: OutOfMemoryError[Java heap space];
    Caused by: java.lang.OutOfMemoryError: Java heap space
            at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:95)
            at org.elasticsearch.common.io.stream.StreamInput.readBytesReference(StreamInput.java:84)
            at org.elasticsearch.action.index.IndexRequest.readFrom(IndexRequest.java:697)
            at org.elasticsearch.action.bulk.BulkItemRequest.readFrom(BulkItemRequest.java:104)
            at org.elasticsearch.action.bulk.BulkItemRequest.readBulkItem(BulkItemRequest.java:89)
            at org.elasticsearch.action.bulk.BulkShardRequest.readFrom(BulkShardRequest.java:89)
            at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:222)
            at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:116)
            at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
            at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
            at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
            at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
            at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
            at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
            at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
            at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
            at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
            at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
            at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:75)
            at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
            at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
            at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
            at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
            at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
            at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
            at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
            at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
            at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
            at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
            at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

I also checked the status of NODE 0 in Marvel, and it looked like this:

What does this mean? Is the node still handling requests or not?

When I checked the status of the other nodes (NODE 1 and NODE 2), they were fine.

Questions:

  1. Since the error occurred on NODE 0, all requests sent to NODE 0 will fail. Will those failed requests be handed over to NODE 1 and NODE 2, since they are in a good state?
  2. Because we have load balancing enabled, will the rejected requests be routed to the nodes that are still in a good state?

Thanks

The thing about OOME (OutOfMemoryError) issues is that they can put the node in a strange state, causing it to sometimes "seem" to recover, when in reality it has not.

So for your questions: if the requests have already failed, they will not be retried on different nodes. As for the load balancer, whether requests are sent only to good nodes depends on how your load balancer checks the "liveness" of a node.
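
To illustrate the "liveness" point: a shallow check (does the node answer HTTP at all?) can still pass on a node that has just hit an OOME, while looking at heap usage gives a stronger signal. Below is a minimal sketch of such a check, assuming the nodes expose HTTP on port 9200; the addresses, the 85% threshold, and the `node_is_healthy` helper are made up for illustration and are not what any particular load balancer does out of the box.

    # Minimal health-check sketch: a shallow HTTP check plus a heap check via
    # the node stats API. Addresses and threshold below are placeholders.
    import requests

    NODES = ["http://10.0.0.1:9200", "http://10.0.0.2:9200", "http://10.0.0.3:9200"]

    def node_is_healthy(base_url, heap_limit=85, timeout=2):
        """True only if the node answers quickly AND its heap usage is below heap_limit %."""
        try:
            # Shallow check: does the node answer HTTP at all?
            if requests.get(base_url, timeout=timeout).status_code != 200:
                return False
            # Deeper check: the queried node's own JVM heap usage.
            stats = requests.get(base_url + "/_nodes/_local/stats/jvm", timeout=timeout).json()
            node = next(iter(stats["nodes"].values()))
            return node["jvm"]["mem"]["heap_used_percent"] < heap_limit
        except requests.RequestException:
            return False

    healthy = [n for n in NODES if node_is_healthy(n)]
    print("healthy nodes:", healthy)

Even a check like this is only a heuristic: high heap usage does not always mean the node is broken, and as noted above, a node that has already thrown an OOME may keep passing checks intermittently.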

For 5.0+, we changed ES to stop completely when an OOME occurs, since we don't want the node to get into a strange state: https://github.com/elastic/elasticsearch/pull/19272
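
If you are staying on a pre-5.0 version, a rough approximation of that behaviour is available from the JVM itself rather than from ES: JDKs from 8u92 onward support the `-XX:+ExitOnOutOfMemoryError` flag. A minimal sketch, assuming your startup script passes `ES_JAVA_OPTS` through to the JVM:

    # Plain JVM flag, not something ES 2.x configures for you.
    # Requires JDK 8u92 or later.
    export ES_JAVA_OPTS="$ES_JAVA_OPTS -XX:+ExitOnOutOfMemoryError"
    # Older JDKs have -XX:OnOutOfMemoryError=<command> instead, but the command
    # contains spaces and is awkward to pass safely through an environment variable.

Either way, the node dies outright instead of limping along after an OOME, which matches the intent of the 5.x change.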
