ES 2.3.3 - Cannot understand why OOM

I have a single-node installation of ES 2.3.3 and I'm getting an OOM, but I cannot understand why it occurs. For now the server is only doing indexing (no searches in progress).

Can someone give me a clue?

As you can see in the log file, there is a lot of GC activity. I bulk load in batches of 5 MB, so why is the heap using 29 GB?!
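
For context, here is roughly how the batches are sent. This is only a sketch assuming the official Python elasticsearch client; the size accounting and the action dicts are placeholders, not my actual code:

    # Sketch: send documents in ~5 MB bulk batches (all names are placeholders).
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])
    MAX_BATCH_BYTES = 5 * 1024 * 1024  # ~5 MB per bulk request

    def send_in_batches(actions):
        batch, batch_bytes = [], 0
        for action in actions:  # each action: {"_index": ..., "_type": ..., "_source": ...}
            size = len(repr(action))  # rough per-document size estimate
            if batch and batch_bytes + size > MAX_BATCH_BYTES:
                helpers.bulk(es, batch)  # one bulk request per ~5 MB of documents
                batch, batch_bytes = [], 0
            batch.append(action)
            batch_bytes += size
        if batch:
            helpers.bulk(es, batch)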

Here is the log (there are many other GC warnings before these):

    [2017-03-08 12:02:24,974][WARN ][monitor.jvm              ] [NGELK1] [gc][old][84676][159] duration [38.4s], collections [3]/[38.5s], total [38.4s]/[30.4m], memory [29.1gb]->[29.1gb]/[29.1gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [106.6mb]->[106.1mb]/[108.1mb]}{[old] [28.2gb]->[28.2gb]/[28.2gb]}
    [2017-03-08 12:03:12,946][WARN ][monitor.jvm              ] [NGELK1] [gc][old][84677][162] duration [34.1s], collections [3]/[34.3s], total [34.1s]/[31m], memory [29.1gb]->[29.1gb]/[29.1gb], all_pools {[young] [865.3mb]->[865.3mb]/[865.3mb]}{[survivor] [106.1mb]->[108mb]/[108.1mb]}{[old] [28.2gb]->[28.2gb]/[28.2gb]}
    [2017-03-08 12:48:40,642][WARN ][transport.netty          ] [NGELK1] exception caught on transport layer [[id: 0x175ecf80, /127.0.0.1:60584 :> /127.0.0.1:9300]], closing connection
    java.lang.OutOfMemoryError: Java heap space
    	at org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:33)
    	at org.jboss.netty.channel.DefaultChannelPipeline.execute(DefaultChannelPipeline.java:636)
    	at org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:445)
    	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
    	at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:99)
    	at org.jboss.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    	at org.jboss.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
    	at org.jboss.netty.channel.Channels.write(Channels.java:704)
    	at org.jboss.netty.channel.Channels.write(Channels.java:671)
    	at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:348)
    	at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:103)
    	at org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:75)
    	at org.elasticsearch.transport.DelegatingTransportChannel.sendResponse(DelegatingTransportChannel.java:58)
    	at org.elasticsearch.transport.RequestHandlerRegistry$TransportChannelWrapper.sendResponse(RequestHandlerRegistry.java:134)
    	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler$1.onResponse(HandledTransportAction.java:65)
    	at org.elasticsearch.action.support.HandledTransportAction$TransportHandler$1.onResponse(HandledTransportAction.java:61)
    	at org.elasticsearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:89)
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    [2017-03-08 12:48:40,774][WARN ][index.engine             ] [NGELK1] [ngt-liveserver-20161214][0] failed engine [lucene commit failed]
    java.lang.OutOfMemoryError: Java heap space
    [2017-03-08 12:48:40,778][WARN ][index.shard              ] [NGELK1] [ngt-liveserver-20161214][0] failed to refresh after decreasing index buffer
    [ngt-liveserver-20161214][[ngt-liveserver-20161214][0]] EngineClosedException[CurrentState[CLOSED] Closed]
    	at org.elasticsearch.index.engine.Engine.ensureOpen(Engine.java:329)
    	at org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:674)
    	at org.elasticsearch.index.shard.IndexShard.refresh(IndexShard.java:661)
    	at org.elasticsearch.index.shard.IndexShard.updateBufferSize(IndexShard.java:1155)
    	at org.elasticsearch.index.shard.IndexShard.checkIdle(IndexShard.java:1183)
    	at org.elasticsearch.indices.memory.IndexingMemoryController.checkIdle(IndexingMemoryController.java:302)
    	at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:254)
    	at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:640)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    [2017-03-08 12:48:40,849][WARN ][indices.cluster          ] [NGELK1] [[ngt-liveserver-20161214][0]] marking and sending shard failed due to [engine failure, reason [lucene commit failed]]
    java.lang.OutOfMemoryError: Java heap space
    [2017-03-08 12:48:40,825][WARN ][index.translog           ] [NGELK1] [ngt-liveserver-20161214][0] failed to flush shard on translog threshold
    [ngt-liveserver-20161214][[ngt-liveserver-20161214][0]] FlushFailedEngineException[Flush failed]; nested: OutOfMemoryError[Java heap space];
    	at org.elasticsearch.index.engine.InternalEngine.flush(InternalEngine.java:765)
    	at org.elasticsearch.index.shard.IndexShard.flush(IndexShard.java:782)
    	at org.elasticsearch.index.translog.TranslogService$TranslogBasedFlush$1.doRun(TranslogService.java:222)
    	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.OutOfMemoryError: Java heap space
    [2017-03-08 12:48:40,937][WARN ][cluster.action.shard     ] [NGELK1] [ngt-liveserver-20161214][0] received shard failed for target shard [[ngt-liveserver-20161214][0], node[3AdkqZSIRsODeBvxSOJXfQ], [P], v[2], s[STARTED], a[id=RgbNorZTT32rz3fiVIsASg]], indexUUID [xCcQt5q5Qficvhz-FDGz_A], message [engine failure, reason [lucene commit failed]], failure [OutOfMemoryError[Java heap space]]
    java.lang.OutOfMemoryError: Java heap space

What is your use case? Do you have a lot of fields with analysed text that use up heap, or are you primarily using doc_values? How many indices and shards do you have?

My use case is indexing many events with a limited number of fields (~80) so that I can analyze them later in Kibana.

All string fields are not_analyzed.
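
For reference, the mappings look roughly like this (a sketch via the Python client; the field names below are placeholders, not my real schema). In ES 2.x, not_analyzed string fields get doc_values by default:

    # Sketch: ES 2.x mapping with not_analyzed string fields (field names are placeholders).
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])
    es.indices.put_mapping(
        index="ngt-liveserver-20161214",
        doc_type="event",  # mapping types still exist in ES 2.x
        body={"properties": {
            "host":    {"type": "string", "index": "not_analyzed"},
            "user_id": {"type": "string", "index": "not_analyzed"},
        }},
    )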

For now, I have about 2 billion documents in around 200 indices (1 shard per index).

Do you have any custom config settings in your elasticsearch.yml file? What is your heap size?

I've set ES_HEAP_SIZE="30000m".

The machine has 8 cores, 64 GB of RAM, and 3 TB of storage.

All other settings are the default.

Do you have any custom config settings in your elasticsearch.yml file?

No, only bootstrap.mlockall: true plus the cluster and node names.
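
So the whole elasticsearch.yml is essentially just this (the cluster name below is a placeholder):

    # elasticsearch.yml (sketch; cluster name is a placeholder)
    cluster.name: my-cluster
    node.name: NGELK1
    bootstrap.mlockall: true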

But in my indexing process I do the following (see the sketch after the list):

  • Start reading a logfile
  • Create an index for that logfile
  • Set the index refresh_interval to -1
  • Index all events from the logfile
  • Set refresh_interval back to 30s
  • Do a forceMerge down to 1 segment on the index
  • Process the next logfile
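
In code, the loop looks roughly like this (a sketch assuming the Python elasticsearch client; read_events is a placeholder generator that yields bulk actions for one logfile):

    # Sketch of the per-logfile loop above (index name and read_events are placeholders).
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["http://localhost:9200"])

    def process_logfile(path, index_name):
        es.indices.create(index=index_name, ignore=400)  # ignore "index already exists"
        # Disable refresh while bulk indexing.
        es.indices.put_settings(index=index_name,
                                body={"index": {"refresh_interval": "-1"}})
        helpers.bulk(es, read_events(path, index_name))  # read_events: placeholder
        # Restore refresh, then force-merge down to a single segment.
        es.indices.put_settings(index=index_name,
                                body={"index": {"refresh_interval": "30s"}})
        es.indices.forcemerge(index=index_name, max_num_segments=1)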

The problem turned out to be a Kibana dashboard whose display requires aggregations over a large amount of data.
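
For example, the kind of query Kibana issues for such a dashboard, i.e. a terms aggregation fanned out across all the indices, pulls a lot of data onto the heap at once (the aggregation field here is a placeholder):

    # Sketch: the kind of aggregation a Kibana dashboard triggers (field name is a placeholder).
    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])
    resp = es.search(
        index="ngt-liveserver-*",  # wildcard fans out to all ~200 indices
        body={
            "size": 0,
            "aggs": {"by_event": {"terms": {"field": "event_type", "size": 1000}}},
        },
    )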
