OOM in Elasticsearch 1.6


(Chenryn) #1

I use Elasticsearch for log search, and doc_values has been enabled for a long while.
But my cluster began hitting OOM errors after I upgraded to ES 1.6, even after I deleted some shards:

[2015-07-23 01:32:06,096][WARN ][index.merge.scheduler    ] [10.19.0.43] [logstash-mweibo-2015.07.22][24] failed to merge
org.apache.lucene.store.AlreadyClosedException: refusing to delete any files: this IndexWriter hit an unrecoverable exception
        at org.apache.lucene.index.IndexFileDeleter.ensureOpen(IndexFileDeleter.java:354)
        at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:719)
        at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:451)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3826)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
        at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:107)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
Caused by: java.lang.OutOfMemoryError: Java heap space

...

[2015-07-23 01:37:03,324][WARN ][index.engine             ] [10.19.0.43] [logstash-mweibo-2015.07.22][24] failed engine [out of memory]
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:37:03,327][WARN ][action.bulk              ] [10.19.0.43] Failed to send response for indices:data/write/bulk[s]
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:37:03,327][WARN ][action.bulk              ] [10.19.0.43] Failed to send response for indices:data/write/bulk[s]
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:37:03,343][WARN ][indices.cluster          ] [10.19.0.43] [[logstash-mweibo-2015.07.22][24]] marking and sending shard failed due to [engine failure, reason [out of memory]]
java.lang.OutOfMemoryError: Java heap space

...

[2015-07-23 01:37:11,337][WARN ][transport.netty          ] [10.19.0.43] Actual Exception
org.elasticsearch.index.IndexShardMissingException: [logstash-mweibo-2015.07.22][24] missing
        at org.elasticsearch.index.IndexService.shardSafe(IndexService.java:210)
        at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:138)
        at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:56)
        at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:338)
        at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:324)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:279)
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The first OOM messages in my es.log are as follows:

[2015-07-23 01:27:39,042][WARN ][indices.ttl              ] [10.19.0.43] failed to execute ttl purge
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:27:45,672][WARN ][transport.netty          ] [10.19.0.43] exception caught on transport layer [[id: 0x4de40b8a, /10.19.0.68:38330 => /10.19.0.43:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:27:42,044][WARN ][transport.netty          ] [10.19.0.43] exception caught on transport layer [[id: 0xc84f6826, /10.19.0.81:41486 => /10.19.0.43:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:27:42,042][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:27:39,042][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:28:13,656][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:28:13,655][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:28:13,655][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space
[2015-07-23 01:28:10,786][WARN ][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: Java heap space

What do these messages mean? I don't think I saw anything like them before ES 1.6.


(Mark Walkom) #2

How much data do you have in your cluster, how many nodes, what is the heap size?


(Chenryn) #3

30 TB across 26 data nodes, each with a 25 GB heap.
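
To put those figures in perspective, here is a rough back-of-the-envelope sketch of how much data each node holds relative to its heap. It assumes the data is spread evenly across the data nodes and that 1 TB = 1024 GB; neither assumption comes from the cluster itself, and the variable names are only illustrative.

```python
# Figures quoted in this thread.
total_data_tb = 30
data_nodes = 26
heap_gb_per_node = 25

# Assumptions (not measured on the cluster): binary units and an even
# distribution of shards across all data nodes.
GB_PER_TB = 1024

data_gb_per_node = total_data_tb * GB_PER_TB / data_nodes
data_to_heap_ratio = data_gb_per_node / heap_gb_per_node

print(f"data per node: {data_gb_per_node:.0f} GB")
print(f"data-to-heap ratio: {data_to_heap_ratio:.0f}:1")
```

That works out to roughly 1.2 TB of index data per node. The ratio alone does not explain the OOM, but it gives a baseline to compare against when checking actual heap usage.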

