Hi,
I am experiencing OutOfMemoryErrors on nodes in our ElasticSearch cluster
once the collective size of our indexes grows to a certain point.
We are running the following software:
ES 0.19.10
CentOS release 6.2
Oracle JVM 1.6.0_33
JVM settings:
wrapper.java.additional.3=-Xss256k
wrapper.java.additional.4=-XX:+UseParNewGC
wrapper.java.additional.5=-XX:+UseConcMarkSweepGC
wrapper.java.additional.6=-XX:CMSInitiatingOccupancyFraction=75
wrapper.java.additional.7=-XX:+UseCMSInitiatingOccupancyOnly
Our cluster consists of 6 nodes with the following configuration:
24GB system memory
12GB JVM heap
Currently we are rolling weekly indexes with the following settings:
- index.cache.field.expire: 10m
- index.refresh_interval: 60s
- index.number_of_replicas: 1
- index.cache.field.max_size: 50000
- index.number_of_shards: 5
- index.routing.allocation.total_shards_per_node: 2
- index.cache.field.type: soft
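
For reference, the settings above could be written as an index template body roughly as follows. This is only a sketch: the template name and the `messages_*` pattern are assumptions (the pattern is guessed from the index names in the logs), and the top-level `template` key reflects my understanding of the template API in this ES generation rather than something confirmed here.

```python
import json

# Hypothetical template name, for illustration only.
TEMPLATE_NAME = "messages_weekly"

# Setting values are copied from the list above. The "messages_*" pattern
# is an assumption based on index names such as messages_20121105.
template_body = {
    "template": "messages_*",
    "settings": {
        "index.cache.field.expire": "10m",
        "index.refresh_interval": "60s",
        "index.number_of_replicas": 1,
        "index.cache.field.max_size": 50000,
        "index.number_of_shards": 5,
        "index.routing.allocation.total_shards_per_node": 2,
        "index.cache.field.type": "soft",
    },
}


def render_template(body):
    """Serialize the template body as JSON, ready to send to the cluster."""
    return json.dumps(body, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_template(template_body))
```

The script only renders the JSON; sending it to the cluster is left out on purpose.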
We index ~100-300 docs/sec. After one week an index holds ~95M documents,
with overall index sizes around 1.3 GB. We are currently using templates to
control index settings when new indexes are created. We had accumulated 3
indexes and were in the process of rolling to the fourth when the OOMEs
happened. By the time we caught the problem and did a cluster restart, both
the new index and the one we had rolled from had corrupt shards which
could not be allocated on the cluster.
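
As a sanity check on the numbers above, here is a small back-of-the-envelope sketch. It uses only the figures quoted in this post (indexing rate, weekly document count, shard and replica settings, 6 nodes). The function names are made up for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def docs_per_week(rate_per_sec):
    """Documents indexed in one week at a constant rate."""
    return rate_per_sec * SECONDS_PER_WEEK


def implied_rate(docs, seconds=SECONDS_PER_WEEK):
    """Average docs/sec needed to reach `docs` in `seconds`."""
    return docs / seconds


def shard_copies(shards, replicas, indexes):
    """Total shard copies (primaries plus replicas) across all indexes."""
    return shards * (1 + replicas) * indexes


if __name__ == "__main__":
    print(docs_per_week(100))              # 60480000
    print(docs_per_week(300))              # 181440000
    print(round(implied_rate(95_000_000))) # 157
    total = shard_copies(shards=5, replicas=1, indexes=4)
    print(total)                           # 40
    print(round(total / 6, 1))             # 6.7 shard copies per node on 6 nodes
```

So ~95M documents per week corresponds to an average of about 157 docs/sec, which sits inside the 100-300 docs/sec range, and four indexes mean 40 shard copies spread over the 6 nodes.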
My questions are as follows:
- Are there JVM/kernel settings that could help prevent OOMEs such as this,
perhaps by being more aggressive at garbage collection?
- Are there index or cluster settings that would help prevent corruption of
shards in this situation?
- Is there any way to reduce the overhead of rolling to a new index?
I would also add that we have negligible query load; our field cache sizes
are generally 200-600 MB. We are currently trying to procure more
memory for the cluster. Log entries from the crash are below.
-drew
[2012-11-11 16:23:46,627][WARN ][monitor.jvm ] [esn-05]
[gc][ParNew][263760][51242] duration [2s], collections [1]/[2.2s], total
[2s]/[27.5m], memory [11.5gb]->[11.4gb]/[11.9gb], all_pools {[Code Cache]
[9.6mb]->[9.6mb]/[48mb]}{[Par Eden Space]
[173.3mb]->[773.7kb]/[216.3mb]}{[Par Survivor Space]
[27mb]->[27mb]/[27mb]}{[CMS Old Gen] [11.3gb]->[11.4gb]/[11.7gb]}{[CMS Perm
Gen] [47.3mb]->[47.3mb]/[82mb]}
[2012-11-11 17:53:20,090][WARN ][monitor.jvm ] [esn-05]
[gc][ParNew][269123][52347] duration [1.9s], collections [1]/[2.8s], total
[1.9s]/[28.2m], memory [11.6gb]->[11.6gb]/[11.9gb], all_pools {[Code Cache]
[9.6mb]->[9.6mb]/[48mb]}{[Par Eden Space] [48.1mb]->[7.5mb]/[216.3mb]}{[Par
Survivor Space] [27mb]->[26.9mb]/[27mb]}{[CMS Old Gen]
[11.5gb]->[11.5gb]/[11.7gb]}{[CMS Perm Gen] [47.5mb]->[47.5mb]/[82mb]}
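
One way to read the two ParNew warnings above is to pull the per-pool numbers out of the `all_pools` section and compute how full each pool is after the collection. The sketch below is a hypothetical helper, not part of ES; it assumes the `{[pool] [before]->[after]/[max]}` layout seen in these entries and joins wrapped lines before matching.

```python
import re

# Unit multipliers for the size suffixes that appear in the log entries.
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}

# Matches one pool entry: {[name] [before]->[after]/[max]}
POOL_RE = re.compile(
    r"\{\[([^\]]+)\]\s*"
    r"\[([\d.]+)([kmg]?b)\]->\[([\d.]+)([kmg]?b)\]/\[([\d.]+)([kmg]?b)\]\}"
)

# First warning from the log above, with its original line wrapping.
SAMPLE = """[gc][ParNew][263760][51242] duration [2s], collections [1]/[2.2s], total
[2s]/[27.5m], memory [11.5gb]->[11.4gb]/[11.9gb], all_pools {[Code Cache]
[9.6mb]->[9.6mb]/[48mb]}{[Par Eden Space]
[173.3mb]->[773.7kb]/[216.3mb]}{[Par Survivor Space]
[27mb]->[27mb]/[27mb]}{[CMS Old Gen] [11.3gb]->[11.4gb]/[11.7gb]}{[CMS Perm
Gen] [47.3mb]->[47.3mb]/[82mb]}"""


def to_bytes(value, unit):
    """Convert a value such as ('11.4', 'gb') to a number of bytes."""
    return float(value) * UNITS[unit]


def pool_usage(log_entry):
    """Return {pool name: fraction of max used after the collection}."""
    text = " ".join(log_entry.split())  # undo mail line wrapping
    usage = {}
    for name, _bv, _bu, after, a_unit, cap, c_unit in POOL_RE.findall(text):
        usage[name] = to_bytes(after, a_unit) / to_bytes(cap, c_unit)
    return usage


if __name__ == "__main__":
    for pool, fraction in pool_usage(SAMPLE).items():
        print(f"{pool}: {fraction:.1%}")
```

For the first warning this puts CMS Old Gen at roughly 97% of its maximum after the collection, well above the CMSInitiatingOccupancyFraction of 75 set in the JVM options, which may suggest the old generation is mostly live data rather than collectable garbage.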
[2012-11-11 18:09:06,259][WARN ][transport.netty ] [esn-05]
exception caught on netty layer [[id: 0xecbcec60, /10.8.2.46:35446 =>
/10.8.2.50:9300]]
java.lang.OutOfMemoryError: Java heap space
at
org.elasticsearch.common.compress.BufferRecycler.allocDecodeBuffer(BufferRecycler.java:137)
at
org.elasticsearch.common.compress.lzf.LZFCompressedStreamInput.<init>(LZFCompressedStreamInput.java:46)
at
org.elasticsearch.common.compress.lzf.LZFCompressor.streamInput(LZFCompressor.java:128)
at
org.elasticsearch.common.io.stream.CachedStreamInput.cachedHandlesCompressed(CachedStreamInput.java:70)
at
org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:105)
at
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:458)
at
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:439)
at
org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at
org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:75)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:793)
at
org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:565)
at
org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:560)
at
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at
org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:84)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.processSelectedKeys(AbstractNioWorker.java:471)
at
org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:332)
at
org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:35)
at
org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:102)
at
org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-11 18:09:23,835][WARN ][transport.netty ] [esn-05]
exception caught on netty layer [[id: 0xf49f4a4e, /10.8.2.50:58466 =>
/10.8.2.50:9300]]
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at
org.elasticsearch.common.io.stream.BytesStreamOutput.writeBytes(BytesStreamOutput.java:88)
at
org.elasticsearch.common.io.stream.StreamOutput.write(StreamOutput.java:252)
at
org.elasticsearch.common.compress.lzf.ChunkEncoder.encodeAndWriteChunk(ChunkEncoder.java:157)
at
org.elasticsearch.common.compress.lzf.LZFCompressedStreamOutput.compress(LZFCompressedStreamOutput.java:52)
at
org.elasticsearch.common.compress.CompressedStreamOutput.flushBuffer(CompressedStreamOutput.java:125)
at
org.elasticsearch.common.compress.CompressedStreamOutput.writeBytes(CompressedStreamOutput.java:80)
at
org.elasticsearch.common.io.stream.StreamOutput.write(StreamOutput.java:252)
at
org.elasticsearch.common.bytes.BytesArray.writeTo(BytesArray.java:83)
at
org.elasticsearch.common.io.stream.StreamOutput.writeBytesReference(StreamOutput.java:94)
at
org.elasticsearch.common.io.stream.AdapterStreamOutput.writeBytesReference(AdapterStreamOutput.java:98)
[2012-11-11 18:14:01,018][DEBUG][action.admin.indices.stats] [esn-03]
[messages_20121105][2], node[idI32JxCRKCzeQQ16ps0IA], [P], s[STARTED]:
Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@1f23a32f]
org.elasticsearch.transport.RemoteTransportException:
[esn-05][inet[/10.8.2.50:9300]][indices/stats/s]
Caused by: org.elasticsearch.index.IndexShardMissingException:
[messages_20121105][2] missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:398)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:211)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:187)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-11 21:55:23,328][DEBUG][action.admin.indices.stats] [esn-03]
[messages_20121022][3], node[JUmuWHmITJyyOiCbqTmnjQ], [P], s[STARTED]:
Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@2a1ebae5]
org.elasticsearch.index.IndexShardMissingException: [messages_20121022][3]
missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:234)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:211)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:187)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-11 21:55:33,328][DEBUG][action.admin.indices.stats] [esn-03]
[messages_20121022][3], node[JUmuWHmITJyyOiCbqTmnjQ], [P], s[STARTED]:
Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@245a7bd6]
org.elasticsearch.index.IndexShardMissingException: [messages_20121022][3]
missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:234)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:211)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:187)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-11 21:55:43,357][DEBUG][action.admin.indices.stats] [esn-03]
[messages_20121022][3], node[JUmuWHmITJyyOiCbqTmnjQ], [P], s[STARTED]:
Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@5dfce18]
org.elasticsearch.index.IndexShardMissingException: [messages_20121022][3]
missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:234)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:211)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:187)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-11 21:55:43,388][DEBUG][action.admin.indices.stats] [esn-03]
[messages_20121022][3], node[JUmuWHmITJyyOiCbqTmnjQ], [P], s[STARTED]:
Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@28538cab]
org.elasticsearch.index.IndexShardMissingException: [messages_20121022][3]
missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:234)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:211)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:187)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
[2012-11-11 21:55:53,328][DEBUG][action.admin.indices.stats] [esn-03]
[messages_20121022][3], node[JUmuWHmITJyyOiCbqTmnjQ], [P], s[STARTED]:
Failed to execute
[org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@5081b244]
org.elasticsearch.index.IndexShardMissingException: [messages_20121022][3]
missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at
org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:234)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:211)
--