OOM on Cold Cluster Start

David_Kleiner · November 14, 2013, 9:04pm

Hi,

With the latest stable ES, I'm getting OOMs on a cold cluster start whenever
the heap is smaller than 25-30G. I had to find a really beefy box to get the
cluster up and running, and then join two more ES nodes with 10G heaps to it.

Once the cluster is operational, heap usage stays under 10G. I have a
two-node cluster with a single data-less gateway node, 40 indices (mostly
user logs fed by logstash, split by month), 392 shards in total across both
nodes, and about 220G of data in total (110G per node). I kept the default
of 5 shards per index.
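For a rough sense of scale, the numbers above work out to about 196 shards per data node at roughly 0.56G each, assuming the shards are spread evenly across the two nodes. So each node has to recover on the order of two hundred shards on a cold start. A small Python sketch of that arithmetic; `shard_stats` is a hypothetical helper, not part of ES:

```python
from typing import Tuple


def shard_stats(total_shards: int, nodes: int, gb_per_node: float) -> Tuple[int, float]:
    """Return (shards per node, average shard size in GB).

    Assumes shards are distributed evenly across the data nodes.
    """
    shards_per_node = total_shards // nodes
    avg_shard_gb = gb_per_node / shards_per_node
    return shards_per_node, avg_shard_gb


# Figures from the post: 392 shards over 2 data nodes, 110G per node.
per_node, avg_gb = shard_stats(total_shards=392, nodes=2, gb_per_node=110)
print(per_node, round(avg_gb, 2))  # 196 0.56
```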

Recovery on cold start was really painful and took hours of downtime until
I found a big temporary node.

Any recommendations for avoiding this on the next cold start would be
appreciated!

Thank you,

David

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

David_Kleiner · November 14, 2013, 9:15pm

Stack trace on cold start (single node, not joined to the cluster):

[2013-11-14 15:12:47,626][WARN ][index.engine.robin ] [Typeface] [eventlog-2013.11][0] failed to prepare/warm
java.lang.OutOfMemoryError: Java heap space
    at org.elasticsearch.search.SearchService$IndexReaderWarmer.warm(SearchService.java:649)
    at org.elasticsearch.indices.warmer.InternalIndicesWarmer.warm(InternalIndicesWarmer.java:90)
    at org.elasticsearch.index.engine.robin.RobinEngine$RobinSearchFactory.newSearcher(RobinEngine.java:1622)
    at org.apache.lucene.search.SearcherManager.getSearcher(SearcherManager.java:155)
    at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:89)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1505)
    at org.elasticsearch.index.engine.robin.RobinEngine.start(RobinEngine.java:280)
    at org.elasticsearch.index.shard.service.InternalIndexShard.performRecoveryPrepareForTranslog(InternalIndexShard.java:660)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:201)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
[2013-11-14 15:12:50,555][WARN ][cluster.action.shard ] [Typeface] [nginx-2013.11][1] sending failed shard for [nginx-2013.11][1], node[nc9mQX0vQzyWWINV0l8t9Q], [P], s[INITIALIZING], indexUUID [na], reason [Failed to create shard, message [IndexShardCreationException[[nginx-2013.11][1] failed to create shard]; nested: ExecutionError[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space]; ]]
[2013-11-14 15:12:50,556][WARN ][cluster.action.shard ] [Typeface] [nginx-2013.11][1] received shard failed for [nginx-2013.11][1], node[nc9mQX0vQzyWWINV0l8t9Q], [P], s[INITIALIZING], indexUUID [na], reason [Failed to create shard, message [IndexShardCreationException[[nginx-2013.11][1] failed to create shard]; nested: ExecutionError[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space]; ]]
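When a cold start fails like this, it can help to list exactly which shards hit the OOM. A minimal Python sketch that scans a node log for the two message shapes in the trace above ("failed to prepare/warm" and "sending failed shard for"). The regexes are modeled only on these 0.90.6 lines and are an assumption for other versions; `failed_shards` is a hypothetical helper:

```python
import re
from typing import Set, Tuple

# Patterns modeled on the 0.90.x WARN lines quoted in this thread;
# other versions may word these messages differently.
WARM_FAILURE = re.compile(r"\[([\w.\-]+)\]\[(\d+)\]\s+failed to prepare/warm")
SHARD_FAILURE = re.compile(r"sending failed shard for\s+\[([\w.\-]+)\]\[(\d+)\]")


def failed_shards(log_text: str) -> Set[Tuple[str, int]]:
    """Collect (index, shard) pairs that failed warming or shard creation."""
    found = set()
    for pattern in (WARM_FAILURE, SHARD_FAILURE):
        for index, shard in pattern.findall(log_text):
            found.add((index, int(shard)))
    return found


sample = """\
[2013-11-14 15:12:47,626][WARN ][index.engine.robin ] [Typeface] [eventlog-2013.11][0] failed to prepare/warm
[2013-11-14 15:12:50,555][WARN ][cluster.action.shard ] [Typeface] [nginx-2013.11][1] sending failed shard for [nginx-2013.11][1], ...
"""
print(sorted(failed_shards(sample)))
# [('eventlog-2013.11', 0), ('nginx-2013.11', 1)]
```

Using a set means the matching "received shard failed" entries, or repeated failures of the same shard, do not produce duplicates.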


spinscale · November 14, 2013, 9:26pm

Hey,

Are you using Elasticsearch 0.90.6 or 0.90.7?
Elasticsearch 0.90.7 was released yesterday to fix an issue that might
cause OOMs in that particular setup; see the release blog post.

--Alex
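To answer the version question on a running node, the root HTTP endpoint (`GET /`) returns JSON that includes `version.number`. A minimal Python sketch, assuming the node listens on `localhost:9200` without authentication; `fetch_root`, `parse_version` and `has_fix` are hypothetical helpers, and the fetch is kept separate so the comparison can be checked without a live cluster:

```python
import json
import urllib.request
from typing import Tuple

# 0.90.7 is the release recommended in this thread for the cold-start OOM.
FIXED_IN = (0, 90, 7)


def fetch_root(url: str = "http://localhost:9200/") -> str:
    """Return the raw JSON body of GET / (assumes an unauthenticated HTTP port)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_version(number: str) -> Tuple[int, ...]:
    """Turn "0.90.6" into (0, 90, 6), stopping at non-numeric parts like "Beta2"."""
    parts = []
    for piece in number.split("-")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def has_fix(root_body: str) -> bool:
    """True if the node reporting this GET / body runs 0.90.7 or later."""
    info = json.loads(root_body)
    return parse_version(info["version"]["number"]) >= FIXED_IN


# Usage against a live node: print(has_fix(fetch_root()))
print(has_fix('{"version": {"number": "0.90.6"}}'))  # False
```

Comparing integer tuples rather than raw strings avoids the trap where "0.90.10" would sort before "0.90.7" as text.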


David_Kleiner · November 14, 2013, 10:01pm

That was 0.90.6. I'll give 0.90.7 a try; thank you, Alex!


David_Kleiner · November 14, 2013, 10:24pm

Happy to report that with 0.90.7 the nodes came back up fast and with no
heap issues. I guess I can bring the heap size down from 12G to 8G to give
more room to the logstash instances.

Cheers!

David


Related topics

Topic	Category	Replies	Views	Activity
OOM Java heapspace on ES1.1.1 cluster	Elasticsearch	2	438	April 13, 2017
Aggregate query: Elasticsearch:java.lang.OutOfMemoryError: Java heap space	Elasticsearch	8	1447	July 25, 2019
1.0.0.Beta2 OOM logs	Elasticsearch	4	345	July 6, 2017
Garbage collection not kicking in - Heap is growing to 98%	Elasticsearch	3	930	June 29, 2017
Merging of segments results in java.lang.OutOfMemoryError: Java heap space	Elasticsearch	28	15280	July 5, 2017