Update to ES 1.4.2 gone terribly wrong - nodes won't start

Hello,

We have a 2-node, 10-shard cluster with about 150GB of data (not sure how
many documents, but I would say around 150 million, since one doc is about
1KB for us). It worked fine under ES 1.3.4.

We tried to update it to 1.4.2 today. Now the nodes won't start: they eat
up all the RAM during the recovery phase (while booting and starting the
engines on all indices), then GC kicks in once no more RAM is available,
and ES eventually gets stuck with OOM errors.

I tried increasing the RAM given to ES to 30GB and it still eats all of
that up and fails. Before, with 1.3.4, we had 16GB allocated and no
problems.

Why does ES need all that RAM? It's doing nothing, not servicing a single
request... it's only starting the recovery process.

Our production cluster is now down. This is a huge problem for us, so
please advise on any solutions or things we could try.

Java is IcedTea 7.2.5.3
OS is Gentoo Linux


To be a bit more precise, we have 375GB of data. Still, I am under the
impression that 1.4.2 uses at least 3 times more RAM than 1.3.4. Everything
was fine under 1.3.4.

On Friday, December 26, 2014 11:52:48 PM UTC+3, Jean-Noël Rivasseau wrote:


Check whether you flushed the translog of the indices on 1.3.4 before the
migration. Replaying the translog can add a substantial amount of RAM
usage.
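A flush of all indices can be forced before shutting 1.3.4 down with
something like this (assuming a node reachable on localhost:9200):

  curl -XPOST 'http://localhost:9200/_flush'

That empties the translog, so there is nothing left to replay on the next
start.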

Check whether you have index.codec.bloom.load=true set, and if so, set it
to false.
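Since the nodes won't come up at the moment, the simplest place for that
is probably elasticsearch.yml on each node, set before starting it, e.g.

  index.codec.bloom.load: false

(I think it can also be changed live through the index update settings API
once a node is reachable.)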

Try Oracle JDK.

Try to add a third node.

Jörg

On Sat, Dec 27, 2014 at 10:34 AM, Jean-Noël Rivasseau elvanor@gmail.com
wrote:


IcedTea isn't a JVM version. Give us the output of java -version. It looks
like that version of IcedTea could be OpenJDK 7u71, which is generally fine
(we use it under plenty of load). It could also be jamvm or cacao or
zero/shark; those probably won't work. Lots of folks suggest the Oracle
JDK, so you may as well try it, as Jörg says.
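For example, pasting the full output of

  java -version

here would tell us which VM it actually is.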

You can try using jmap to get a heap histogram while the memory is filling
up and post it somewhere.
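Something along these lines should do it (jmap ships with the JDK; replace
<es-pid> with the pid of the Elasticsearch java process):

  jmap -histo <es-pid> > histo.txt

The first few dozen lines of the histogram usually show which classes are
eating the heap.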

You may be able to downgrade. The worst that can happen is that some
indices won't open and you have to delete and rebuild them from source.
This is only likely to work because you never got 1.4.2 to fully boot, so
the indices probably haven't been rewritten in the newer format yet.

Also, running Gentoo in production is either brave or crazy or both.

Nik
On Dec 26, 2014 3:52 PM, "Jean-Noël Rivasseau" elvanor@gmail.com wrote:


We were being affected by "Large index no longer initialises under 1.4.0
and 1.4.0 Beta 1 due to OutOfMemoryException" (elastic/elasticsearch issue
#8394).

Setting index.load_fixed_bitset_filters_eagerly to false fixed everything
for now.
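(If it helps anyone else: when the cluster won't start, the setting has to
go somewhere that is read at startup; a line in elasticsearch.yml on each
node should do it, e.g.

  index.load_fixed_bitset_filters_eagerly: false

set before the node is started.)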

I could argue that not running Gentoo in production is crazy, but it
really depends on your personal preferences :)

On Saturday, December 27, 2014 4:34:02 PM UTC+3, Nikolas Everett wrote:


Just my 2¢ - after battling with Gentoo, you will come back to RHEL/CentOS.

http://www.reddit.com/r/linux/comments/1xizt0/production_server_environment_gentoo_or_debian/

Jörg

On Sat, Dec 27, 2014 at 2:37 PM, Jean-Noël Rivasseau elvanor@gmail.com
wrote:

I could argue that not running Gentoo in production is crazy, but it
really depends on your personal preferences :)


Yikes! I've seen a few people hit by that but I keep forgetting about it.
I have some kind of cognitive dissonance for nested docs.
On Dec 27, 2014 8:37 AM, "Jean-Noël Rivasseau" elvanor@gmail.com wrote:

We were being affected by "Large index no longer initialises under 1.4.0
and 1.4.0 Beta 1 due to OutOfMemoryException" (elastic/elasticsearch issue
#8394).

Setting index.load_fixed_bitset_filters_eagerly to false fixed everything
for now.
