Update to ES 1.4.2 gone terribly wrong - nodes won't start

Hello,

We have a 2-node, 10-shard cluster with about 150GB of data (not sure how
many documents, but I would say around 150 million, since one doc is about
1KB for us). It worked fine under ES 1.3.4.

We tried to update it to 1.4.2 today. Now the nodes won't start: they eat
up all the RAM during the recovery phase (while booting and starting the
engines on all indices), then GC kicks in once no more RAM is available,
and ES eventually gets stuck with OOM errors.

I tried increasing the RAM given to ES to 30GB and it still eats all of
that up and fails. Before, with 1.3.4, we had 16GB allocated and no
problems.

Why does ES need all that RAM? It's doing nothing, not servicing a single
request... it's only starting the recovery process.

Our production cluster is now down. This is a huge problem for us, so
please advise on any solutions or things we could try.

Java is IcedTea 7.2.5.3
OS is Gentoo Linux


To be a bit more precise, we have 375GB of data. Still, I am under the
impression that 1.4.2 uses at least 3 times more RAM than 1.3.4. Everything
was fine under 1.3.4.

On Friday, December 26, 2014 11:52:48 PM UTC+3, Jean-Noël Rivasseau wrote:


Check whether you flushed the translog of the indices on 1.3.4 before the
migration. Replaying the translog can add a substantial amount of RAM
usage.
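A flush of all indices can be forced before shutting 1.3.4 down with
something like this (assuming a node reachable on localhost:9200):

  curl -XPOST 'http://localhost:9200/_flush'

That empties the translog, so there is nothing left to replay on the next
start.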

Check whether you have index.codec.bloom.load=true set, and if so, set it
to false.
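Since the nodes won't come up at the moment, the simplest place for that
is probably elasticsearch.yml on each node, set before starting it, e.g.

  index.codec.bloom.load: false

(I think it can also be changed live through the index update settings API
once a node is reachable.)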

Try Oracle JDK.

Try to add a third node.

Jörg

On Sat, Dec 27, 2014 at 10:34 AM, Jean-Noël Rivasseau elvanor@gmail.com
wrote:


IcedTea isn't a JVM version. Give us the output of java -version. It looks
like that version of IcedTea could be OpenJDK 7u71, which is generally fine
(we use it under plenty of load). It could also be jamvm or cacao or
zero/shark; those probably won't work. Lots of folks suggest the Oracle
JDK, so you may as well try it, as Jörg says.
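For example, pasting the full output of

  java -version

here would tell us which VM it actually is.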

You can try using jmap to get a heap histogram while the memory is filling
up and post it somewhere.
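Something along these lines should do it (jmap ships with the JDK; replace
<es-pid> with the pid of the Elasticsearch java process):

  jmap -histo <es-pid> > histo.txt

The first few dozen lines of the histogram usually show which classes are
eating the heap.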

You may be able to downgrade. The worst that can happen is that some
indices won't open and you have to delete and rebuild them from source.
This is only likely to work because you never got 1.4.2 to fully boot, so
the indices probably haven't been rewritten in the newer format yet.

Also, running Gentoo in production is either brave or crazy or both.

Nik
On Dec 26, 2014 3:52 PM, "Jean-Noël Rivasseau" elvanor@gmail.com wrote:


We were being affected by "Large index no longer initialises under 1.4.0
and 1.4.0 Beta 1 due to OutOfMemoryException" (elastic/elasticsearch issue
#8394).

Setting index.load_fixed_bitset_filters_eagerly to false fixed everything
for now.
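(If it helps anyone else: when the cluster won't start, the setting has to
go somewhere that is read at startup; a line in elasticsearch.yml on each
node should do it, e.g.

  index.load_fixed_bitset_filters_eagerly: false

set before the node is started.)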

I could argue that not running Gentoo in production is crazy, but it
really depends on your personal preferences :)

On Saturday, December 27, 2014 4:34:02 PM UTC+3, Nikolas Everett wrote:


Just my 2¢ - after battling with Gentoo, you will come back to RHEL/CentOS.

http://www.reddit.com/r/linux/comments/1xizt0/production_server_environment_gentoo_or_debian/

Jörg

On Sat, Dec 27, 2014 at 2:37 PM, Jean-Noël Rivasseau elvanor@gmail.com
wrote:

I could argue that not running Gentoo in production is crazy, but it
really depends on your personal preferences :)


Yikes! I've seen a few people hit by that but I keep forgetting about it.
I have some kind of cognitive dissonance for nested docs.
On Dec 27, 2014 8:37 AM, "Jean-Noël Rivasseau" elvanor@gmail.com wrote:

We were being affected by "Large index no longer initialises under 1.4.0
and 1.4.0 Beta 1 due to OutOfMemoryException" (elastic/elasticsearch issue
#8394).

Setting index.load_fixed_bitset_filters_eagerly to false fixed everything
for now.
