0.90.6 upgrade - ES very slow/high CPU usage when recovering some shards


(Ankush Jhalani) #1

We have a 3-level (grandparent, parent, child) index that is 100 GB in
size. After upgrading to ES 0.90.6, we noticed that ES became really slow,
with persistently high CPU usage even after recovery completed. It seems
that some post-processing after shard recovery was really heavy and took a
high toll on ES. We eventually had to shut down ES and remove this index
from the local directory.

I generated a couple of hot threads dumps (9 minutes apart), and one common theme
is "org.elasticsearch.indices.fielddata.cache.IndicesFieldDataCache$IndexFieldCache$1.call(IndicesFieldDataCache.java:149)".
I posted the complete gist at https://gist.github.com/ajhalani/7320575
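Spotting the "common theme" across several hot threads dumps can be partly automated by intersecting the stack frames each dump contains. Below is a minimal Python sketch, assuming each dump was saved as plain text and that frames appear one per line in the usual Java form `package.Class.method(File.java:NN)` (optionally prefixed with `at `). The function names are hypothetical helpers, not part of Elasticsearch:

```python
import re
from collections import Counter

# Matches one Java stack frame per line, e.g.
#   org.example.Foo$Bar$1.call(Foo.java:149)
# optionally preceded by "at " (jstack style) and surrounded by whitespace.
FRAME_RE = re.compile(r"^\s*(?:at\s+)?([\w$.<>]+\([^()]*\))\s*$")


def frames_in_dump(dump_text):
    """Return a Counter of the stack frames found in one dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def common_frames(first, *rest, prefix=""):
    """Frames present in every dump, most frequent overall first.

    `prefix` optionally restricts the result to one package,
    e.g. "org.elasticsearch." to hide JDK and Lucene frames.
    """
    per_dump = [frames_in_dump(d) for d in (first, *rest)]
    shared = set(per_dump[0]).intersection(*per_dump[1:])
    total = sum(per_dump, Counter())
    return sorted(
        (f for f in shared if f.startswith(prefix)),
        key=lambda f: (-total[f], f),
    )
```

Called as `common_frames(dump_a, dump_b, prefix="org.elasticsearch.")`, it returns the Elasticsearch frames that occur in both dumps, ordered by how often they appear, which is how a frame such as `IndicesFieldDataCache$IndexFieldCache$1.call` would stand out.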

I would much appreciate any help understanding the issue here. Thanks!!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nick Zadrozny) #2

All of those FieldCache calls, if you look a little further down the trace,
are the result of search calls against the index.

I’m guessing your searches rely a lot on the field cache (looks like
facets), which was cleared out when Elasticsearch restarted. The simplest
explanation is that you’re seeing Elasticsearch's efforts to warm up its
caches, although that would more likely manifest as system load than CPU
per se.
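The warm-up explanation can be made concrete with a toy model of a lazily populated per-field cache. This is a hypothetical Python illustration of the idea only, not Elasticsearch's field data code; the class name and the assumption that load cost equals the number of documents read are made up for illustration:

```python
class ToyFieldDataCache:
    """Toy, hypothetical model of a lazily loaded per-field cache.

    The first lookup of a field "loads" one value per document; later
    lookups reuse the loaded data until the cache is cleared.
    """

    def __init__(self, docs_per_field):
        self.docs_per_field = dict(docs_per_field)  # field -> document count
        self.loaded = {}       # field -> loaded data (placeholder values)
        self.docs_loaded = 0   # cumulative load work, counted in documents

    def get(self, field):
        if field not in self.loaded:
            # Cache miss: pay a cost proportional to the document count.
            n = self.docs_per_field[field]
            self.loaded[field] = [0] * n  # stand-in for per-document values
            self.docs_loaded += n
        return self.loaded[field]

    def clear(self):
        """Simulate a node restart: all cached field data is dropped."""
        self.loaded.clear()
```

With `ToyFieldDataCache({"category": 1000})`, a hundred `get("category")` calls load 1000 documents in total, all on the first call; after `clear()`, the next call pays the full cost again, which is the pattern one would expect right after a restart.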

Are you serving a large volume of traffic right away? Do you perhaps have a
lot of warming queries? How much memory do your servers have, and how much
have you allocated to Elasticsearch specifically?

On Tue, Nov 5, 2013 at 8:29 AM, Ankush Jhalani <ankush.jhalani@gmail.com> wrote:


--
Nick Zadrozny

Cofounder, CEO
One More Cloud

websolr.com • bonsai.io



(Ankush Jhalani) #3

Nick, you seem to be right. We had a huge registered warmer (known to take a
lot of time) for this index, and it looks like there's a bug in 0.90.6
(https://github.com/elasticsearch/elasticsearch/issues/4078).
We were not serving any traffic. Thanks!!
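Registered warmers can be inspected, and removed if needed, over HTTP. The sketch below only builds the requests with Python's standard library; it assumes the index warmers endpoints as documented for the 0.90 line (`GET /{index}/_warmer/` and `DELETE /{index}/_warmer/{name}`) and a node listening on `http://localhost:9200`. The index and warmer names used with it are hypothetical:

```python
import urllib.request
from urllib.parse import quote

ES_URL = "http://localhost:9200"  # assumption: the node's HTTP address


def list_warmers_request(index, base_url=ES_URL):
    """Build a GET request returning all warmers registered on `index`."""
    return urllib.request.Request(
        f"{base_url}/{quote(index)}/_warmer/", method="GET")


def delete_warmer_request(index, name, base_url=ES_URL):
    """Build a DELETE request that unregisters warmer `name` from `index`."""
    return urllib.request.Request(
        f"{base_url}/{quote(index)}/_warmer/{quote(name)}", method="DELETE")
```

Sending one is then a matter of `urllib.request.urlopen(list_warmers_request("my_index"), timeout=10)` and reading the JSON body; building the requests separately keeps the URL logic easy to check without a running cluster.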

On Tuesday, November 5, 2013 10:29:48 AM UTC-5, Ankush Jhalani wrote:


