CPU usage increase after running for a while (CacheRecycler?)

Jerome_Gagnon · February 21, 2013, 4:17pm

Hi everyone,

We are currently trying to put our ES cluster into production, and we are
seeing CPU usage problems after some uptime. Right after startup everything
runs fine, but after a while CPU usage starts to increase.

Here is a screenshot of the CPU usage from last night, when we experienced
the issue (in the middle of the graph): http://dl.dropbox.com/u/317367/cpu-day.png

We added a cache expiry time and found that it helped. (We are still
running with this setting, but CPU usage still increases.)

After some CPU profiling, we found that CacheRecycler is one of the hot
spots, specifically at:

at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:606)
at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.add(LinkedTransferQueue.java:1049)
at org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:470)
at org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:460)

I attached the thread dumps to this post. es7b is having the issue, while es1b is not.

Thanks for any help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Jerome_Gagnon · February 21, 2013, 4:18pm

Edit: We believe this is not related to GC, since GC CPU usage is 2-5% on
all the nodes and heap usage stays cleanly between 50% and 75%.

Jerome_Gagnon · February 21, 2013, 4:32pm

Edit 2: We also reduced the search thread_pool size, since we think all
the threads are trying to call the method posted above and, with a
blocking call, there is some kind of contention there. Over time, the CPU
time spent in xfer and append keeps increasing.

We are running the latest ES version (0.20.5) with Java 7u11.

kimchy · February 21, 2013, 6:49pm

Can you issue hot threads when you see the increased CPU usage and gist it?
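
For anyone who wants to reproduce this step: the hot threads report comes from the nodes hot threads REST endpoint. Below is a minimal Python sketch that builds the request URL and fetches the report. The `/_nodes/hot_threads` path and the `threads`, `interval`, and `type` parameters are assumptions about the API (the exact path may differ on 0.20.x), and the host address and helper names are placeholders that do not come from this thread.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, node_id=None, threads=3, interval="500ms", kind="cpu"):
    """Build the URL for the nodes hot threads endpoint.

    The path and the query parameter names (threads, interval, type) are
    assumptions; check them against the documentation for your version.
    """
    if node_id is None:
        path = "/_nodes/hot_threads"
    else:
        path = f"/_nodes/{node_id}/hot_threads"
    query = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{base_url.rstrip('/')}{path}?{query}"


def fetch_hot_threads(base_url, **kwargs):
    """Return the plain-text hot threads report (needs a reachable node)."""
    with urlopen(hot_threads_url(base_url, **kwargs), timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical address and node name; replace with your own.
    print(hot_threads_url("http://localhost:9200", node_id="es67b"))
```

Running `fetch_hot_threads(...)` while the CPU is high and pasting the text into a gist gives the kind of report requested here.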

Jerome_Gagnon · February 21, 2013, 6:54pm

Sure, it's happening right now as a matter of fact...

https://gist.github.com/jgagnon1/5007106

::: [es67b][BPTeIxwsQkm3_KNATVqgJQ][inet[es67b/10.1.18.57:9300]]{master=true}
   
   99.3% (496.2ms out of 500ms) cpu usage by thread 'elasticsearch[es67b][search][T#3]'
     8/10 snapshots sharing following 15 elements
       org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.tryAppend(LinkedTransferQueue.java:653)
       org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:611)
       org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.add(LinkedTransferQueue.java:1049)
       org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:502)
       org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:492)
       org.elasticsearch.search.facet.terms.ints.TermsIntOrdinalsFacetCollector.facet(TermsIntOrdinalsFacetCollector.java:188)

(The gist preview is truncated here; the full output is at the link above.)
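
When comparing several nodes, it can help to pull the per-thread CPU figures and top frames out of a report like the one above. The following is a rough Python sketch, not an official parser: the regular expressions are inferred only from the lines shown in this thread, and `parse_hot_threads` is a hypothetical helper name.

```python
import re

# Example: 99.3% (496.2ms out of 500ms) cpu usage by thread 'elasticsearch[es67b][search][T#3]'
THREAD_RE = re.compile(r"^\s*([\d.]+)% \(.*?\) cpu usage by thread '(.+)'\s*$")
# Example: org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:502)
FRAME_RE = re.compile(r"^\s+([\w.$<>]+)\((.*)\)\s*$")


def parse_hot_threads(text):
    """Return one dict per reported thread: thread name, CPU %, stack frames."""
    entries = []
    current = None
    for line in text.splitlines():
        thread_match = THREAD_RE.match(line)
        if thread_match:
            current = {
                "cpu_percent": float(thread_match.group(1)),
                "thread": thread_match.group(2),
                "frames": [],
            }
            entries.append(current)
            continue
        frame_match = FRAME_RE.match(line)
        if frame_match and current is not None:
            # Keep only the method name; drop the (File.java:line) part.
            current["frames"].append(frame_match.group(1))
    return entries


SAMPLE = """\
::: [es67b][BPTeIxwsQkm3_KNATVqgJQ][inet[es67b/10.1.18.57:9300]]{master=true}

   99.3% (496.2ms out of 500ms) cpu usage by thread 'elasticsearch[es67b][search][T#3]'
     8/10 snapshots sharing following 15 elements
       org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.tryAppend(LinkedTransferQueue.java:653)
       org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:611)
"""

if __name__ == "__main__":
    for entry in parse_hot_threads(SAMPLE):
        print(entry["cpu_percent"], entry["thread"], entry["frames"][0])
```

Sorting the returned entries by `cpu_percent` across nodes makes it easier to see whether the same frame is on top everywhere or only on the affected node.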

Jerome_Gagnon · February 21, 2013, 7:04pm

And for the whole cluster...

https://gist.github.com/jgagnon1/5007186

::: [es7a][-2NqUY-ASnypSigd5KMrIw][inet[/10.1.13.207:9300]]{master=true}
   
   16.8% (84.2ms out of 500ms) cpu usage by thread 'elasticsearch[es7a][search][T#3]'
     10/10 snapshots sharing following 8 elements
       sun.misc.Unsafe.park(Native Method)
       java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
       java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
       java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
       java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
       java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)

(The gist preview is truncated here; the full output is at the link above.)

kimchy · February 21, 2013, 8:13pm

This is really strange... I suggest two things. First, are you sure there is no memory pressure heap-wise? Second, I pushed an updated version of those concurrent collections to the 0.20 branch; maybe you can give it a go?

Jerome_Gagnon · February 21, 2013, 8:32pm

First, I'm pretty sure there is no heap pressure: heap usage is between
10 and 12 GB out of 15 GB total, and the GC pattern is clean.

And I'm forking it right now; I will let you know.

Jerome_Gagnon · February 21, 2013, 9:45pm

By the way, thanks for the help, it's really appreciated!

Jerome_Gagnon · February 22, 2013, 2:44pm

Early update: contention seems to have moved from LinkedTransferQueue to
ConcurrentLinkedQueue (no surprise here).

I am still seeing high CPU usage, though. The cluster is still running;
more updates to come.

https://gist.github.com/jgagnon1/5013915

::: [es67b][PUF2OkuJT42Q3Ne1Z4UYuQ][inet[es67b/10.1.18.57:9300]]{master=true}
   
   78.0% (390.1ms out of 500ms) cpu usage by thread 'elasticsearch[es67b][search][T#10]'
     4/10 snapshots sharing following 24 elements
       org.apache.lucene.index.SegmentTermPositions.skipPositions(SegmentTermPositions.java:131)
       org.apache.lucene.index.SegmentTermPositions.lazySkip(SegmentTermPositions.java:169)
       org.apache.lucene.index.SegmentTermPositions.nextPosition(SegmentTermPositions.java:70)
       org.apache.lucene.search.spans.TermSpans.skipTo(TermSpans.java:73)
       org.apache.lucene.search.spans.SpanScorer.advance(SpanScorer.java:68)
       org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:97)

(The gist preview is truncated here; the full output is at the link above.)

kimchy · February 22, 2013, 2:48pm

This might just be the CPU needed to compute the terms facet. It might be that the sampling done to get the hot threads just happens to land on the addition to the queue...
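
One rough way to weigh the sampling explanation is to look at the "N/M snapshots sharing" line of each hot threads entry, which, as far as I understand the report format, says how many of the stack snapshots taken for a thread ended in the same frames. The Python sketch below only extracts that ratio; `snapshot_share` is a hypothetical helper name and the line format is assumed from the excerpts posted in this thread.

```python
import re

# Example line: "8/10 snapshots sharing following 15 elements"
SNAPSHOT_RE = re.compile(r"(\d+)/(\d+) snapshots sharing following (\d+) elements")


def snapshot_share(line):
    """Return the fraction of snapshots that share the reported stack, or None."""
    match = SNAPSHOT_RE.search(line)
    if match is None:
        return None
    shared, total = int(match.group(1)), int(match.group(2))
    if total == 0:
        return None
    return shared / total


if __name__ == "__main__":
    # The two lines below are copied from the excerpts earlier in the thread.
    for line in (
        "8/10 snapshots sharing following 15 elements",
        "4/10 snapshots sharing following 24 elements",
    ):
        print(line, "->", snapshot_share(line))
```

A thread that shows the queue addition in most of its snapshots over several consecutive reports points more toward real contention, while a low or fluctuating share fits the sampling explanation better. This is a heuristic, not proof either way.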

Jerome_Gagnon · February 22, 2013, 2:48pm

Cluster hot_threads gist: https://gist.github.com/jgagnon1/5013938

::: [es13b][BR5TbYEATsuuibxDfuk_ww][inet[/10.1.16.155:9300]]{master=true}
   
   58.4% (292ms out of 500ms) cpu usage by thread 'elasticsearch[es13b][search][T#7]'
     5/10 snapshots sharing following 14 elements
       java.util.concurrent.ConcurrentLinkedQueue.offer(ConcurrentLinkedQueue.java:352)
       java.util.concurrent.ConcurrentLinkedQueue.add(ConcurrentLinkedQueue.java:296)
       org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:502)
       org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:492)
       org.elasticsearch.search.facet.terms.ints.TermsIntOrdinalsFacetCollector.facet(TermsIntOrdinalsFacetCollector.java:188)
       org.elasticsearch.search.facet.FacetPhase.execute(FacetPhase.java:140)

(Excerpt truncated; the full output is in the gist.)

Jerome_Gagnon · February 22, 2013, 2:56pm

So basically it's normal that my CPU is at 100% and that 25 to 30% of the CPU
time is spent in:

org.elasticsearch.common.CacheRecycler.pushIntArray() 23.500212 467026 ms (23.5%) 467026 ms

I'm only faceting on an int field with low cardinality (5 possible values).

Jerome_Gagnon · February 22, 2013, 3:14pm

Most of the queries are doing facets on a low-cardinality int field (5-6
different possible values) with a facetFilter. I am not sure that it's
supposed to use that much CPU.

Moreover, there still seems to be contention somewhere: all my CPUs have gone
up to 100% and query time is still increasing.

kimchy · February 22, 2013, 4:02pm

How many concurrent requests are you executing? What are the search thread pool settings? How many cores do you have?

Jerome_Gagnon · February 22, 2013, 4:50pm

We looked at the CacheRecycler code, and we don't see how the queue size
could decrease, because popIntArray only calls poll(). So our theory is that
the queue size is not bounded and the process keeps looping in there when it
tries to add an int array. Moreover, there seems to be a correlation between
the CPU usage and the number of field evictions on our servers, so the two
could potentially be linked. Maybe you can help us with that?

We are also not yet sure when those methods (pushIntArray and popIntArray)
are called. We added some logging to check the size of the queue, but we
need to give it time before we see something in the log.
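
For anyone who wants to do the same kind of size logging, here is a minimal sketch of the idea. It is not the actual CacheRecycler source: the class CountingIntArrayRecycler and its methods are hypothetical names. It keeps a separate counter because ConcurrentLinkedQueue.size() walks the whole queue and is not a constant-time operation, so calling it on every push would add to the cost being measured.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a recycler of int[] buffers; not the real CacheRecycler. */
class CountingIntArrayRecycler {
    private final Queue<int[]> queue = new ConcurrentLinkedQueue<>();
    // Separate counter, because ConcurrentLinkedQueue.size() traverses the queue.
    private final AtomicInteger size = new AtomicInteger();

    /** Returns a buffer to the pool. */
    void push(int[] buffer) {
        size.incrementAndGet(); // count first so the counter never under-reports
        queue.add(buffer);
    }

    /** Takes a pooled buffer of at least the requested length, or allocates one. */
    int[] pop(int length) {
        int[] buffer = queue.poll(); // removes and returns the head, or null if empty
        if (buffer == null) {
            return new int[length];
        }
        size.decrementAndGet();
        if (buffer.length < length) {
            return new int[length]; // pooled buffer too small; it is dropped
        }
        return buffer;
    }

    /** Approximate number of pooled buffers; cheap enough to log periodically. */
    int approximateSize() {
        return size.get();
    }
}
```

Logging approximateSize() every few seconds should show whether the pool really grows over time or stays roughly flat.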

For the information you asked for:

Bigdesk shows ~60 QPS.
Thread pool settings:

threadpool:
    search:
        type: fixed
        size: 16
        min: 1
        queue_size: 64
        reject_policy: abort

Our machines have 8 cores (2 quad-core CPUs).

Thank you,

Jerome

Jerome_Gagnon · February 22, 2013, 5:15pm

We took another look at the CacheRecycler code and saw that the poll
method actually removes the head.

Since our profiling now points to offer() for CPU time, we took a look at
the code of ConcurrentLinkedQueue and noticed that it tries to get the tail
and, if it fails (the tail was changed by another thread), it scans the
whole list to find the new tail. So our conclusion is that either the queue
is very large or the number of concurrent accesses is so high that it keeps
rescanning the queue. If the second hypothesis (concurrent access) were the
problem, we should see the high CPU as soon as we start the server, which is
not the case. So now we think that the number of pushes is greater than the
number of pops, which would cause the queue to grow over time. Could that be
the case? As a workaround, we are thinking about capping the queue size.

What do you think?
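
A minimal sketch of what capping could look like, assuming a hypothetical BoundedIntArrayPool class and a maxSize value that would have to be tuned. This is an illustration of the workaround, not Elasticsearch code. Each push reserves a slot in an atomic counter before touching the queue, so concurrent pushes cannot grow the pool past the cap. Buffers that do not fit are left for the garbage collector.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical bounded pool of int[] buffers; a sketch of the capping idea only. */
class BoundedIntArrayPool {
    private final Queue<int[]> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();
    private final int maxSize; // assumed cap, to be tuned for the workload

    BoundedIntArrayPool(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Offers a buffer to the pool; returns false (buffer discarded) when full. */
    boolean push(int[] buffer) {
        // Reserve a slot first so concurrent pushes cannot exceed the cap.
        if (size.incrementAndGet() > maxSize) {
            size.decrementAndGet();
            return false;
        }
        queue.add(buffer);
        return true;
    }

    /** Returns a pooled buffer, or null when the pool is empty. */
    int[] pop() {
        int[] buffer = queue.poll();
        if (buffer != null) {
            size.decrementAndGet();
        }
        return buffer;
    }

    int size() {
        return size.get();
    }
}
```

Under heavy contention a push may occasionally be rejected while the pool is slightly below the cap, which seems an acceptable trade for never exceeding it.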

On Thursday, February 21, 2013 11:17:02 AM UTC-5, Jérôme Gagnon
wrote:

Hi everyone,

We are actually trying to put in prod our ES cluster, and we are
having some cpu usage issues after some uptime. When we start, everything
is running fine, but after a while we are experiencing an increase in the
cpu usage.

http://dl.dropbox.com/u/317367/cpu-day.png Here is a screenshot
of the cpu usage from last night while we experienced the issue (in the
middle)

We added a cache expiry time and found out that it helped. (We
are still currently running on this, but still, the cpu is increasing)

After doing some CPU profiling, we found out that CacheRecycler
is one of the issue that we have specially at;

at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:606)
at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.add(LinkedTransferQueue.java:1049)
at org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:470)
at org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:460)

I added the thread dumps in attachment to this post. es7b is having the issue, while es1b not.

Thanks for any help.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · February 22, 2013, 5:34pm

I think your analysis is correct, in the sense that
ConcurrentLinkedQueue is a beast. Note that the cost of the size()
method is not always linear. In my tests I observed edge cases where
queue elements are created and added faster than they are consumed, for
example because of slow consumer threads, and the CPU usage of size(),
which must iterate through the list, grows faster than linearly. I'm
not sure this alone will eat the CPU, but there is a price to pay for an
unbounded concurrent queue. Another option would be a bounded concurrent
queue like ArrayBlockingQueue (which has other shortcomings: its
capacity cannot be changed). I must confess I found it much harder to
program against a bounded concurrent queue than an unbounded one (what
happens if offer/poll time out? When should they time out?).

Jörg
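As a rough illustration of the bounded alternative: the non-timed `offer(e)` and `poll()` on `java.util.concurrent.ArrayBlockingQueue` never wait, so the timeout question does not arise if a rejected element can simply be discarded. The class and method names below (`BoundedQueueDemo`, `offerMany`, `pollOnEmptyIsNull`) are hypothetical, and this is only a sketch of the queue semantics, not a proposed patch.

```java
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical demo class; shows only the standard ArrayBlockingQueue semantics.
final class BoundedQueueDemo {
    /** Offers `count` arrays to a queue of the given capacity; returns how many were accepted. */
    static int offerMany(int capacity, int count) {
        ArrayBlockingQueue<int[]> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < count; i++) {
            // Non-timed offer: returns false immediately when the queue is full.
            if (queue.offer(new int[16])) {
                accepted++;
            }
        }
        return accepted;
    }

    /** Non-timed poll on an empty queue returns null immediately instead of waiting. */
    static boolean pollOnEmptyIsNull() {
        ArrayBlockingQueue<int[]> queue = new ArrayBlockingQueue<>(1);
        return queue.poll() == null;
    }
}
```

The trade-off is that ArrayBlockingQueue guards its operations with a lock, so under heavy contention it may behave differently from the lock-free queues discussed in this thread.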

kimchy · February 22, 2013, 6:09pm

The number of pushes can't really exceed the number of pops, assuming you have a bounded number of concurrent requests (like a bounded search thread pool). Both ConcurrentLinkedQueue and LinkedTransferQueue are non-blocking, which means they retry on "failure" (a la compareAndSet). Note that Jörg's point about size() is not relevant here: we don't call size().

What you are seeing is really strange. Can you try a newer Java version? If you still suspect the ConcurrentLinkedQueue, you can try using a LinkedBlockingQueue instead and see if it helps.

Do you see the high CPU usage while driving the concurrent search load? It might simply be expected, and have nothing to do with the queue itself. 16 concurrent shard-level queries are allowed to execute; if you push enough load to keep the pool full, I suspect you will see very high CPU usage (facets are typically very CPU intensive).
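The argument that pushes cannot outrun pops under a bounded thread pool can be checked with a small simulation. This is a sketch under assumptions: the class `RecyclerBalanceDemo` and its method are hypothetical, the work done per task is a stand-in for the facet computation, and every task is assumed to push back exactly the one array it popped or allocated.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation; not taken from the Elasticsearch code base.
final class RecyclerBalanceDemo {
    /** Runs `tasks` borrow/return cycles on `threads` workers and returns the final pool size. */
    static int finalPoolSize(int threads, int tasks) {
        final ConcurrentLinkedQueue<int[]> pool = new ConcurrentLinkedQueue<>();
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runnable() {
                @Override
                public void run() {
                    int[] array = pool.poll();   // pop: removes the head, or null if empty
                    if (array == null) {
                        array = new int[64];     // allocate only when nothing is pooled
                    }
                    array[0]++;                  // stand-in for the real per-request work
                    pool.offer(array);           // push: exactly one per pop
                }
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // size() is O(n), which is acceptable here because it runs once after the work is done.
        return pool.size();
    }
}
```

An array is allocated only when the pool is empty, which means every existing array is held by a running task at that moment. The pool therefore never holds more arrays than there are worker threads, for example at most 16 for `finalPoolSize(16, 10000)`.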

Jerome_Gagnon · February 22, 2013, 7:26pm

So would reducing the number of threads help? There are ~3 shards per
node, so if you are saying that the thread pool size is per Lucene shard
instance, does that mean 48 search threads?

I know that facets can be CPU heavy, but I'm not running a large number of
queries per second either. I'm not sure what could help next.

For the record, I am using Java 7 u11, which is pretty much the latest.
