The number of pushes can't really be more than the number of polls, assuming you have a bounded number of concurrent requests (like a bounded search thread pool). Both ConcurrentLinkedQueue and LinkedTransferQueue are non-blocking, which means they retry on "failure" (a la compareAndSet). Note, Jorg's point about size() is not relevant; we don't call size().
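To illustrate, here is a minimal sketch of that compareAndSet retry pattern (not the actual queue implementation, just the general idea):

    import java.util.concurrent.atomic.AtomicInteger;

    // Simplified sketch of a non-blocking update: on CAS failure the thread
    // simply re-reads the current value and retries, it never blocks.
    final class CasRetryExample {
        private final AtomicInteger value = new AtomicInteger();

        int increment() {
            for (;;) {
                int current = value.get();
                int next = current + 1;
                if (value.compareAndSet(current, next)) {
                    return next; // CAS succeeded
                }
                // CAS failed: another thread updated the value first, retry
            }
        }
    }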
What you see is really strange. Can you try with a newer Java version? If you still suspect the ConcurrentLinkedQueue, you can try using a LinkedBlockingQueue instead and see if it helps.
Do you see the high CPU usage while driving the concurrent search load? It really might just be OK to see high CPU usage, and it may have nothing to do with the queue itself. 16 concurrent shard-level queries are allowed to execute; if you push enough load to keep the pool full, I suspect you will see very high CPU usage (facets are typically very CPU intensive).
On Feb 22, 2013, at 6:15 PM, Jérôme Gagnon jerome.gagnon.1@gmail.com wrote:
We took another look at the CacheRecycler code and we saw that the poll method actually removes the head.
Since our profiling now points to offer() for CPU time, we took a look at the code of ConcurrentLinkedQueue and noticed that it tries to append at the tail, and if that fails (the tail was changed by another thread) it scans the list to find the new tail. So our conclusion is that either the queue is very large, or the number of concurrent accesses is so high that it keeps rescanning the queue. If the second hypothesis (concurrent access) were the problem, we should see the high CPU as soon as we start the server, which is not the case. So now we think that the number of pushes is greater than the number of polls, which would cause the queue to grow over time. Could that be the case? As a workaround, we are thinking about capping the queue size.
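Something like the following is what we have in mind for the cap (a hypothetical sketch, not the actual CacheRecycler code; the capacity of 1000 is just an example):

    import java.util.Queue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical sketch of a capped recycler: offer() on a full
    // LinkedBlockingQueue returns false instead of growing, so the cache
    // can never exceed the cap.
    final class BoundedIntArrayRecycler {
        private final Queue<int[]> cache = new LinkedBlockingQueue<int[]>(1000);

        int[] pop(int size) {
            int[] recycled = cache.poll();
            return (recycled != null && recycled.length >= size) ? recycled : new int[size];
        }

        void push(int[] array) {
            cache.offer(array); // dropped (left to the GC) when the cache is full
        }
    }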
What do you think?
On Friday, February 22, 2013 11:50:21 AM UTC-5, Jérôme Gagnon wrote:
We looked at the CacheRecycler code here, and we don't see how the queue size could decrease, because popIntArray only calls poll(). So our theory is that the queue is not bounded and the process spends a long time looping when it tries to add an int array. Moreover, there seems to be a correlation between the CPU usage and the number of field evictions on our servers... so that could potentially be linked. Maybe you can help us with that?
We are also not yet sure when those methods (pushIntArray and popIntArray) are called. We added some logging to check the size of the queue, but we need to give it time before we see something in the log...
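The logging is roughly along these lines (a sketch with made-up names; ConcurrentLinkedQueue.size() walks the whole list, so it is only called from the periodic task, never on the search path):

    import java.util.Queue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch: log the recycler queue size once a minute.
    final class QueueSizeLogger {
        static void watch(final Queue<int[]> recyclerQueue) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(new Runnable() {
                public void run() {
                    System.out.println("intArray recycler queue size: " + recyclerQueue.size());
                }
            }, 1, 1, TimeUnit.MINUTES);
        }
    }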
For the information you asked for:
Bigdesk shows ~60 QPS.
Thread pool settings:
threadpool:
  search:
    type: fixed
    size: 16
    min: 1
    queue_size: 64
    reject_policy: abort
Our machines have 8 cores (2 quad-core CPUs).
Thank you,
Jerome
On Friday, February 22, 2013 11:02:37 AM UTC-5, kimchy wrote:
How many concurrent requests are you executing? What are the search thread pool settings? How many cores do you have?
On Feb 22, 2013, at 4:14 PM, Jérôme Gagnon jerome....@gmail.com wrote:
Most of the queries are doing facets on a low-cardinality int field (5-6 different values possible) with a facetFilter. I am not sure that it's supposed to use that much CPU.
Moreover, there still seems to be contention somewhere; all my CPUs have gone up to 100% and query time is still increasing.
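For reference, the queries look roughly like this (index, field and filter names are illustrative, not the real ones):

    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.index.query.FilterBuilders;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.search.facet.FacetBuilders;

    // Illustrative sketch: a terms facet on a low-cardinality int field,
    // restricted by a facet filter.
    final class FacetQueryExample {
        static SearchResponse run(Client client) {
            return client.prepareSearch("items")
                    .setQuery(QueryBuilders.matchAllQuery())
                    .addFacet(FacetBuilders.termsFacet("status_counts")
                            .field("status") // int field with 5-6 distinct values
                            .facetFilter(FilterBuilders.termFilter("tenant_id", 42)))
                    .execute().actionGet();
        }
    }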
On Friday, February 22, 2013 9:48:03 AM UTC-5, kimchy wrote:
This might just be the CPU needed to compute the terms facet… It might be that the sampling done to get the hot threads just keeps landing on the addition to the queue...
On Feb 22, 2013, at 3:44 PM, Jérôme Gagnon jerome....@gmail.com wrote:
Early update... contention seems to have moved from LinkedTransferQueue to ConcurrentLinkedQueue (no surprise here).
I am still seeing high CPU usage though; the cluster is still running, more updates to come.
gist:5013915 · GitHub
On Thursday, February 21, 2013 4:45:31 PM UTC-5, Jérôme Gagnon wrote:
By the way, thanks for the help, it's really appreciated!
On Thursday, February 21, 2013 3:32:50 PM UTC-5, Jérôme Gagnon wrote:
First, I'm pretty sure there is no heap pressure: heap usage is between 10 and 12 GB out of 15 GB total, and the GC pattern is clean.
And I'm forking it right now; I will let you know...
On Thursday, February 21, 2013 3:13:05 PM UTC-5, kimchy wrote:
This is really strange… I suggest two things: first, are you sure there is no heap memory pressure? Second, I pushed an updated version of those concurrent collections to the 0.20 branch; maybe you can give it a go?
On Feb 21, 2013, at 8:04 PM, Jérôme Gagnon jerome....@gmail.com wrote:
And for the whole cluster...
gist:5007186 · GitHub
On Thursday, February 21, 2013 1:54:48 PM UTC-5, Jérôme Gagnon wrote:
Sure, it's happening right now as a matter of fact...
ES cpu hot threads · GitHub
On Thursday, February 21, 2013 1:49:19 PM UTC-5, kimchy wrote:
Can you issue hot threads when you see the increased CPU usage and gist it?
On Feb 21, 2013, at 5:32 PM, Jérôme Gagnon jerome....@gmail.com wrote:
Edit 2: We also reduced the search thread_pool size, since we think that all the threads are trying to call the method posted up there, and with a blocking call there is some kind of contention there; over time the xfer and append CPU time keeps increasing.
We are running the latest ES version (0.20.5) with Java 7u11.
On Thursday, February 21, 2013 11:18:43 AM UTC-5, Jérôme Gagnon wrote:
Edit: We believe this is not related to GC, since GC CPU usage is 2-5% on all the nodes and the heap stays clean between 50-75% usage.
On Thursday, February 21, 2013 11:17:02 AM UTC-5, Jérôme Gagnon wrote:
Hi everyone,
We are currently trying to put our ES cluster into production, and we are having some CPU usage issues after some uptime. When we start, everything runs fine, but after a while we see an increase in CPU usage.
Here is a screenshot of the CPU usage from last night while we experienced the issue (in the middle): http://dl.dropbox.com/u/317367/cpu-day.png
We added a cache expiry time and found out that it helped. (We are still running with this, but the CPU is still increasing.)
After doing some CPU profiling, we found that CacheRecycler is one of the issues we have, especially at:
at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:606)
at org.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.add(LinkedTransferQueue.java:1049)
at org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:470)
at org.elasticsearch.common.CacheRecycler.pushIntArray(CacheRecycler.java:460)
I added the thread dumps as attachments to this post. es7b is having the issue, while es1b is not.
Thanks for any help.