I am trying to find out what could be causing the system load to go over 6.5
on a 6-core server. This is not critically alarming yet, but it does not
look great. Before throwing more CPU at the problem, I would like to
troubleshoot and figure out what the best solution is.
I have posted a hot threads dump and some more info as gists. Please find the
links below. Thank you for helping out.
Is the CPU usage high? Can you share some graphs that show trends? Is the
CPU wait time high by some chance? What about user and system time? Can you
correlate CPU usage with disk IO or GC?
You can easily look at this sort of stuff with SPM for ES and send any
graphs you want directly to this list, so we can see them and help.
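If no monitoring tool is at hand, the user/system/wait split asked about above can be approximated by hand. The following is only a minimal sketch, assuming a Linux server where the first line of /proc/stat holds the aggregate CPU counters; the function names (`parse_cpu_line`, `cpu_breakdown`, `sample`) are made up for illustration and are not part of Elasticsearch or SPM.

```python
import time

# Order of the counters on the aggregate "cpu" line of /proc/stat (Linux).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS if k in before and k in after}
    total = sum(deltas.values())
    if total == 0:
        return {k: 0.0 for k in deltas}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (assumes Linux)."""
    with open(path) as f:
        return f.readline()


def sample(interval_s=1.0):
    """Sample the counters twice and return the percentage breakdown."""
    before = parse_cpu_line(read_cpu_line())
    time.sleep(interval_s)
    after = parse_cpu_line(read_cpu_line())
    return cpu_breakdown(before, after)
```

Calling `sample(5.0)` during peak traffic and looking at the `iowait` entry next to `user` and `system` gives a rough idea of whether the CPUs are busy computing or waiting on disk.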
On Thursday, January 2, 2014 7:00:19 PM UTC-5, Gregory S wrote:
Hi all,
I am trying to find out what could be causing the system load to go over 6.5
on a 6-core server. This is not critically alarming yet, but it does not
look great. Before throwing more CPU at the problem, I would like to
troubleshoot and figure out what the best solution is.
I have posted a hot threads dump and some more info as gists. Please find the
links below. Thank you for helping out.
Here are some interesting trends. Basically, they seem to confirm that we are
not IO bound. Also, the load, CPU, garbage collection, write IO per second, and
query latency all increase with the query count (see attached graphs).
This is all expected. What concerns me most is that the query response time
is starting to slow down significantly (~200 ms) and the load is going above
the number of cores (6) during peak traffic...
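The "load above the number of cores" comparison can be checked directly on the box. This is a small sketch, assuming a Unix-like server (`os.getloadavg()` is not available on Windows); the helper name `load_per_core` is made up for illustration.

```python
import os


def load_per_core(load_avg, cores):
    """Load average divided by core count; above 1.0 means runnable
    (or, on Linux, uninterruptibly waiting) tasks outnumber the cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_avg / cores


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    try:
        one_min, five_min, fifteen_min = os.getloadavg()
    except (AttributeError, OSError):
        print("load average is not available on this platform")
    else:
        for label, value in (("1m", one_min), ("5m", five_min), ("15m", fifteen_min)):
            print(f"{label}: load {value:.2f}, {load_per_core(value, cores):.2f} per core")
```

With the numbers from this thread, `load_per_core(6.5, 6)` is a little over 1.0, so the server is slightly oversubscribed at peak but not by much.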
So one of the graphs shows about 140K queries in 1 hour, with an average
latency close to 200 ms, on a server with 6 cores.
140K queries per hour ==> 140K/60/60 = ~39 QPS
On a server with 6 cores this means 39/6 = 6.5 QPS/core
Each query being avg 200 ms means 6.5 * 0.200 = 1.3
I believe this can be roughly interpreted as "during each second a core has
to do 1.3 seconds worth of work", which leads to some waiting on the CPU,
which is why you see that load.
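The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted in this thread (140K queries per hour, 6 cores, 200 ms average latency); the function name `estimate` is made up for illustration.

```python
def estimate(queries_per_hour, cores, avg_latency_s):
    """Rough per-core work estimate from query rate and average latency."""
    qps = queries_per_hour / 3600.0
    qps_per_core = qps / cores
    # Seconds of query time each core must absorb per wall-clock second.
    seconds_per_core = qps_per_core * avg_latency_s
    # Little's law: average number of queries in flight = rate * latency.
    in_flight = qps * avg_latency_s
    return qps, qps_per_core, seconds_per_core, in_flight


qps, per_core, seconds_per_core, in_flight = estimate(140_000, 6, 0.200)
print(f"{qps:.1f} QPS, {per_core:.2f} QPS/core")
print(f"{seconds_per_core:.2f} s of query time per core per second")
print(f"{in_flight:.1f} queries in flight on average")
```

One caveat: the 200 ms is response time, so it includes any time a query spends waiting rather than running. The 1.3 figure is therefore an upper bound on CPU work per core, not a direct measurement of it.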
I don't have an explanation for why the CPU is not at 100%. Maybe it is
because of those disk writes, which contribute to the load but end up making
the CPU wait? In that case, I'm not sure why we don't see any wait time on the
CPU graphs, unless you removed that metric from the CPU graph.
On Monday, January 6, 2014 8:38:12 PM UTC-5, Gregory S wrote:
Hi Otis,
Here are some interesting trends. Basically, they seem to confirm that we are
not IO bound. Also, the load, CPU, garbage collection, write IO per second, and
query latency all increase with the query count (see attached graphs).
This is all expected. What concerns me most is that the query response time
is starting to slow down significantly (~200 ms) and the load is going above
the number of cores (6) during peak traffic...
Thank you
Greg
On Fri, Jan 3, 2014 at 7:04 PM, Otis Gospodnetic <otis.gos...@gmail.com> wrote:
Hi Greg,
Is the CPU usage high? Can you share some graphs that show trends? Is the
CPU wait time high by some chance? What about user and system time? Can you
correlate CPU usage with disk IO or GC?
You can easily look at this sort of stuff with SPM for ES and send any
graphs you want directly to this list, so we can see them and help.