High CPU on some nodes in the cluster

amscotti · February 2, 2015, 3:38pm

Hi,

I'm looking into why some of the nodes in our cluster show a high CPU load from time to time. I ran hot_threads, but I really can't make heads or tails of the output. Could someone point me in the right direction?

Here is a link to the output, https://gist.github.com/amscotti/8252b0c8434f40b66aa9

If there is any more info that would help let me know.

Thanks,
Anthony

warkolm · February 3, 2015, 9:45pm

The hot threads output doesn't point to anything in particular.
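For anyone else trying to read hot_threads output, one option is to total the reported CPU percentages per thread pool. The following is only a minimal sketch: the sample text is made up for illustration, and it assumes the thread lines and thread names follow the `elasticsearch[<node>][<pool>][T#<n>]` pattern.

```python
import re
from collections import defaultdict

# Assumed line shape (illustrative):
#   45.3% (226.5ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#5]'
THREAD_LINE = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \(.*?\) cpu usage by thread '(?P<name>[^']+)'"
)
# Assumed thread naming: elasticsearch[<node>][<pool>][T#<n>]
POOL = re.compile(r"^elasticsearch\[[^\]]*\]\[(?P<pool>[^\]]+)\]")


def cpu_by_pool(hot_threads_text):
    """Sum the reported CPU percentages per thread pool, highest first."""
    totals = defaultdict(float)
    for line in hot_threads_text.splitlines():
        m = THREAD_LINE.match(line)
        if not m:
            continue  # stack frames, node headers, blank lines
        pool = POOL.match(m.group("name"))
        key = pool.group("pool") if pool else "other"
        totals[key] += float(m.group("pct"))
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical sample, not taken from the gist in this thread.
SAMPLE = """\
::: [node-1][abc123][host-1][inet[/10.0.0.1:9300]]

   45.3% (226.5ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#5]'
     10/10 snapshots sharing following 20 elements
   30.2% (151.0ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#2]'
    5.0% (25.0ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][bulk][T#1]'
"""

if __name__ == "__main__":
    for pool, pct in cpu_by_pool(SAMPLE).items():
        print(f"{pool:10s} {pct:6.1f}%")
```

If one pool dominates on the busy nodes but not on the quiet ones, that narrows down where to look next.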

What about GC? It should be mentioned in your logs. Can you give us some
other stats on your cluster, like size, number of nodes, Java and ES versions, etc.?
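Besides the logs, GC activity can also be read from the node stats API (`GET /_nodes/stats/jvm`). Below is a rough sketch that summarizes an already-fetched response; the field names are assumed to match that API's JVM section, and the sample numbers are invented.

```python
def gc_summary(nodes_stats):
    """Return {node_name: {collector: (count, seconds, pct_of_uptime)}}."""
    summary = {}
    for node in nodes_stats["nodes"].values():
        jvm = node["jvm"]
        uptime_ms = jvm["uptime_in_millis"]
        per_collector = {}
        for name, gc in jvm["gc"]["collectors"].items():
            time_ms = gc["collection_time_in_millis"]
            # Share of the JVM's uptime spent in this collector.
            pct = 100.0 * time_ms / uptime_ms if uptime_ms else 0.0
            per_collector[name] = (
                gc["collection_count"],
                time_ms / 1000.0,
                round(pct, 2),
            )
        summary[node["name"]] = per_collector
    return summary


# Hypothetical response fragment for two nodes (values are made up).
SAMPLE_STATS = {
    "nodes": {
        "id-1": {
            "name": "node-1",
            "jvm": {
                "uptime_in_millis": 1000000,
                "gc": {"collectors": {
                    "young": {"collection_count": 400, "collection_time_in_millis": 20000},
                    "old": {"collection_count": 12, "collection_time_in_millis": 50000},
                }},
            },
        },
        "id-2": {
            "name": "node-2",
            "jvm": {
                "uptime_in_millis": 1000000,
                "gc": {"collectors": {
                    "young": {"collection_count": 390, "collection_time_in_millis": 19000},
                    "old": {"collection_count": 1, "collection_time_in_millis": 1000},
                }},
            },
        },
    }
}

if __name__ == "__main__":
    for node, collectors in gc_summary(SAMPLE_STATS).items():
        print(node, collectors)
```

A node that spends a much larger share of its uptime in old-generation collections than its peers is a reasonable suspect for CPU spikes.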

amscotti · February 3, 2015, 11:46pm

Thanks for the reply!

Cluster size is 9 nodes hosted on AWS using the r3.2xlarge instance type (8 cores and 61 GB of memory)
ES version is 1.3.7
Java version is "1.7.0_60"

I looked at the logs for one of the nodes with high CPU load, and nothing is jumping out at me. What should I be looking for?

Also, I forgot to mention last time that this only occurs on one or two nodes, and it is always the same nodes. All nodes are behind a load balancer and should be getting equivalent traffic. All nodes are identical and are set up by OpsWorks (Chef).

Thanks again,
Anthony

Sarang_Zargar · February 4, 2015, 7:49am

More details would definitely be helpful.

Are you on spindles or SSDs?
Can you correlate high CPU with other activity, e.g. high I/O,
index refresh, or segment merges?
Are you using Marvel? (It's your best friend for understanding what's creating
CPU load.)
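One way to approach the correlation question without a dashboard is to compare cumulative per-node counters from `GET /_nodes/stats/indices` and flag nodes that sit well above the cluster median. This is only a sketch: the field names are assumed to match that API's indices section, the 2x threshold is arbitrary, and the sample data is invented.

```python
from statistics import median

# metric label -> (section, field) inside each node's "indices" stats (assumed names)
METRICS = {
    "merge_ms": ("merges", "total_time_in_millis"),
    "refresh_ms": ("refresh", "total_time_in_millis"),
    "index_ms": ("indexing", "index_time_in_millis"),
    "query_ms": ("search", "query_time_in_millis"),
}


def outliers(nodes_stats, factor=2.0):
    """Return {metric: [node names whose value exceeds factor * median]}."""
    rows = {
        node["name"]: {m: node["indices"][sec][key] for m, (sec, key) in METRICS.items()}
        for node in nodes_stats["nodes"].values()
    }
    flagged = {}
    for metric in METRICS:
        mid = median(r[metric] for r in rows.values())
        names = sorted(
            n for n, r in rows.items() if mid > 0 and r[metric] > factor * mid
        )
        if names:
            flagged[metric] = names
    return flagged


def _node(name, merge, refresh, index, query):
    """Build a hypothetical node entry shaped like the stats response."""
    return {
        "name": name,
        "indices": {
            "merges": {"total_time_in_millis": merge},
            "refresh": {"total_time_in_millis": refresh},
            "indexing": {"index_time_in_millis": index},
            "search": {"query_time_in_millis": query},
        },
    }


# Made-up numbers: node-3 has spent far more time merging than its peers.
SAMPLE = {
    "nodes": {
        "a": _node("node-1", 1000, 500, 2000, 3000),
        "b": _node("node-2", 1100, 520, 2100, 3100),
        "c": _node("node-3", 9000, 540, 2200, 3200),
    }
}

if __name__ == "__main__":
    print(outliers(SAMPLE))
```

These counters accumulate from node start, so the comparison is only fair between nodes with similar uptime; otherwise diff two snapshots taken a few minutes apart.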

In our setup we ran into high CPU pressure due to I/O bottlenecks. We
were on spindles, and our indexing volume would push disk I/O to its peak,
resulting in CPU spikes.
Short-term fix: we did a firmware upgrade on the disks.
Long-term fix: we are now on SSDs.
Hope this helps.

amscotti · February 4, 2015, 1:58pm

Hi Sarang,

We are using 'General Purpose (SSD)' volumes on all the systems on AWS. IOPS
are about the same across all nodes in the cluster, but only 1~2 nodes are
having high CPU/load.
I do have Marvel installed; where should I be looking? Looking at all the
details for the node, nothing is really popping out at me, but I could just
be overlooking it.

Let me know if any more info would be helpful. I am kind of at a loss as to
where to look.

Here is an image of the 2 nodes that are having the issue from the
dashboard,

https://lh4.googleusercontent.com/-z-MVu1ZU9so/VNIk88BuiyI/AAAAAAAAt1c/Lg7K0Ui7_58/s1600/Cursor_and_Marvel_-_Overview.png

Thanks,

Anthony

Sarang_Zargar · February 5, 2015, 2:38am

Can you run "top" on these instances and see what's pushing the CPU? Is
it a rogue process or something else?
Marvel will not really be helpful here; you need to look into the
instance itself.
Please share the findings, this is really interesting.
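If top shows the Java process itself as the culprit, a common follow-up is `top -H -p <pid>` to get per-thread CPU, then matching the busy thread ID against the `nid=0x...` field in a `jstack <pid>` dump. A small sketch of that matching step, using a made-up jstack fragment:

```python
import re


def tid_to_nid(tid):
    """Convert a decimal thread ID (as shown by top -H) to jstack's hex nid."""
    return hex(tid)


# jstack thread header: "<name>" ... nid=0x<hex> ...
NID = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def find_thread(jstack_text, tid):
    """Return the name of the thread whose nid equals tid, or None."""
    for line in jstack_text.splitlines():
        m = NID.match(line)
        if m and int(m.group("nid"), 16) == tid:
            return m.group("name")
    return None


# Hypothetical dump fragment; names and addresses are invented.
SAMPLE_JSTACK = """\
"elasticsearch[node-1][search][T#3]" daemon prio=10 tid=0x00007f3a4c0a1000 nid=0x1a2b runnable [0x00007f3a2b5f4000]
   java.lang.Thread.State: RUNNABLE
"elasticsearch[node-1][bulk][T#1]" daemon prio=10 tid=0x00007f3a4c0b2000 nid=0x1a2c waiting on condition [0x00007f3a2b4f3000]
   java.lang.Thread.State: WAITING (parking)
"""

if __name__ == "__main__":
    busy_tid = 6699  # decimal thread ID, as it would appear in top -H
    print(tid_to_nid(busy_tid), find_thread(SAMPLE_JSTACK, busy_tid))
```

The thread name usually reveals which thread pool is burning CPU, and the stack trace under that header shows what it was doing at that moment.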

amscotti · February 9, 2015, 2:09pm

Hi,

I just ran top on one of the nodes with high CPU, and the top
process is elasticsearch. Here is a screenshot:

https://lh5.googleusercontent.com/-lYW7faDf8KY/VNi-6A1UkXI/AAAAAAAAt2g/VN1ecGJsJb8/s1600/1__tmux__tmux__and_Marvel_-_Overview.png
Everything else seems very low compared to elasticsearch.

Thanks,
Anthony
