High CPU on some nodes in the cluster

Hi,

I'm looking into why some of the nodes in our cluster have a high CPU load from time to time. I ran the hot_threads API, but I really can't make heads or tails of the output. Could someone point me in the right direction?

Here is a link to the output, https://gist.github.com/amscotti/8252b0c8434f40b66aa9
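One way to make hot_threads output easier to read is to total the sampled CPU per thread pool (search, bulk, and so on), so it is obvious which kind of work dominates. Below is a minimal sketch that works on the plain-text body returned by `GET /_nodes/hot_threads`. The line format in the regex is an assumption based on ES 1.x output, so check it against your own gist. The sample text is made up.

```python
import re
from collections import defaultdict

# Assumed shape of an ES 1.x hot_threads summary line (verify against your output):
#    61.2% (306ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#3]'
HOT_LINE = re.compile(r"([\d.]+)% \([^)]*\) cpu usage by thread '([^']+)'")
# Thread names look like elasticsearch[<node>][<pool>][T#<n>]; some pools use [[...]].
POOL = re.compile(r"elasticsearch\[[^\]]*\]\[+([^\]]+)\]")


def summarize_hot_threads(text):
    """Sum the sampled CPU percentage per thread pool, busiest pool first."""
    totals = defaultdict(float)
    for match in HOT_LINE.finditer(text):
        percent, thread = float(match.group(1)), match.group(2)
        pool = POOL.match(thread)
        totals[pool.group(1) if pool else thread] += percent
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical hot_threads excerpt for one node.
sample = """
::: [node-1][abc123][host-1][inet[/10.0.0.1:9300]]
   61.2% (306ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#3]'
   20.0% (100ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][search][T#7]'
   15.5% (77.5ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][bulk][T#2]'
"""
for pool, percent in summarize_hot_threads(sample):
    print(f"{pool:10s} {percent:6.1f}%")
```

A pool total can exceed 100% because each thread's percentage is relative to one core. Comparing these totals between a busy node and a quiet one is more telling than reading a single node's dump in isolation.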

If there is any more info that would help, let me know.

Thanks,
Anthony

The hot threads output doesn't point to anything in particular.

What about GC? It should be mentioned in your logs. Can you give us some
other stats on your cluster, like size, number of nodes, Java and ES versions, etc.?
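If GC is the culprit, slow collections should appear as `monitor.jvm` lines in the node's log. Below is a minimal sketch that totals logged pause time per collector (young/old). The line format in the comment is an assumption about ES 1.x logging, so compare it with a real line from your logs before trusting the numbers. Only collections slow enough to be logged are counted, so a low total does not fully rule GC out.

```python
import re
from collections import defaultdict

# Assumed shape of an ES 1.x GC log line (check against your own logs):
# [...][WARN ][monitor.jvm] [node-1] [gc][old][123456][42] duration [12.5s], collections [1]/[13.2s], ...
GC_LINE = re.compile(r"\[gc\]\[(\w+)\]\[\d+\]\[\d+\] duration \[([\d.]+)(ms|s|m)\]")
UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0}


def gc_pause_seconds(lines):
    """Total logged GC pause time per collector, in seconds."""
    totals = defaultdict(float)
    for line in lines:
        match = GC_LINE.search(line)
        if match:  # lines in other formats or units are skipped
            collector, value, unit = match.groups()
            totals[collector] += float(value) * UNIT_SECONDS[unit]
    return dict(totals)
```

To use it, pass an open log file (any iterable of lines works) and compare the totals between a node with high CPU and a quiet one.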

On 3 February 2015 at 02:38, amscotti anthony.m.scotti@gmail.com wrote:

--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/High-CPU-on-some-nodes-in-the-cluster-tp4069943.html
Sent from the Elasticsearch Users mailing list archive at Nabble.com.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/1422891519949-4069943.post%40n3.nabble.com
.
For more options, visit https://groups.google.com/d/optout.

Thanks for the reply!

Cluster size is 9 nodes hosted on AWS using the r3.2xlarge instance type (8 cores and 61 GB of memory)
ES version is 1.3.7
Java version is "1.7.0_60"

I looked at the logs for one of the nodes with high CPU load, and nothing jumps out at me. What should I be looking for?

Also, I forgot to mention last time that this only occurs on one or two nodes. All nodes are behind a load balancer and should be getting equivalent traffic, and it is always the same nodes that have this issue. All nodes are identical and are set up by OpsWorks (Chef).
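A load balancer only spreads the HTTP requests. Each search or indexing operation is then executed on whichever nodes hold the relevant shard copies, so nodes that carry more (or busier) shards can do more work even when traffic is even. Below is a minimal sketch that counts STARTED shard copies per node from `_cat/shards` text output. It assumes the default column order (index, shard, prirep, state, docs, store, ip, node), and `index_prefix` is a hypothetical filter added for illustration.

```python
from collections import Counter


def shards_per_node(cat_shards_text, index_prefix=""):
    """Count STARTED shard copies per node from `_cat/shards` text output.

    Assumes the default columns: index shard prirep state docs store ip node.
    """
    counts = Counter()
    for line in cat_shards_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[3] != "STARTED":
            continue  # header, unassigned, initializing and relocating rows
        if fields[0].startswith(index_prefix):
            counts[" ".join(fields[7:])] += 1  # node names may contain spaces
    return counts


# Hypothetical output for three nodes.
sample = """
logs-2015.02.03 0 p STARTED 1000 1gb 10.0.0.1 node-1
logs-2015.02.03 0 r STARTED 1000 1gb 10.0.0.2 node-2
logs-2015.02.03 1 p STARTED 1000 1gb 10.0.0.1 node-1
logs-2015.02.03 1 r UNASSIGNED
users 0 p STARTED 50 5mb 10.0.0.3 Black Widow
"""
print(shards_per_node(sample).most_common())
```

Filtering by the busiest indexes is often more useful than a raw total, since one hot index can matter more than many idle ones.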

Thanks again,
Anthony

More details would definitely be helpful.

  • Are you on spindles or SSDs?
  • Can you correlate the high CPU with other activity, e.g. high I/O,
    index refreshes, or segment merges?
  • Are you using Marvel? (It's your best friend for understanding what's
    creating CPU load.)
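Correlating CPU with I/O, refreshes, or merges comes down to lining up two time series sampled at the same timestamps (for example, exported from Marvel) and computing a correlation coefficient. Below is a minimal Pearson correlation sketch. The series in the usage lines are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally sampled series (e.g. CPU % vs disk I/O)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


# Hypothetical one-minute samples from a single node.
cpu = [35, 40, 90, 85, 38, 92]
merge_ms = [120, 150, 2400, 2100, 130, 2600]
print(round(pearson(cpu, merge_ms), 2))
```

A value near 1 only says the two move together. Check the timestamps line up before reading anything into it.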

In our setup we ran into high CPU pressure due to I/O bottlenecks. We
were on spindles, and our indexing volume would push disk I/O to its peak,
resulting in CPU spikes.
Short-term fix: we did a firmware upgrade on the disks.
Long-term fix: we are now on SSDs.
Hope this helps.
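Whether an I/O bottleneck is behind the CPU pressure can be checked on the node itself. On Linux, the "wa" figure in `top` is iowait, and processes blocked on disk also count toward the load average. Below is a minimal sketch, assuming a Linux host. It diffs two samples of the aggregate `cpu` line from `/proc/stat` (fields in proc(5) order: user, nice, system, idle, iowait, irq, softirq, steal) and reports the share of the interval spent in user, system, and iowait time.

```python
import time

FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return dict(zip(FIELDS, map(int, parts[1:9])))


def cpu_breakdown(before, after):
    """Fraction of the interval spent in user, system and iowait time."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values()) or 1  # avoid dividing by zero on identical samples
    return {k: delta[k] / total for k in ("user", "system", "iowait")}


def sample_proc_stat(interval=5.0):
    """Take two live samples from /proc/stat (Linux only)."""
    with open("/proc/stat") as fh:
        before = fh.readline()  # first line is the aggregate 'cpu' line
    time.sleep(interval)
    with open("/proc/stat") as fh:
        after = fh.readline()
    return cpu_breakdown(before, after)
```

A node whose high load comes with a large iowait share is waiting on disk. A node with high user time and little iowait is genuinely computing.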

On Tuesday, 3 February 2015 11:18:20 UTC-8, Anthony Scotti wrote:

Hi Sarang,

We are using 'General Purpose (SSD)' volumes on all the systems on AWS. The
IOPS for all nodes in the cluster are about the same, but only 1~2 nodes are
having high CPU/load.
I do have Marvel installed; where should I be looking? Looking at all the
details for the node, nothing really pops out at me, but I could just
be overlooking it.

Let me know if any more info would be helpful. I am kind of at a loss as to
where to look.
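When only one or two nodes misbehave, comparing the same counters across all nodes tends to be more revealing than reading one node's details. Below is a minimal sketch that takes the JSON from `GET /_nodes/stats` and flags any node whose value for a metric exceeds a multiple of the cluster median. The field paths in `METRICS` are assumptions about the ES 1.x response layout, and the `factor` threshold is an arbitrary illustrative choice. Most of these counters are cumulative since node start, so nodes restarted at different times are only comparable through the difference between two snapshots.

```python
import json
from statistics import median

# Assumed ES 1.x `_nodes/stats` field paths; verify against your own response.
METRICS = {
    "cpu_percent": ("process", "cpu", "percent"),
    "old_gc_ms": ("jvm", "gc", "collectors", "old", "collection_time_in_millis"),
    "merge_ms": ("indices", "merges", "total_time_in_millis"),
    "refresh_ms": ("indices", "refresh", "total_time_in_millis"),
    "query_ms": ("indices", "search", "query_time_in_millis"),
}


def dig(doc, path):
    """Follow a key path through nested dicts; None if anything is missing."""
    for key in path:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc if isinstance(doc, (int, float)) else None


def outliers(stats_json, factor=2.0):
    """Flag (node, metric, value) where value > factor x the cluster median."""
    nodes = json.loads(stats_json)["nodes"]
    flagged = []
    for metric, path in METRICS.items():
        values = {node.get("name", node_id): dig(node, path)
                  for node_id, node in nodes.items()}
        present = [v for v in values.values() if v is not None]
        if not present:
            continue
        mid = median(present)
        for name, value in values.items():
            if value is not None and mid > 0 and value > factor * mid:
                flagged.append((name, metric, value))
    return flagged
```

To try it, save the `_nodes/stats` response to a file and pass its contents to `outliers`.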

Here is an image from the dashboard of the 2 nodes that are having the
issue:

https://lh4.googleusercontent.com/-z-MVu1ZU9so/VNIk88BuiyI/AAAAAAAAt1c/Lg7K0Ui7_58/s1600/Cursor_and_Marvel_-_Overview.png

Thanks,

Anthony

On Wednesday, February 4, 2015 at 2:49:06 AM UTC-5, Sarang Zargar wrote:

Can you execute "top" on these instances and see what's pushing the CPU? Is
it a rogue process or something else?
Marvel will not really be helpful here; you need to look into the
instance.
Please share the findings, this is really interesting.
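To go one level below `top`, `top -H -p <pid>` lists the individual threads of the Elasticsearch JVM. A thread id from there can be matched against a `jstack` thread dump, which prints native thread ids in hex as `nid=0x...`. Below is a minimal sketch that does the same ranking from `/proc` on Linux. It reports cumulative CPU ticks since each thread started, so take two readings and compare them to see what is busy right now.

```python
import os


def thread_cpu_ticks(stat_line):
    """utime + stime (clock ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # comm (field 2) is wrapped in parentheses and may contain spaces,
    # so split on the last ')' and index the remaining fields from there.
    rest = stat_line.rsplit(")", 1)[1].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15 in proc(5)
    return utime + stime


def busiest_threads(pid, top=5):
    """Rank a process's threads by cumulative CPU ticks (Linux only)."""
    ranked = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                ranked.append((thread_cpu_ticks(fh.read()), int(tid)))
        except (FileNotFoundError, ProcessLookupError):
            continue  # thread exited while we were scanning
    ranked.sort(reverse=True)
    # jstack shows native thread ids as nid=0x<hex>, so report tids in hex too.
    return [(tid, hex(tid), ticks) for ticks, tid in ranked[:top]]
```

Searching a `jstack <pid>` dump for the hex values returned here gives the Java stack of each busy thread.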

On Wednesday, 4 February 2015 05:58:53 UTC-8, Anthony Scotti wrote:

Hi,

I just ran top on one of the nodes with high CPU; it seems the top
process is elasticsearch. Here is a screenshot:

https://lh5.googleusercontent.com/-lYW7faDf8KY/VNi-6A1UkXI/AAAAAAAAt2g/VN1ecGJsJb8/s1600/1__tmux__tmux__and_Marvel_-_Overview.png
Everything else is very low compared to elasticsearch.

Thanks,
Anthony

On Wednesday, February 4, 2015 at 9:38:52 PM UTC-5, Sarang Zargar wrote:
