100% CPU on 1 Node with JMeter Tests

One out of our 4 nodes always spikes to 100% CPU when we run load tests
with JMeter (50 threads, 50 loops) using any query (Match_All, Filtered
Query, etc.). That particular node holds 3 shards, 2 of which are primaries.
At the same time, the other nodes stay below 40% CPU. The heap is set to
30GB on all of them. Here is the hot threads output captured while the test
was running:
https://gist.github.com/RobloxSai/9f040bbd5ab7b58f2b1d
Is there anything else that can be done to improve performance? Query
response times jump to 5-8 seconds when the CPU is hammered.
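A hot threads dump like the one in the gist is easier to read once the reported CPU is totalled per node and thread pool, which shows at a glance whether, for example, `search` threads dominate on the busy node. The sketch below assumes the Elasticsearch 1.x text format of `GET /_nodes/hot_threads` (lines such as `89.3% (446.5ms out of 500ms) cpu usage by thread 'elasticsearch[node][search][T#3]'`) and treats the second bracketed part of the thread name as the pool; both are assumptions, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Assumed shape of a hot_threads line (Elasticsearch 1.x style), e.g.
#   89.3% (446.5ms out of 500ms) cpu usage by thread 'elasticsearch[Black Mamba][search][T#3]'
HOT_LINE = re.compile(r"([\d.]+)% \([^)]*\) cpu usage by thread '([^']*)'")
# Matches innermost [...] groups inside a thread name.
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def cpu_by_node_and_pool(hot_threads_text):
    """Sum the reported per-thread CPU % by (node, thread pool).

    Heuristic: thread names are assumed to look like
    elasticsearch[<node>][<pool>][T#<n>]; anything else falls back to
    "unknown"/"other". Totals can exceed 100% because each busy thread
    can occupy its own core.
    """
    totals = defaultdict(float)
    for match in HOT_LINE.finditer(hot_threads_text):
        percent = float(match.group(1))
        parts = BRACKETED.findall(match.group(2))
        node = parts[0] if parts else "unknown"
        pool = parts[1] if len(parts) > 1 else "other"
        totals[(node, pool)] += percent
    return dict(totals)
```

To use it, save the hot threads response to a file and pass its contents to `cpu_by_node_and_pool`; sorting the result by value puts the hungriest pools first.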

Cluster setup (4 nodes): https://lh3.googleusercontent.com/-EDnXAEg34cA/U6I5fb2zNOI/AAAAAAAAAB4/DqybJhq3Yhc/s1600/4+Nodes+Setup.png

I previously posted the server specs in another thread
(https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/elasticsearch/P1o_4bVvECA);
here they are again:

Machine specs:
  Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
  Number of CPU cores: 24
  Number of physical CPUs: 2
  Installed RAM: ~256 GB total (128 GB + 128 GB + 16 MB)
  Drive: two 278 GB SAS drives in RAID 0
OS:
  Arch: 64-bit (x86_64)
  OS type: Linux
  Kernel: 2.6.32-431.5.1.el6.x86_64
  OS version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Java version: Java 1.7.0_51 (Java 7u51 x64 for Linux)

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4c80c557-c85d-4319-b7cc-ddd2aebdbd95%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Bump

On Wednesday, June 18, 2014 6:20:58 PM UTC-7, sai...@roblox.com wrote the original post above.

Hello,

It wouldn't surprise me if both Black Mamba and Slapstick were hitting
100%: they have more shards and have to handle more requests than the
other nodes. But in your case it's only one node.

First, are your HTTP requests evenly spread over the 4 nodes? You could also
check that all your shards are about the same size.
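Shard sizes, and how many shards and primaries each node holds, can be read from `GET /_cat/shards`. The sketch below tallies that output per node. It assumes the default column order `index shard prirep state docs store ip node`, human-readable sizes in binary units, and counts only STARTED shards; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Multipliers for the human-readable sizes the _cat APIs print (assumed binary units).
UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
         "tb": 1024 ** 4, "pb": 1024 ** 5}
SIZE = re.compile(r"^([\d.]+)([a-z]+)$")


def to_bytes(size):
    """Convert a size such as '1.5gb' to a number of bytes."""
    m = SIZE.match(size.lower())
    if not m or m.group(2) not in UNITS:
        raise ValueError("unrecognised size: %r" % size)
    return float(m.group(1)) * UNITS[m.group(2)]


def shards_per_node(cat_shards_text):
    """Tally STARTED shards, primaries and store size per node.

    Assumes the column order: index shard prirep state docs store ip node.
    """
    stats = defaultdict(lambda: {"shards": 0, "primaries": 0, "bytes": 0.0})
    for line in cat_shards_text.strip().splitlines():
        # maxsplit=7 keeps node names with spaces ("Black Mamba") in one field
        fields = line.split(None, 7)
        if len(fields) < 8 or fields[3] != "STARTED":
            continue  # header row, UNASSIGNED, RELOCATING, INITIALIZING...
        _index, _shard, prirep, _state, _docs, store, _ip, node = fields
        entry = stats[node]
        entry["shards"] += 1
        if prirep == "p":
            entry["primaries"] += 1
        entry["bytes"] += to_bytes(store)
    return dict(stats)
```

A node whose `bytes` total is far above the others would point at uneven shard sizes rather than at the hardware.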

To check whether it's a hardware problem, I would:

  • disable shard rebalancing
  • stop the cluster
  • swap the entire data directories of Black Mamba and Slapstick
  • start the cluster and rerun the benchmark

You'll then see whether the problem comes from the 3 shards or from the
server itself.
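The first step can be scripted against the cluster settings API. A sketch, assuming a release that supports the dynamic `cluster.routing.rebalance.enable` setting (`all` is the default, `none` disables rebalancing); it writes the setting as persistent because transient settings do not survive the full cluster restart in this procedure. The helper names and the base URL are hypothetical.

```python
import json
import urllib.request


def rebalance_settings(enabled):
    """Body for PUT /_cluster/settings; persistent so it survives a full restart."""
    return {"persistent": {"cluster.routing.rebalance.enable": "all" if enabled else "none"}}


def put_cluster_settings(base_url, body):
    """Send the settings body to the cluster and return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, call `put_cluster_settings("http://localhost:9200", rebalance_settings(False))` before stopping the cluster, and the same call with `rebalance_settings(True)` once the benchmark is done.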

Cédric Hourcade
ced@wal.fr

On Thu, Jun 19, 2014 at 7:40 PM, sairam@roblox.com wrote:

Bump

Perhaps it's due to JMeter hitting only one node instead of distributing the load across all 4.

Since JMeter 2.12, a new test element is available for dealing with ELB, CDN, DNS load balancing, etc. Try adding a DNS Cache Manager to your test plan and make sure requests are distributed evenly across all endpoint nodes.

See the guide "The DNS Cache Manager: The Right Way To Test Load Balanced Apps" for a more detailed explanation and configuration instructions.
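The DNS Cache Manager addresses distribution at the DNS level. A simpler way to reason about the goal is explicit client-side round-robin over the node addresses; the sketch below illustrates that idea and is not how JMeter implements it. The node URLs and function names are hypothetical. With 50 threads and 50 loops (2,500 requests), an even spread means 625 requests per node.

```python
import itertools
from collections import Counter

# Hypothetical HTTP endpoints; replace with the addresses of your own 4 nodes.
NODES = ["http://node1:9200", "http://node2:9200",
         "http://node3:9200", "http://node4:9200"]


def plan_requests(nodes, threads, loops):
    """Assign threads * loops requests to nodes in strict round-robin order."""
    picker = itertools.cycle(nodes)
    return Counter(next(picker) for _ in range(threads * loops))


def imbalance(counts):
    """Busiest node's request count divided by the mean; 1.0 means perfectly even."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean
```

`imbalance` can also be fed per-node request counts you have collected from a real run: a value close to 4.0 on a 4-node cluster would mean nearly all traffic went to a single node.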