One out of 4 nodes always spikes to 100% CPU when we run load tests
using JMeter (50 threads, 50 loops) with any query (match_all, filtered
query, etc.). That node holds 3 shards, 2 of them primaries. The
other nodes stay below 40% CPU at the same time. The heap is
set to 30GB on all of them. This is the gist of the hot threads output https://gist.github.com/RobloxSai/9f040bbd5ab7b58f2b1d captured while the test was
running. Is there anything else that can be done to improve
performance? Query response times jump to 5-8 seconds when the CPU is
hammered.
I had previously posted the specs of the servers on another thread: https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/elasticsearch/P1o_4bVvECA
Here are the server specs.
Machine Specs:
Processor: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Number of CPU cores: 24
Number of Physical CPUs: 2
Installed RAM: [~256 GB Total] 128 GB 128 GB 16 MB
Drive: Two 278GB SAS Drives configured in RAID 0
OS:
Arch: 64bit(x86_64)
OS Type: Linux
Kernel: 2.6.32-431.5.1.el6.x86_64
OS Version: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Java Version: Java 1.7.0_51 (Java 7u51 x64 version for Linux).
(Original post by sai...@roblox.com, Wednesday, June 18, 2014 6:20:58 PM UTC-7.)
It wouldn't surprise me if both Black Mamba and Slapstick were hitting
100%, since they have more shards and have to handle more requests than the
other nodes. But in your case it's only one node.
First, are your HTTP requests evenly spread over the 4 nodes? You could also
check that all your shards are about the same size.
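As a rough way to compare shard counts and sizes per node, here is a minimal Python sketch that summarizes text in the shape of `GET /_cat/shards?bytes=b` output. It assumes an Elasticsearch version that has the cat API with its default column order (index, shard, prirep, state, docs, store, ip, node). The index name, IPs, sizes and the node name "Node C" in the sample are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample shaped like `GET /_cat/shards?bytes=b` output.
# Assumed columns: index shard prirep state docs store ip node
SAMPLE = """\
idx 0 p STARTED 1000 5000000 10.0.0.1 Black Mamba
idx 0 r STARTED 1000 5000000 10.0.0.2 Slapstick
idx 1 p STARTED 1200 9000000 10.0.0.1 Black Mamba
idx 1 r STARTED 1200 9000000 10.0.0.3 Node C
idx 2 p STARTED 900 4000000 10.0.0.2 Slapstick
idx 2 r STARTED 900 4000000 10.0.0.1 Black Mamba
"""


def shards_per_node(cat_output):
    """Return {node: {"shards": n, "primaries": n, "bytes": n}}."""
    totals = defaultdict(lambda: {"shards": 0, "primaries": 0, "bytes": 0})
    for line in cat_output.splitlines():
        parts = line.split()
        # Skip blank lines and shards that are not fully started.
        if len(parts) < 8 or parts[3] != "STARTED":
            continue
        node = " ".join(parts[7:])  # node names may contain spaces
        totals[node]["shards"] += 1
        totals[node]["bytes"] += int(parts[5])
        if parts[2] == "p":
            totals[node]["primaries"] += 1
    return dict(totals)


if __name__ == "__main__":
    for node, stats in sorted(shards_per_node(SAMPLE).items()):
        print(node, stats)
```

A node that holds noticeably more shards or bytes than the others will do more work per search, which would point at data layout rather than hardware.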
To check whether it's a hardware problem, I would:
1. disable shard rebalancing
2. stop the cluster
3. swap the entire data directories between Black Mamba and Slapstick
4. start the cluster and rerun the benchmark
You'll then see whether the problem comes from the 3 shards or from the server
itself.
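For the first step, a minimal Python sketch of the settings request might look like the following. It assumes Elasticsearch 1.0 or later, where the `cluster.routing.allocation.enable` setting exists (the thread does not state the version), and it assumes a node reachable at `http://localhost:9200`. Note that this setting turns off shard allocation as a whole, which is broader than rebalancing alone. The sketch uses a persistent setting because transient settings do not survive a full cluster restart. The helper name `allocation_request` is hypothetical.

```python
import json
import urllib.request


def allocation_request(base_url, mode):
    """Build (but do not send) a PUT /_cluster/settings request.

    mode is a value for cluster.routing.allocation.enable,
    for example "none" before the swap and "all" afterwards.
    """
    body = {"persistent": {"cluster.routing.allocation.enable": mode}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Assumed endpoint; replace with one of your own nodes.
req = allocation_request("http://localhost:9200", "none")

if __name__ == "__main__":
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually apply it: urllib.request.urlopen(req)
```

After the benchmark, the same helper with `"all"` would turn allocation back on.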
Perhaps it's due to JMeter hitting only one node instead of distributing the load across all 4.
Since JMeter 2.12, a new test element has been available to deal with ELB, CDN, DNS load balancing, etc. Try adding a DNS Cache Manager to your test plan and make sure that requests are distributed equally across all endpoint nodes.
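One way to check where the load generator actually connects is to compare each node's HTTP connection counter before and after a test run. The sketch below is a minimal Python example that assumes the nodes stats response carries a `name` and an `http.total_opened` counter per node; the two snapshots are hypothetical, and with keep-alive connections the count is only a proxy for the number of requests.

```python
def opened_connections_delta(before, after):
    """Return {node name: HTTP connections opened between two snapshots}.

    Both arguments are dicts shaped like a nodes stats response:
    {"nodes": {node_id: {"name": ..., "http": {"total_opened": ...}}}}
    """
    deltas = {}
    for node_id, node in after["nodes"].items():
        previous = before["nodes"].get(node_id, {})
        start = previous.get("http", {}).get("total_opened", 0)
        deltas[node["name"]] = node["http"]["total_opened"] - start
    return deltas


# Hypothetical snapshots taken before and after a JMeter run.
BEFORE = {"nodes": {
    "a": {"name": "Black Mamba", "http": {"total_opened": 10}},
    "b": {"name": "Slapstick", "http": {"total_opened": 12}},
}}
AFTER = {"nodes": {
    "a": {"name": "Black Mamba", "http": {"total_opened": 60}},
    "b": {"name": "Slapstick", "http": {"total_opened": 12}},
}}

if __name__ == "__main__":
    for name, delta in sorted(opened_connections_delta(BEFORE, AFTER).items()):
        print(name, delta)
```

If nearly all new connections land on a single node, as in this made-up sample, the test plan is probably targeting one endpoint.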