Mismatched CPU usage on data nodes

We've got three Elasticsearch data nodes and one master node. One of the data nodes generally shows only about 25-33% of the CPU usage of the other two, and I can't figure out why. I ran a query yesterday and this one node was at 37% CPU usage while the other two were at almost 100%. I just refreshed my main dashboard and the one node was at 12% CPU usage while the other two were at 95%.

-- All nodes have the exact same amount (64GB with 32GB locked for Elasticsearch) and type of RAM.
-- When I refreshed my main dashboard, I checked the network switch while the queries were running. All ports are 1Gbps and all were below 1% usage.
-- The CPU on the node with lower usage is different from the one in the other two data nodes (a Xeon E5520 vs. a Xeon E5504), but that difference seems negligible to me and doesn't appear to explain the problem.
-- The configs are as close to identical as possible.
-- All of the nodes have only Java, Elasticsearch and basic apps (e.g. Aptitude, htop, Webmin, ntpdate) installed.
-- The data node that seems faster is using Java version 1.8.0_191-b12 and the other two and the master are using version 1.8.0_151.
-- Shards are spread evenly across all nodes.

My searches sometimes seem slower than expected and I'm wondering if the issue is that the other two hosts ARE actually slower but I can't find a reason why. How can I troubleshoot this further? Any help you can give is greatly appreciated.

Do you have Monitoring enabled?

I don't have the X-Pack installed. I've been wanting to install it so, if you think that will help in tracking this down, I can certainly do that. Also, I forgot to add that we're running 5.2. I found out about Graylog before I found out about the ELK stack, so that's our log collector. The current version of Graylog doesn't support a newer version of Elasticsearch.

Since you're an 'Elastic Team Member', I'll tell you - this is the least helpful message board I've ever seen. I'm a big fan of Elasticsearch but NOT a big fan of their near-useless message board. I see no way to delete my account. Can you at least point me in that direction? Your help would be greatly appreciated.

So you mention that the load on the E5520 is ~37%, while on the E5504 nodes it is ~100%.

  • E5520 benchmark (doesn't matter which one, as long as both CPUs are tested in the same way) = 4435.
  • E5504 benchmark = 2702.

So it doesn't matter what the config is: there will be a ~60% difference. Furthermore, the E5520 has 8 (virtual) cores versus 4 on the E5504. If the number of shards in your setup is greater than 4 (true for the default, and very likely for non-default settings), the E5520 wins even more, since each shard is searched on a single core (= more efficient use of the CPU).
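
To put a number on that gap, here is a minimal sketch of the arithmetic in Python. It uses only the two benchmark scores quoted above; the function name `relative_speedup` is made up for illustration, and a single synthetic score is only a rough proxy for how Elasticsearch searches will actually behave.

```python
def relative_speedup(fast_score: float, slow_score: float) -> float:
    """Return how much higher fast_score is than slow_score, as a fraction (0.5 = 50% higher)."""
    if slow_score <= 0:
        raise ValueError("benchmark scores must be positive")
    return fast_score / slow_score - 1.0


# Benchmark scores as quoted in this thread.
E5520_SCORE = 4435
E5504_SCORE = 2702

speedup = relative_speedup(E5520_SCORE, E5504_SCORE)
print(f"E5520 scores about {speedup:.0%} higher than the E5504")
# prints: E5520 scores about 64% higher than the E5504
```

That is where the "~60%" figure comes from; the extra hardware threads on the E5520 are not part of this simple ratio.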

Sounds like it works as designed.

For further analysis: use the same Java version on all systems. What about the RAM, is it running at the same speed on all systems? Otherwise you're comparing apples and pears :)
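
If you want to double-check which Java version each node is really running, a rough sketch along these lines might help. It assumes the nodes info API (`GET _nodes/jvm`) is reachable over HTTP without authentication and that each node entry carries a `name` and a `jvm.version` field; the helper names and the `localhost` address are placeholders, not anything from your setup.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_nodes_info(base_url: str) -> dict:
    """Fetch JVM info for every node from the nodes info API (assumed endpoint: /_nodes/jvm)."""
    with urlopen(f"{base_url}/_nodes/jvm") as response:
        return json.load(response)


def group_nodes_by(nodes_info: dict, section: str, field: str) -> dict:
    """Map each distinct value of nodes[*][section][field] to the sorted node names reporting it."""
    groups = defaultdict(list)
    for node in nodes_info.get("nodes", {}).values():
        value = node.get(section, {}).get(field, "unknown")
        groups[value].append(node.get("name", "unnamed"))
    return {value: sorted(names) for value, names in groups.items()}


# Example usage (hypothetical address; point it at any node in the cluster):
# info = fetch_nodes_info("http://localhost:9200")
# print(group_nodes_by(info, "jvm", "version"))
```

More than one key in the result means the nodes are not all on the same Java version. RAM speed is not something this will show; that has to be checked on the hosts themselves.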

Thank you so much for your reply. I used 'cpuboss.com' to compare the CPUs and there didn't seem to be much of a difference. I see you used 'passmark.com', which is what I normally use. I used the other site because I could compare them side-by-side. But the benchmarks certainly seem to explain the issue. Thank you again.
