High CPU Usage and Load on Data Nodes

sunilsaini314 · November 7, 2019, 10:07am

We have a 5-node cluster:

3 Master Nodes - 2 Cores, 4 GB RAM - 2 GB Heap
2 Data Nodes - 2 Cores, 16 GB RAM - 8 GB Heap

Total Number of primary Shards -

The data nodes constantly show high CPU (~90%-95%) and a high load average (3-5), but not much memory or heap usage.

Do we need to increase the CPU cores or add a new data node? Please suggest.

geekpete · November 15, 2019, 11:31am

Is there anything else running on your data nodes?
What does your disk io on the data nodes look like?
iostat -x 2 10
or
sar -d 2 10

or even just top and see what % wa (wait) you might have.

Are you doing more indexing or more searching?

sunilsaini314 · November 15, 2019, 11:53am

Hey Peter, Thanks for replying.

iostat -x 2 10

Linux 4.9.51-10.52.amzn1.x86_64 (es-data204.ops.use1d.i.riva.co) 11/15/2019 x86_64 (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
57.77 0.00 5.07 7.17 0.06 29.94

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.76 0.05 0.88 1.91 25.55 29.58 0.03 32.67 3.05 0.28
xvdf 0.00 589.66 0.75 320.30 174.42 11578.04 36.61 1.14 3.53 0.68 21.96

avg-cpu: %user %nice %system %iowait %steal %idle
39.80 0.00 3.53 9.07 0.00 47.61

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 1.00 0.50 1.00 8.00 16.00 16.00 0.00 0.00 0.00 0.00
xvdf 0.00 537.50 0.00 451.50 0.00 56632.00 125.43 5.01 11.10 0.45 20.40

avg-cpu: %user %nice %system %iowait %steal %idle
75.38 0.00 2.51 2.51 0.00 19.60

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 2.50 0.00 11.00 0.00 128.00 11.64 0.00 0.18 0.18 0.20
xvdf 0.00 426.50 0.00 258.50 0.00 6076.00 23.50 0.22 0.86 0.67 17.40

avg-cpu: %user %nice %system %iowait %steal %idle
34.94 0.00 1.27 9.62 0.00 54.18

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 546.00 0.00 324.50 0.00 12484.00 38.47 1.45 4.46 0.62 20.20

avg-cpu: %user %nice %system %iowait %steal %idle
46.73 0.00 2.51 7.04 0.00 43.72

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 534.50 0.00 257.00 0.00 6916.00 26.91 0.21 0.82 0.63 16.20

avg-cpu: %user %nice %system %iowait %steal %idle
38.58 0.00 4.06 7.87 0.25 49.24

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 1.00 0.00 1.00 0.00 16.00 16.00 0.00 0.00 0.00 0.00
xvdf 0.00 559.00 0.00 252.50 0.00 7108.00 28.15 0.25 0.91 0.70 17.60

avg-cpu: %user %nice %system %iowait %steal %idle
35.70 0.00 3.80 9.62 0.00 50.89

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 569.00 0.00 335.00 0.00 14292.00 42.66 0.54 1.61 0.57 19.00

avg-cpu: %user %nice %system %iowait %steal %idle
35.62 0.00 1.78 8.91 0.00 53.69

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 578.50 0.00 282.00 0.00 7484.00 26.54 0.23 0.82 0.60 16.80

avg-cpu: %user %nice %system %iowait %steal %idle
76.19 0.00 12.78 1.25 0.25 9.52

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.50 0.00 1.00 0.00 12.00 12.00 0.00 0.00 0.00 0.00
xvdf 0.00 415.50 0.00 221.00 0.00 9204.00 41.65 0.26 1.18 0.67 14.80

avg-cpu: %user %nice %system %iowait %steal %idle
52.02 0.00 3.03 5.30 0.00 39.65

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdf 0.00 458.00 0.00 234.50 0.00 6080.00 25.93 0.23 0.99 0.72 16.80

We are mostly indexing, with very little searching.

geekpete · November 16, 2019, 11:29am

The disk io metrics are not too extreme here.
I'd probably bump cpu cores up by 2 per node here and see how that goes.
Load average of >2 when you only have 2 cores can mean processes are waiting on cpu time but can also mean waiting on io.
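To make the load-versus-cores point concrete, here is a minimal sketch that normalises a load average by core count. The function names are made up for illustration, and the 1.0 threshold is only a rule of thumb: on Linux the load average also counts tasks blocked in uninterruptible (usually I/O) wait, so a ratio above 1 does not prove a CPU shortage by itself.

```python
def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


def describe(load_average: float, cores: int) -> str:
    """Rough label: more runnable/waiting tasks than cores, or not."""
    # Rule-of-thumb threshold (assumption), not a hard limit.
    return "oversubscribed" if load_per_core(load_average, cores) > 1.0 else "headroom"


if __name__ == "__main__":
    # Load averages reported in this thread (3-5) on the current 2 cores,
    # and on a hypothetical 4 cores after adding 2 per node.
    for load in (3.0, 5.0):
        for cores in (2, 4):
            print(load, cores, load_per_core(load, cores), describe(load, cores))
```

With the reported range, 2 cores are oversubscribed throughout, while 4 cores would cover the low end of the range but not the peaks.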

Elasticsearch is a very threaded application so more cores can often help.
The io stats don't seem too bad: there's not too much utilisation (%util) or disk queuing generally, but there are spikes to high cpu. If the io stats were any worse, an additional node would provide more disk resource to cater for the indexing workload, and it might still be a good idea anyway.
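If you want to summarise %util rather than eyeball it, a small parser is enough. This is only a sketch: `device_util` is a hypothetical helper, and it assumes the column layout of the iostat output posted above, where %util is the last field on each device line. The sample text is copied from the first three reports in that output.

```python
SAMPLE = """\
xvda 0.00 0.76 0.05 0.88 1.91 25.55 29.58 0.03 32.67 3.05 0.28
xvdf 0.00 589.66 0.75 320.30 174.42 11578.04 36.61 1.14 3.53 0.68 21.96
xvda 0.00 1.00 0.50 1.00 8.00 16.00 16.00 0.00 0.00 0.00 0.00
xvdf 0.00 537.50 0.00 451.50 0.00 56632.00 125.43 5.01 11.10 0.45 20.40
xvda 0.00 2.50 0.00 11.00 0.00 128.00 11.64 0.00 0.18 0.18 0.20
xvdf 0.00 426.50 0.00 258.50 0.00 6076.00 23.50 0.22 0.86 0.67 17.40
"""


def device_util(iostat_text: str, device: str) -> list[float]:
    """Collect the %util value (last column) from every line for `device`."""
    values = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # Device lines start with the device name; header lines do not match.
        if fields and fields[0] == device:
            values.append(float(fields[-1]))
    return values


if __name__ == "__main__":
    utils = device_util(SAMPLE, "xvdf")
    print("samples:", utils)
    print("max %util:", max(utils))
    print("mean %util:", sum(utils) / len(utils))
```

Keep in mind that the first report iostat prints covers the period since boot, so you may want to drop it before averaging.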

The other comment I'd have is that with a 5 node cluster, it's probably better to make all nodes master/data, which gives the data a larger pool of resources than leaving 3 nodes doing only master duties. Setting all nodes to master/data and setting minimum master nodes accordingly for that master-eligible count is what I'd try first. Dedicated master nodes are more useful at higher node counts, where there's more overall node and cluster state to coordinate and manage.
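For the "minimum master nodes accordingly" part, the usual quorum rule is a majority of the master-eligible nodes. A minimal sketch of that arithmetic follows; the function name is made up for illustration. The Elasticsearch version isn't stated in this thread, so treat the setting as an assumption: `discovery.zen.minimum_master_nodes` applies to versions before 7.0, and from 7.0 onwards the cluster manages the quorum itself.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: more than half of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


if __name__ == "__main__":
    # Current layout (3 dedicated masters) versus all 5 nodes master-eligible.
    for count in (3, 5):
        print(count, "master-eligible ->", minimum_master_nodes(count))
```

So moving from 3 to 5 master-eligible nodes means raising the value from 2 to 3.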

Aiming for a shard count that is a low multiple of the number of nodes avoids some nodes holding more shards than others, which spreads the write load more evenly.
For 2 data nodes, probably either 1p1r (1 primary, 1 replica) or 2p1r, so 2 shards or 4 shards total, spread across 2 nodes. How many indices/index patterns are you indexing to for your "current" indices?
For 5 master/data nodes, 5p1r might be good, or if you have a few sets of index patterns being indexed to, then even 1p1r or 2p1r can be good if all those indices are getting similar amounts of indexing traffic. You basically want to spread that write workload as evenly as possible across the disk resource you have available, for maximum indexing performance.
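The shard arithmetic above can be sketched as follows. `shard_layout` is a hypothetical helper that only counts shard copies for a single index and shows the most even split that is possible; it does not model how Elasticsearch actually allocates shards.

```python
def shard_layout(primaries: int, replicas: int, nodes: int) -> tuple[int, list[int]]:
    """Return (total shard copies, most even per-node counts) for one index."""
    if primaries < 1 or replicas < 0 or nodes < 1:
        raise ValueError("invalid shard or node count")
    # Each primary has `replicas` extra copies, and every copy takes writes.
    total = primaries * (1 + replicas)
    base, extra = divmod(total, nodes)
    # The first `extra` nodes carry one more shard than the rest.
    return total, [base + 1 if i < extra else base for i in range(nodes)]


if __name__ == "__main__":
    print("1p1r on 2 nodes:", shard_layout(1, 1, 2))
    print("2p1r on 2 nodes:", shard_layout(2, 1, 2))
    print("5p1r on 5 nodes:", shard_layout(5, 1, 5))
    print("2p1r on 5 nodes:", shard_layout(2, 1, 5))
```

The last case shows why a single 2p1r index is uneven on 5 nodes: one node gets no shard, which only balances out if several similarly busy indices are being written to.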

