Haha, true that, but how do I confirm that it's actually an IOPS issue?
Well, if the initial claim:
is not true, then IOPS is automatically a bottleneck.
The rest of my comments were based more on my experience, which is that, in a corporate environment specifically, or from a vendor, you often get told things that are marketing, or specs, or ... but not actually true in reality.
There seems to me to be evidence that you are in fact IO bound.
You wrote in that PM exchange (slightly edited):
"Increasing cores from 20 to 32 didn’t help at all, we still capped at 10k upserts per second. Although, reducing the writes load by 90%, we were able to reduce the duration of the cpu spike. Previously the cpu used to be spiked up for hours, but now it down to a couple of minutes, but for us, even those 5-10 minutes become critical."
Maybe I am wrong, but this smells IO bound to me.
The other big brains on the forum might have a different view.
The query is an upsert, if that is what you intend to ask. We currently support a maximum of 10k writes per second; however, on reaching 10k we hit 95% CPU on the hot shards/nodes, which causes a drop in further writes along with higher read latency. What we aim to achieve is a consistent 10k writes with no CPU spikes, similar to what we experience on the non-hot nodes.
For hot nodes, 10k writes is the peak.
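For reference, here is a minimal sketch of what a routed bulk upsert like that can look like; the index name, document id, routing value and field below are made-up placeholders, not the actual ones from this cluster.

```bash
# Hypothetical example: a single _bulk "update" action with doc_as_upsert,
# so Elasticsearch updates the document if the id exists and inserts it otherwise.
# The routing value in the action line decides which shard receives the write.
curl -s "localhost:9200/my-index/_bulk" \
  -H 'Content-Type: application/x-ndjson' \
  --data-binary $'{"update":{"_id":"doc-1","routing":"customer-42"}}\n{"doc":{"counter":7},"doc_as_upsert":true}\n'
```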
That sounds about right, but to be frank I have very little knowledge about IOPS at the moment; I'll need to study up on that. In the meantime, can you please help me with some debugging steps I can run on this cluster to confirm that it's an IOPS issue?
Could you please answer my questions around the query load? I just want to check that the routing, which is primarily designed to help at query time, is actually necessary.
I didn't completely understand your question. Are you asking if we have a specific use case for this custom routing key?
I am asking for details about how you query and use the data in the cluster, not how you are updating it. This goes back to diving a bit deeper into the issue that was raised early on (and which, as far as I can see, might have been dropped).
Routing is primarily used to make querying more efficient, as only a limited number of shards need to be queried to serve each request. As this was set up a long time ago, I was wondering whether there is still a strong argument supporting this design decision.
We have a composite aggregation read query use case and a scroll/slice use case.
If we don't use custom routing, the requests will go to all the shards, which can lead to too many open scroll contexts per node; hence we require routing.
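For context, this is roughly what the difference looks like on the query side; the index name, routing value and field are made-up placeholders.

```bash
# Hypothetical example: with ?routing=... only the shard(s) that the routing
# value hashes to are searched; drop the parameter and every shard of the
# index gets queried (and, for scroll/slice, holds a search context).
curl -s "localhost:9200/my-index/_search?routing=customer-42" \
  -H 'Content-Type: application/json' \
  -d '{"query":{"term":{"customer_id":"customer-42"}}}'
```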
@RainTown / @Christian_Dahlqvist could you please help me with this?
If there were a /usr/bin/do-i-have-an-iops-issue command, I'd have suggested you run it ages ago. And even if it existed, it might return "No", or even "Are you asking the right question?"!
I am waiting to see the iostat data from when you have "95% CPU usage", covering a semi-extended period, like a couple of minutes.
Not just @Christian_Dahlqvist but others too are welcome to weigh in with new/different/better ideas.
Sure, I’ll add that iostat data as soon as I get the window.
I personally think breaking out the hot IDs into a separate index, with a good number of primary shards and no custom routing, on a separate subset of nodes in the cluster would be the best way to resolve the issue. As you have however ruled that out, I have a hard time seeing anything but storage performance being the bottleneck. Maybe others have some suggestions though.
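For what it's worth, a rough sketch of that kind of split is below; the index name, shard counts and node attribute are assumptions (the attribute would have to be set as node.attr.hot_ids: true on the chosen subset of nodes), not settings from this cluster.

```bash
# Hypothetical sketch: a dedicated index for the hot IDs with more primaries,
# default hash-on-_id routing (no custom routing key), and allocation pinned
# to nodes started with node.attr.hot_ids=true.
curl -s -X PUT "localhost:9200/hot-ids-v1" \
  -H 'Content-Type: application/json' \
  -d '{
    "settings": {
      "number_of_shards": 15,
      "number_of_replicas": 1,
      "index.routing.allocation.require.hot_ids": "true"
    }
  }'
```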
The output of iostat -x during a period of heavy load, when you are experiencing the issues, is exactly what we want to see. Please note though that every storage system and node has some limit, and if your storage is indeed very fast and performant there may not be an easy solution that avoids rearchitecting the sharding.
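Something along these lines would capture that window, assuming sysstat is installed; the interval, sample count and output path are just suggestions.

```bash
# Hypothetical sketch: extended device stats (-x) with timestamps (-t),
# sampled every 5 seconds for 120 samples (~10 minutes), written to a file
# so the portion covering the CPU spike can be shared here afterwards.
iostat -xt 5 120 > /tmp/iostat_$(hostname)_$(date +%Y%m%d%H%M).log
```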
Cool. While we wait, you could also enquire about the specifics of the I/O device that shows up as vdb in your Linux VMs. There was this exchange:
One possibility is that it's a virtual disk carved from a large/huge VMware storage pool, backed by a big-brand storage vendor, which is usually good. But maybe all 60 of your data node VMs have a virtual disk from exactly the same storage pool (just speculation).
Anyways, your 15 hot VMs are all writing to their "looks like a local disk" disk at pretty much the same time, and the other 45 too, but less so due to the skew. Depending on what that storage actually is, this might be stressing the storage, or stressing something else. It will certainly be "doing the best it can", but it's possible you need to know the specific details here. So, if there is a "Storage Management" team, or a "VMware team", ask them for specifics and tell them your cluster is very IOPS dependent. If they say "yeah, we noticed!" that would tell us something in itself.
EDIT: Now I recall that /dev/vda and /dev/vdb probably mean it's using a paravirtualized device, which then maps to a native (local) device on the host. So this may all be moot. The output of lshw -C disk would be nice to know, if lshw is installed.
Sure, I've re-run the command; hopefully we catch the spike this time.
*-virtio2
     description: Virtual I/O device
     physical id: 0
     bus info: virtio@2
     logical name: /dev/vda
     size: 10GiB (10GB)
     capabilities: gpt-1.00 partitioned partitioned:gpt
     configuration: driver=virtio_blk guid=3cf094a2-5168-9444-b870-5ddad3b6327a logicalsectorsize=512 sectorsize=4096
*-virtio3
     description: Virtual I/O device
     physical id: 0
     bus info: virtio@3
     logical name: /dev/vdb
     size: 446GiB (478GB)
     capabilities: partitioned partitioned:dos
     configuration: driver=virtio_blk logicalsectorsize=4096 sectorsize=4096
@RainTown / @Christian_Dahlqvist this is the drive we use: https://download.semiconductor.samsung.com/resources/brochure/PM1733%20NVMe%20SSD.pdf
I might be wrong, but don't you think that if it were a "noisy neighbour" problem, all 60 nodes would have experienced a CPU spike, and not only the 15?
Since the write IOPS figure is approximately 135,000 for this drive, and our total indexing rate is 50k at peak, don't you think we should be good with respect to IOPS?
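One hedged caveat: documents indexed per second and device write IOPS are not the same number, since translog fsyncs, segment merges and replica writes all add I/O on top of the raw document count. So the w/s and %util columns that iostat reports for vdb during the spike are the figures worth holding up against that spec sheet. A minimal way to watch just that device, again assuming sysstat is installed:

```bash
# Hypothetical sketch: device-only (-d) extended stats (-x) for vdb, every 5 seconds;
# compare the observed w/s (write IOPS) and %util against the drive's rated figure.
iostat -dx vdb 5
```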
Good morning
Should be good - yes.
Are good - TBD.
Yes, but this depends on the skew. If 15 nodes need 10k IOPS and the other 45 nodes need 100 IOPS, then the other 45 won't spike. But that was based on a false conjecture, so it's moot now.
So it's not direct pass-through to the actual device? I'm no VMware expert, but is this the best way to do it (someone else might know)?