Performance problem because of read IOPS increase

Hi, I have an Elasticsearch cluster with 3 nodes (version 6.8.0). It's deployed on 3 virtual machines with 16 GB RAM and 4 cores each, in Google Cloud.
Usually we have far more write IOPS than read IOPS (40-50 vs 0-5) without any problem.
On two occasions read IOPS have spiked, causing I/O wait problems and hurting performance to the point that we had to shut down the affected server. Both times this happened during off-peak hours, when the query volume was low.
Is there any way to know what causes these read IOPS peaks?
Thanks in advance.
Javier

A few things here:

  1. 6.8.0 is really really old and you should look at upgrading.
  2. What type of backend storage are you using on GCP? 40-50 IOPS isn't a lot.
  3. The write/read IOPS ratio doesn't mean much on its own; writes will always be high if you're indexing, and if you have enough RAM the filesystem cache can absorb most of your reads (or you may simply not be searching all that much).
  4. If the IOPS spikes are happening at non-load times, it is possible that Elasticsearch is doing background merging or some other background task (see the sketch below for one way to check).
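
If you want to catch it in the act, the hot threads API is the quickest way to see what a node is busy with while the spike is happening. A minimal sketch, assuming the cluster is reachable at http://localhost:9200 with no authentication (a hypothetical setup; adjust the URL and auth for your cluster):

```python
# Sketch: dump hot threads for every node so you can see what Elasticsearch
# is busy with while the read IOPS spike is happening.
# Assumes the cluster is reachable at http://localhost:9200 with no auth
# (hypothetical setup -- adjust URL / auth for your cluster).
import requests

ES_URL = "http://localhost:9200"

# The hot threads API returns plain text, one section per node; background
# merging typically shows up as Lucene merge threads near the top.
resp = requests.get(f"{ES_URL}/_nodes/hot_threads", params={"threads": 5})
resp.raise_for_status()
print(resp.text)
```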

Unfortunately we have dependencies that, at the moment, do not allow us to upgrade the version.
As for the disks, we were using pd-standard, but we have upgraded to pd-balanced disks, precisely to avoid this problem.
40-50 IOPS isn't much, and that's our usual operation, but when this problem occurs we see peaks of around 390 IOPS (358 read + 34 write).
What kind of background task could Elasticsearch be doing to suddenly jump to 358 read operations per second?
Is there any way I can check what operation Elasticsearch was performing?

(Disclaimer: I'm not familiar with GCP storage, so I'm just going off the docs in this area.)

we were using pd-standard

That appears to be HDD-backed disks, which would definitely have low IOPS and are probably not suitable for content/hot nodes in Elasticsearch.

390 IOPS (358 read + 34 write)

Doesn't seem like it should be a problem for pd-balanced.

This IOPS pattern seems like it could be background merging.

Do you have Stack Monitoring set up? Can you check the number of segments on your cluster around that time? Do they start going down?
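
If Stack Monitoring isn't set up, you can approximate it by polling the node stats yourself during a spike and watching segment counts and merge activity. A rough sketch, again assuming http://localhost:9200 with no auth (hypothetical; adjust for your cluster):

```python
# Sketch: poll per-node segment counts and merge stats so you can see whether
# segment counts drop (a sign of background merging) while read IOPS spike.
# Assumes the cluster is reachable at http://localhost:9200 with no auth
# (hypothetical setup -- adjust URL / auth for your cluster).
import time
import requests

ES_URL = "http://localhost:9200"

while True:
    stats = requests.get(f"{ES_URL}/_nodes/stats/indices/segments,merges").json()
    for node_id, node in stats["nodes"].items():
        segments = node["indices"]["segments"]["count"]
        active_merges = node["indices"]["merges"]["current"]
        merged_bytes = node["indices"]["merges"]["total_size_in_bytes"]
        print(f"{node['name']}: segments={segments} "
              f"active_merges={active_merges} total_merged_bytes={merged_bytes}")
    print("---")
    time.sleep(30)
```

If the segment count falls while the read IOPS are high, merging is the likely culprit.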

There might be some other causes here, but 6.8 is at the point where I don't recall much about it anymore.

Thanks again for your answer. With the new disks this shouldn't happen again, but we wanted to track down the source of the problem. I'll take a look at your recommendations.
