Elastic upgrade results in spikes of IOPS


We recently upgraded from v6 to v7.
We hold 400M documents.

As part of the upgrade we:

  1. reduced document size by half
  2. made schema strict
  3. increased number of shards from 20 to 30
  4. switched from the elastic4s client (v6) to the Elasticsearch REST client (v7)

In v6 our IOPS were no more than 2k-3k and throughput was a few MB/s.
While the traffic (RPM) remains the same, in v7 we see IOPS closer to 16k, and throughput rises significantly, up to 450 MB/s.

I was hoping you could provide some insight into why we are spiking like this. What parameters can we "play" with, and which metrics should we examine?
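For the metrics question, a starting point is the cluster's own stats APIs. A sketch of the relevant calls (assumes a cluster reachable at `localhost:9200`; adjust host and index names to your setup):

```shell
# Per-node disk and indexing stats, including merge/refresh/translog
# activity, which often drives unexpected IOPS after an upgrade.
curl -s "localhost:9200/_nodes/stats/indices,fs?pretty"

# Per-shard on-disk size, to check how big each shard actually is.
curl -s "localhost:9200/_cat/shards?v&h=index,shard,prirep,store"

# What the nodes are busy doing while IOPS spike.
curl -s "localhost:9200/_nodes/hot_threads"
```

Comparing merge and refresh counters between quiet and spiky periods usually narrows down whether the IO is search-driven or write/merge-driven.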

Was the upgrade only on the ES version side?
Are the disk and compute the same in both cases?
Is it SSD?
Also, why did you increase the shard count if the document size was reduced by half? Ideally you could have decreased the shards a bit if the index size is in the range of 20-30 GB per shard.

The more shards you have, the more scatter-gather there will be per request, which might cause slightly higher IO. But your number is quite high, so we need to look for other factors to figure out what is causing the spike in IO.
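The 20-30 GB per shard guideline above can be turned into a rough shard-count band from the total index size. A quick sketch (the 1350 GB figure is an assumption derived from the v7 numbers mentioned later in the thread, 30 shards at roughly 45 GB each):

```shell
# Shard-count band for a target of 20-30 GB per shard.
total_gb=1350   # assumed: v7 index, 30 shards * ~45 GB each

# Fewest shards (each ~30 GB) and most shards (each ~20 GB),
# rounded up with integer arithmetic.
echo "min shards: $(( (total_gb + 29) / 30 ))"
echo "max shards: $(( (total_gb + 19) / 20 ))"
# prints:
# min shards: 45
# max shards: 68
```

This is only a sizing heuristic; the right number also depends on node count, query fan-out, and indexing load.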

Thank you for your quick reply.
The reason we increased the shard number was that each shard in v6 was "pushing" 90 GB. Even with the increased shard count and the reduced document size, we are still "pushing" 45 GB per shard in v7. However, when we tried a much larger number of shards (170) we saw performance degradation and a spike in IOPS as well.

We have a network SSD, not an ephemeral SSD.
However, v6 and v7 run on the same HW config (we increased the disk throughput limit in v7 because we were pushing up against it).

It is important to note that our queries (which have not changed since v6) hit all shards per request. This is because we query across tenants (by document IDs).
But again it is the same as in v6.
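If any of those queries could ever be scoped to a single tenant, custom routing is one lever to reduce shard fan-out. A hypothetical sketch (the index name `myindex`, the field `tenant`, and the routing values are assumptions; this only helps when a request can target one routing key, not the cross-tenant case described above):

```shell
# Index a document with a routing key, so all of tenant-a's
# documents land on the same shard.
curl -s -X PUT "localhost:9200/myindex/_doc/doc-1?routing=tenant-a" \
  -H 'Content-Type: application/json' \
  -d '{"tenant": "tenant-a"}'

# A search with the same routing key touches only that shard
# instead of fanning out to all 30.
curl -s "localhost:9200/myindex/_search?routing=tenant-a" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"term": {"tenant": "tenant-a"}}}'
```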

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.