Elasticsearch 7.1 has ridiculously high disk read I/O (100%)


We are seeing unexpected read I/O on the Elasticsearch data nodes.

This is what iotop captured:

[screenshot: iotop output, 2020-03-30 2:31 PM]

As you can see, disk I/O sits at 100% all the time; below is a detailed per-thread capture.
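For reference, one way to put names on the busy thread IDs that iotop reports is to match them against the JVM's thread dump (a sketch; the TID below is a made-up placeholder, not one from our capture):

```shell
# iotop reports native thread IDs (TIDs) in decimal, while jstack labels each
# Java thread with nid=0x<hex>. Converting the TID to hex lets you look up
# which Elasticsearch thread (search, merge, refresh, ...) is doing the reads.
TID=12345                      # placeholder: substitute a busy TID from iotop
NID=$(printf '0x%x' "$TID")
echo "$NID"                    # hex nid to search for in the thread dump
# On the data node, with ES_PID set to the Elasticsearch process id:
#   jstack "$ES_PID" | grep "nid=$NID"
```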

Here is the output of iostat:

What we did:

  1. Originally we had 3 master nodes, 5 data nodes, and 3 coordinator nodes running on Kubernetes v1.16.6-beta.0 and Docker 18.9.7. All storage disks are premium SSDs.
  2. Recently we added 5 data nodes to this cluster.
  3. We have a heavily-read index with 5 primary shards, each with 2 replicas.
  4. It worked fine previously; however, as the index grew we observed performance slowly degrading, which is why we are trying to scale out the data nodes.
  5. As you can see, the newly added nodes have ridiculously high disk read I/O, and the top 20 threads by I/O all belong to Elasticsearch.
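As a back-of-envelope check on the numbers above (assuming shard copies spread evenly, which the allocator does not guarantee):

```shell
# 5 primaries, each with 2 replicas, across 10 data nodes (5 original + 5 new).
primaries=5
replicas=2
data_nodes=10
total_copies=$(( primaries * (1 + replicas) ))  # each primary plus its replicas
echo "$total_copies"                            # 15 shard copies in total
# With 15 copies on 10 nodes, some nodes hold 1 copy and some hold 2, so the
# heavily-read index is not spread evenly over the newly added nodes.
```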

Can anyone figure out what the potential reason might be? Thank you so much in advance.

We are observing a very weird monitoring pattern.

It looks like the page cache is not working, so Elasticsearch is reading from disk all the time. Is it possible this is because I disabled THP and NUMA?
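To sanity-check the THP setting and whether the page cache is actually being used, you can look at the usual Linux locations (paths may differ on other kernels):

```shell
# THP state: "[never]" in brackets means it is disabled.
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "THP sysfs knob not present on this kernel"

# Page cache size: if "Cached" stays small on a node serving a read-heavy
# index, queries will keep hitting the disk instead of memory.
grep -E '^(MemTotal|MemFree|Cached):' /proc/meminfo
```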

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.