Unusual disk read spike during ES migration

We have 4 ES machines running on AWS in a VPC behind an LB, and they are currently working without any issues.

We wanted to migrate these 4 ES machines from the older VPC to a newer VPC behind a new LB, so we took the following approach:

  1. Booted up 4 new machines from an AMI inside the new VPC.
  2. Ran our Ansible scripts to set up these 4 machines with ES and the same configuration as the old 4 machines.
  3. We have a large number of indexes, i.e. date-wise indexes and an all-data index. Each index has 12 primaries and 12 replicas.
  4. Before adding the new machines, we had 320 shards distributed across the 4 old machines, i.e. 80 shards per machine. We then added the 4 new machines to the existing cluster, for 8 machines in total, which brings the distribution to 40 shards per machine.
  5. We waited for shard re-allocation to complete.
  6. Once the cluster was green, we ran a sanity script to verify the health of the machines. All good until this point.
  7. Verified the production status. All green.
  8. We switched the Route53 entry from the old LB to point to the new LB, and that's when 2 of the new machines started seeing disk read spikes (reads went up to 15 GB/s).
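
For anyone wanting to double-check the shard arithmetic in step 4 and the per-node distribution after rebalancing, here is a minimal Python sketch. It assumes you have saved the text output of `GET _cat/shards?h=index,shard,prirep,state,node` and that node names contain no spaces. The helper names (`shards_per_node`, `count_shards_by_node`) and the sample index and node names are hypothetical, made up for illustration, and are not taken from our cluster.

```python
from collections import Counter


def shards_per_node(total_shards: int, node_count: int) -> float:
    """Average shards per node, assuming a perfectly even allocation."""
    return total_shards / node_count


def count_shards_by_node(cat_shards_text: str) -> Counter:
    """Count STARTED shards per node.

    Expects lines with the columns: index shard prirep state node
    (the column order requested via the `h=` parameter above).
    """
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # e.g. UNASSIGNED shards have no node column
        _index, _shard, _prirep, state, node = fields[:5]
        if state == "STARTED":
            counts[node] += 1
    return counts


# Hypothetical sample output, for illustration only.
SAMPLE = """\
logs-2019.07.01 0 p STARTED old-node-1
logs-2019.07.01 0 r STARTED new-node-1
logs-2019.07.01 1 p STARTED new-node-2
logs-2019.07.01 1 r UNASSIGNED
"""

if __name__ == "__main__":
    print(shards_per_node(320, 4))  # 80.0 (before adding the new machines)
    print(shards_per_node(320, 8))  # 40.0 (after adding the new machines)
    for node, count in sorted(count_shards_by_node(SAMPLE).items()):
        print(node, count)
```

An even shard count per node does not guarantee an even read load, since two nodes can hold the same number of shards while serving very different indexes, so this only confirms the allocation, not where the queries land.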

What is in the Elasticsearch logs on those nodes? Which version of Elasticsearch are you using?

ES v6.8.1

There isn't anything in the logs on those nodes that points to the problem, and there was no spike in writes to the logs either.
