High IO Wait in Elasticsearch 5.3 Cluster

Hello,

I have two Elasticsearch clusters, and one of them has 20 Elasticsearch instances on Amazon AWS (EC2).
We use EBS volumes.

The problem is that IO wait on the EC2 hosts stays between 8 and 40, and sometimes I cannot index at all.

Any ideas?

We don't recommend using EBS.

"EBS" alone isn't specific enough, as there are multiple tiers of EBS. If you absolutely must use EBS you will at least want one of the SSD-based tiers, and ideally "provisioned IOPS". You will still have to be realistic about peak performance, but it might be OK.

You will undoubtedly get the best performance from instance store volumes, which use storage direct-attached to the server where your instance is running. There is, however, a trade-off. When you stop an instance completely there is no guarantee that it will be restarted on the same host, so the data on the instance store will be lost (this isn't an issue with a reboot, only when the instance is completely stopped).

At this point you may be shocked! I can almost hear you screaming... "I can't lose my data!!!" But this is not really as dangerous as it sounds. While the shards on the instance store would be gone, as long as a replica is still available Elasticsearch will automatically rebuild the shards when the instance is restarted. Sure, there is an IO penalty while this occurs, but honestly... how often do you actually stop an instance completely? (Remember, a reboot is OK.)

So for the best storage performance in AWS, consider instance store volumes. Ensure that all of your indexes are configured with at least one replica. Additional replicas can provide even more resilience. Finally, use snapshots to regularly back up your data to S3... which you should be doing anyway, even when using EBS.
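
To make the replica and snapshot advice a bit more concrete, here is a rough Python sketch (standard library only) of the REST calls involved. Treat it as an illustration, not a drop-in script: the URL http://localhost:9200, the repository name "my_s3_repo", the bucket "my-es-backups" and the snapshot name "snapshot_1" are all made-up placeholders, and it assumes the repository-s3 plugin is installed on every node and the cluster has no authentication in front of it.

```python
import json
import urllib.request

# Assumption: a node reachable without authentication at this address.
ES_URL = "http://localhost:9200"


def replica_settings_request(index="_all", replicas=1):
    """Build the update-settings call that sets the replica count."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", "/{}/_settings".format(index), body


def s3_repository_request(repo, bucket):
    """Build the call that registers an S3 snapshot repository.

    Assumes the repository-s3 plugin is installed on every node.
    """
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return "PUT", "/_snapshot/{}".format(repo), body


def snapshot_request(repo, snapshot):
    """Build the call that takes a snapshot and waits until it finishes."""
    path = "/_snapshot/{}/{}?wait_for_completion=true".format(repo, snapshot)
    return "PUT", path, None


def send(method, path, body=None, base_url=ES_URL):
    """Send one request to the cluster and return the decoded JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Placeholder names below; replace them with your own before running.
    print(send(*replica_settings_request("_all", 1)))
    print(send(*s3_repository_request("my_s3_repo", "my-es-backups")))
    print(send(*snapshot_request("my_s3_repo", "snapshot_1")))
```

The request builders are kept separate from send() so you can inspect exactly what would be sent before pointing it at a real cluster. In practice you would run the snapshot call on a schedule (cron or similar) with a unique snapshot name each time.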

Rob

