We ran into a dilemma while setting up a new Elasticsearch 8 cluster with the Elastic Cloud on Kubernetes (ECK) operator on AWS, using Elastic Block Store (EBS) for data storage. EBS offers several volume types, such as gp2, gp3, io1 and io2.
Is there a recommendation from Elastic or the community for the most suitable storage type for an Elasticsearch cluster? Tests performed with different disk types, as well as different IOPS / throughput values, yielded inconclusive results.
The cluster is divided into serving (read) and worker (write) nodes and is expected to handle medium to heavy read loads. The cluster will also require several replica shards for each index.
Disk performance is a combination of EBS type and EC2 instance type.
When we had a self-managed cluster running on AWS we used the i3en.* family of instances.
These come with their own disks attached, which worked without any real issues for us.
Our setup was a bit different in that we didn't use ECE but the "bare" Elastic Stack.
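To illustrate the point about disk performance being a combination of EBS type and EC2 instance type, here is a minimal sketch. It assumes the usable figure is simply the lower of what the volume is provisioned for and what the instance can push to EBS. The `Limits` type, the `effective_limits` helper and all numbers are hypothetical, so look up the real limits for your volume and instance types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    iops: int
    throughput_mibps: int


def effective_limits(volume: Limits, instance: Limits) -> Limits:
    """Return what a node can actually use: the lower of the two sides."""
    return Limits(
        iops=min(volume.iops, instance.iops),
        throughput_mibps=min(volume.throughput_mibps, instance.throughput_mibps),
    )


# Hypothetical numbers for illustration only, not taken from AWS documentation.
provisioned_volume = Limits(iops=16000, throughput_mibps=1000)
instance_ebs_cap = Limits(iops=6000, throughput_mibps=250)

print(effective_limits(provisioned_volume, instance_ebs_cap))
```

If the instance side is the lower one, paying for more IOPS or throughput on the volume would not show up in a benchmark, which could be one reason tests look the same across disk types.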
The issue is that the cluster is not hosted directly on EC2 instances but on Kubernetes, using EKS.
Therefore, we're not using the native EC2 storage but claiming volumes on EBS as a separate mechanism in the architecture (otherwise the data would not persist). As a result, we need to decide ourselves how the storage claims are configured.
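For anyone comparing volume types this way, a rough sketch of generating one StorageClass per candidate configuration might look like the following. It assumes the AWS EBS CSI driver is installed (provisioner `ebs.csi.aws.com`, with its `type`, `iops` and `throughput` parameters). The class name and the IOPS / throughput values are placeholders, not recommendations. The output is JSON, which `kubectl apply -f` accepts as well as YAML.

```python
import json


def ebs_storage_class(name, volume_type, iops=None, throughput=None):
    """Build a StorageClass manifest for the AWS EBS CSI driver (assumed installed)."""
    params = {"type": volume_type}
    # StorageClass parameter values must be strings.
    if iops is not None:
        params["iops"] = str(iops)
    if throughput is not None:
        params["throughput"] = str(throughput)
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "ebs.csi.aws.com",
        "parameters": params,
        # Delay provisioning until the pod is scheduled, so the volume
        # is created in the same availability zone as the node.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


# Hypothetical candidate to benchmark; name and values are placeholders.
manifest = ebs_storage_class("es-gp3-candidate", "gp3", iops=6000, throughput=250)
print(json.dumps(manifest, indent=2))
```

The class would then be referenced through `storageClassName` in the volume claim template of the Elasticsearch node set, so each candidate can be benchmarked by switching that one field.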
The only definite recommendation we can offer is to run benchmarks with your specific dataset and workload. The answer you seek depends very strongly on exactly how you're using the system, and there's no one answer that will suit all users. We recommend running your benchmarks with Rally.
If you are getting inconclusive results from your tests then it's entirely possible that the choice of storage doesn't really matter for your use case.
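One rough way to judge whether results are really inconclusive is to repeat each benchmark several times and compare the gap between storage types with the run-to-run spread. The sketch below is a crude heuristic under that assumption, not a proper statistical test. The `looks_inconclusive` helper, the threshold and the latency figures are all made up for illustration.

```python
from statistics import mean, stdev


def looks_inconclusive(runs_a, runs_b, threshold=2.0):
    """True when the gap between the means is small compared with run-to-run noise.

    runs_a / runs_b: one metric (e.g. a latency in ms) from repeated runs
    of the same benchmark on two storage configurations.
    """
    gap = abs(mean(runs_a) - mean(runs_b))
    noise = max(stdev(runs_a), stdev(runs_b))
    return gap < threshold * noise


# Hypothetical latencies in ms from four repeated runs per volume type.
gp3_runs = [41.0, 44.5, 39.8, 43.2]
io2_runs = [40.1, 42.9, 41.7, 38.6]

print(looks_inconclusive(gp3_runs, io2_runs))
```

If the gap stays inside the noise across repeated runs, that supports the reading that storage type is not the limiting factor for this workload.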