Currently I have 5 dedicated k8s worker nodes for the ES cluster; each node has 8 CPUs and 32 GB of memory.
5 ES pods are deployed through the ECK operator on k8s, and all ES pods have all roles (no separate master, data, etc. nodes). ECK is excellent at managing ES!
Each ES pod has 7 CPUs and a 16 GB heap, and has a 300 GB provisioned-IOPS EBS volume (4,500 IOPS per volume) attached.
Zipkin sends application traces to ES, but many traces are dropped because the ES write thread pool is rejecting the writes. CPU and heap look fine on the ES side, but the disks are saturating, so ES can't keep up.
I could increase the IOPS from 4,500 to something higher, e.g. 9,000, but that is expensive, so I'm looking for better alternatives; 9,000 IOPS will also hit its limit at some point.
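A minimal sketch of how the rejections and the disk pressure can be confirmed from the ES APIs; the endpoint, user and TLS handling below are placeholders, so adjust them for your ECK service:

```python
# Sketch only: confirm write-thread-pool rejections and disk pressure.
# The endpoint and credentials are placeholders (adjust for your ECK service).
import os
import requests

ES = "https://localhost:9200"
AUTH = ("elastic", os.environ["ES_PASSWORD"])

# Rejections on the write thread pool: a growing "rejected" count per node
# is what shows up as dropped traces on the zipkin side.
r = requests.get(
    f"{ES}/_cat/thread_pool/write",
    params={"v": "true", "h": "node_name,active,queue,rejected"},
    auth=AUTH,
    verify=False,  # ECK uses a self-signed CA; point verify at the CA cert in practice
)
print(r.text)

# Per-node filesystem stats, to correlate the rejections with the EBS volumes saturating.
r = requests.get(f"{ES}/_nodes/stats/fs", auth=AUTH, verify=False)
r.raise_for_status()
for node in r.json()["nodes"].values():
    print(node["name"], node["fs"]["total"])
```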
I'm considering instance (ephemeral) storage instead, but if worker nodes go down, the data on those local disks is gone. On the other hand, if the same thing happens to an ES cluster with EBS volumes attached, there will be no (or less) data loss, and the cluster can be recovered when the worker nodes come back online.
Can someone guide me and share their experience with using instance ephemeral storage?
If you use shard allocation awareness you can ensure that shard copies are spread across multiple zones. This protects you against all the nodes in a single zone failing simultaneously, and failures in different zones are (supposed to be) uncorrelated. FWIW I know of some pretty large installations that use instance volumes throughout, relying on allocation awareness to protect against data loss when an instance vanishes.
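In case it helps, a minimal sketch of what turning this on looks like, assuming each node is already started with a `node.attr.zone` value (with ECK you would set that per nodeSet, typically one nodeSet per availability zone); the endpoint and credentials are placeholders:

```python
# Sketch only: enable zone-based shard allocation awareness. Assumes every
# node carries a node.attr.zone value (set per nodeSet in the ECK manifest).
import os
import requests

ES = "https://localhost:9200"
AUTH = ("elastic", os.environ["ES_PASSWORD"])

r = requests.put(
    f"{ES}/_cluster/settings",
    json={
        "persistent": {
            # Keep copies of the same shard in different zones.
            "cluster.routing.allocation.awareness.attributes": "zone"
        }
    },
    auth=AUTH,
    verify=False,
)
r.raise_for_status()
print(r.json())
```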
If you're certain you want to stick with EBS volumes then maybe you would find a hot/warm architecture useful? With your current setup every node needs a volume which is both large and fast, and therefore expensive. If you have multiple tiers then you can save costs by specialising, using small fast disks in the hot tier and large slow disks in the warm tier. ILM can help you implement this.
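For illustration, a minimal ILM policy sketch for that kind of split; the policy name, sizes and ages are made up, and it assumes the warm nodes are tagged with `node.attr.data: warm`:

```python
# Illustrative ILM policy for a hot/warm split; the name, sizes and ages are
# placeholders, and it assumes warm nodes carry node.attr.data: warm.
import os
import requests

ES = "https://localhost:9200"
AUTH = ("elastic", os.environ["ES_PASSWORD"])

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Rollover only applies if writes go through an alias or a data
                    # stream; with daily zipkin indices you can rely on index age alone.
                    "rollover": {"max_size": "30gb", "max_age": "1d"}
                }
            },
            "warm": {
                "min_age": "2d",
                "actions": {
                    # Relocate older indices onto the large, slower warm nodes.
                    "allocate": {"require": {"data": "warm"}},
                    "forcemerge": {"max_num_segments": 1},
                },
            },
            "delete": {"min_age": "14d", "actions": {"delete": {}}},
        }
    }
}

r = requests.put(f"{ES}/_ilm/policy/zipkin-traces", json=policy, auth=AUTH, verify=False)
r.raise_for_status()
```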
You might even be able to use instance volumes for the hot tier and EBS volumes for the warm tier. At least that would limit the impact of an improbably large disaster in which you lost instances simultaneously across multiple zones, since you might lose some recent data in a few hot indices but the bulk of your data would be safely stored on warm nodes.
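And a matching sketch for starting new trace indices on the hot tier; the `zipkin*` pattern, the `data` node attribute and the policy name are assumptions that would need to match your actual index naming and node config:

```python
# Sketch of an index template that places new trace indices on the hot nodes
# and attaches the ILM policy above; the zipkin* pattern, node attribute and
# policy name are assumptions, not taken from this thread.
import os
import requests

ES = "https://localhost:9200"
AUTH = ("elastic", os.environ["ES_PASSWORD"])

template = {
    "index_patterns": ["zipkin*"],
    "template": {
        "settings": {
            # New indices land on the fast (possibly instance-store) hot nodes;
            # ILM later moves them to the EBS-backed warm nodes.
            "index.routing.allocation.require.data": "hot",
            "index.lifecycle.name": "zipkin-traces",
        }
    },
}

r = requests.put(f"{ES}/_index_template/zipkin-traces", json=template, auth=AUTH, verify=False)
r.raise_for_status()
```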
One more question: shard allocation awareness will help with an AZ failure, but if 2 of the 5 nodes go down in different AZs due to underlying AWS hardware failures, then there will be data loss, correct?
Also, I haven't separated the nodes by role; I'm not sure whether separating them would help in terms of performance.
Yes, that's correct: with a single replica per shard (the default), a primary and its only replica can end up on the two failed nodes, so those shards would lose all their copies. It's down to your judgement whether you prefer paying the extra for the high-IOPS EBS volumes or taking the risk of concurrent independent hardware failures.