I'm currently running an Elasticsearch cluster with 6 nodes, and (for every node) the disk usage is around 5.5 TB out of a total disk size of 20 TB. I don't anticipate a significant increase in data storage, and I'm wondering whether it would be better to have a proportional disk size of 10 TB instead of 20 TB for performance optimization.
I would also like to know the recommended disk usage per Elasticsearch node, as well as any best practices for optimizing performance based on disk size and other factors.
Any advice or insights would be greatly appreciated. Thank you in advance for your help!
This really depends on your data and retention needs, but here are some thoughts:
Generally, the size of the disk itself won't affect performance. What matters is spinning disk vs. SSD, with SSDs being faster and preferred.
You want to keep your nodes well below the disk-based allocation watermarks, and at 5.5 TB used out of 20 TB you have plenty of headroom there.
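As a quick sanity check, here's a minimal sketch of the headroom math, assuming Elasticsearch's default watermarks (low 85%, high 90%, flood stage 95%) and the 5.5 TB / 20 TB figures from the question:

```python
# Headroom check against Elasticsearch's default disk-based
# shard-allocation watermarks (low 85%, high 90%, flood stage 95%).
# Figures below are the 5.5 TB used / 20 TB total from the question.
disk_total_tb = 20.0
disk_used_tb = 5.5

used_pct = disk_used_tb / disk_total_tb * 100
# Headroom before the low watermark (85%), where Elasticsearch stops
# allocating new shards to the node.
headroom_to_low_tb = disk_total_tb * 0.85 - disk_used_tb

print(f"used: {used_pct:.1f}%")                                   # 27.5%
print(f"headroom to low watermark: {headroom_to_low_tb:.1f} TB")  # 11.5 TB
```

Even if you halved the disks to 10 TB, 5.5 TB used would still be at 55%, comfortably under the default 85% low watermark.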
ILM (index lifecycle management) is going to be your friend for making sure your shards don't get too big, which can degrade performance. For example, you can set up ILM with a rollover index so your indices don't get too large and automatically roll over based on age or size.
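As a sketch, an ILM policy with a rollover action might look like this in Dev Tools (the policy name and thresholds here are placeholders; tune them for your data):

```
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      }
    }
  }
}
```

Whichever condition is hit first (shard size or age) triggers the rollover to a new backing index.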
Finally, take a look at the best practices for shard sizing. There's also lots of good advice on that entire page as well.
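To see how your current shards measure up, the `_cat/shards` API lists them with their store size, sorted largest first:

```
GET _cat/shards?v&s=store:desc
```

That makes it easy to spot any outliers that have grown well past the commonly recommended size range.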
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.