I am writing to seek guidance on how to ensure optimal performance of my Elasticsearch cluster.
I am running a three-node ES cluster with substantial resources (16 vCPUs, 128 GB RAM, 10 TB disk) on each node.
I am collecting logs from 300+ servers. Indices are created through ILM (automatic rollover) with the naming pattern index-name-{now/d}-000001, so a new index is created each day, and each index rolls over after roughly 30 days or when it reaches 50 GB.
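For reference, the rollover part of my ILM policy looks roughly like this (policy name and exact values here are illustrative, not my verbatim settings):

```json
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "30d",
            "max_size": "50gb"
          }
        }
      }
    }
  }
}
```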
However, the number of indices is growing far too large (more than 3,000 shards currently), and I believe this is affecting the cluster: some queries constantly time out, and even saving settings, such as creating new ingest pipelines, always returns a 504 Gateway Timeout.
The Elasticsearch logs show timeouts when connecting to other cluster nodes.
Any ideas on how to optimize my cluster?
I hope I have communicated the issue clearly. Pardon me if I haven't.
Hi @Christian_Dahlqvist
Please see the info below:
What is the full output of the cluster stats API?
Sorry, I am not able to get that output currently, but here is the output from the command I executed earlier today.
It may also be worthwhile looking into how you handle sharding and index data. If you are actively writing to a significant number of indices and shards this may result in a lot of small writes and resulting IOPS, which may not be ideal for slower disks.
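As a starting point, you could inspect how data is spread across indices and shards, for example with the cat APIs (query parameters shown here are just one way to sort the output):

```json
GET _cat/indices?v&s=store.size:desc
GET _cat/shards?v&s=store:desc
```

If many indices turn out to be small, consolidating them (fewer daily indices, or rollover purely by size) would reduce the shard count and the write amplification.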