My current Elastic Stack (7.17.4) deployment is
2 nodes across 2 zones, each with 240 GB disk, 8 GB RAM, and 2 CPUs.
I have about 4000 shards in total in the cluster.
This count pushed the ES hot/data nodes into an unhealthy state with high JVM pressure, even though disk space was barely used (roughly 10% per node).
Each index has exactly 1 primary shard and 1 replica shard.
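To see how those ~4000 shards break down by node and index, a small script against the cluster can help. This is just a sketch using the Python elasticsearch client; the endpoint, credentials, and 7.x-style `http_auth` argument are assumptions, not part of my setup:

```python
from elasticsearch import Elasticsearch

# Assumed endpoint/credentials; replace with your own deployment details.
es = Elasticsearch("https://my-deployment.es.example.com:9243",
                   http_auth=("elastic", "changeme"))

# Shards per node: with 1 primary + 1 replica per index, ~4000 shards
# means roughly 2000 shards on each of the two data nodes.
shards = es.cat.shards(format="json", h="index,prirep,store,node")
per_node = {}
for s in shards:
    per_node[s["node"]] = per_node.get(s["node"], 0) + 1
print("shards per node:", per_node)

# Smallest indices first, to spot tiny/unused indices that still cost heap.
indices = es.cat.indices(format="json", h="index,pri,store.size,docs.count",
                         s="store.size:asc")
for idx in indices[:20]:
    print(idx["index"], idx["store.size"], idx["docs.count"])
```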
I could restore the cluster to a healthy state only after deleting the unused indices. Could someone explain this behavior and the limitations around creating too many indices per cluster or per node?
On monitoring: is there a better way to identify these issues in advance? Are there any metrics from Elastic Cloud to keep an eye on?
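One signal that seems worth watching is per-node JVM heap usage, which is essentially what Elastic Cloud surfaces as "JVM memory pressure". A minimal polling sketch, assuming the same client and endpoint as above:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:9243",
                   http_auth=("elastic", "changeme"))  # assumed credentials

# Node-level JVM stats; heap_used_percent staying persistently high
# (commonly ~75-85%+) is the usual warning sign before a node degrades.
stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    heap = node["jvm"]["mem"]["heap_used_percent"]
    print(f'{node["name"]}: heap_used_percent={heap}%')
```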
After reading the documentation: since in this scenario only the number of indices (not their size or the number of primary shards per index) was causing the problem, I infer that my cluster state was too large to handle in the JVM heap. (Correct me if I'm wrong.)
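As a rough sanity check on the "cluster state too large" theory, you can pull the metadata portion of the cluster state and look at its serialized size and how many indices it tracks. This is only an approximation of the on-heap footprint, and the endpoint/credentials below are assumptions:

```python
import json
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:9243",
                   http_auth=("elastic", "changeme"))  # assumed credentials

# The metadata section holds mappings and settings for every index,
# so it grows with index count even when the indices themselves are tiny.
state = es.cluster.state(metric="metadata")
print("indices in cluster state:", len(state["metadata"]["indices"]))
print("metadata size (serialized JSON, bytes):", len(json.dumps(state)))
```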
I have autoscaling enabled, so I wonder why it failed to scale up. Any thoughts?
It looks like you have far too many small shards for the version you are using, which is inefficient. I am not familiar with how autoscaling works, so I can't help there.
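If many of these are small time-based or per-tenant indices, one common remedy is to reindex groups of them into a single larger index and then delete the originals, which reduces the shard count and shrinks the cluster state without losing data. A hedged sketch; the `logs-2022-*` pattern and `logs-2022` target are hypothetical names for illustration:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("https://my-deployment.es.example.com:9243",
                   http_auth=("elastic", "changeme"))  # assumed credentials

# Reindex a group of small indices into one consolidated index.
# 'logs-2022-*' and 'logs-2022' are placeholder names.
es.reindex(body={
    "source": {"index": "logs-2022-*"},
    "dest": {"index": "logs-2022"},
}, wait_for_completion=True, request_timeout=3600)

# Once the reindex succeeds and the data is verified, drop the originals
# to actually release the shards:
# es.indices.delete(index="logs-2022-*")
```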