My current ES stack (7.17.4) deployment size is
2 nodes across 2 zones, each with 240 GB disk, 8 GB RAM, and 2 CPUs.
I have about 4000 shards present in total in the cluster.
This shard count pushed the hot/data nodes into an unhealthy state with high JVM memory pressure, even though disk space was barely used (roughly 10% per node).
Each index has exactly 1 primary shard and 1 replica shard.
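For context, Elastic's 7.x sizing guidance suggests keeping the shard count at or below roughly 20 shards per GB of JVM heap. A minimal sketch of that arithmetic for this cluster, assuming the heap is set to about half of node RAM (the common default):

```python
# Rough shard-budget check using the Elastic 7.x rule of thumb:
# aim for <= ~20 shards per GB of configured JVM heap.
# Assumption: heap is ~50% of node RAM; adjust if your deployment differs.

def shard_budget(nodes: int, ram_gb_per_node: float, shards_per_gb: int = 20) -> int:
    heap_gb_per_node = ram_gb_per_node / 2  # assumed default heap sizing
    return int(nodes * heap_gb_per_node * shards_per_gb)

budget = shard_budget(nodes=2, ram_gb_per_node=8)
print(budget)         # 160 shards for 2 nodes x ~4 GB heap
print(4000 > budget)  # True: 4000 shards is far over the guideline
```

By this estimate the cluster's comfortable ceiling is around 160 shards, so 4000 shards is more than 20x over, which lines up with the JVM pressure I saw.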
I could restore the cluster to a healthy state only after deleting the unused indices. Could someone explain this behavior and the limitations on creating too many indices per cluster or per node?
On the monitoring side, is there a better way to identify these issues earlier? Are there any metrics from Elastic Cloud to keep an eye on?
After reading the documentation, since in this scenario only the number of indices (not their size or the number of primary shards per index) was causing the problem, I infer that my cluster state was too large to fit comfortably in the JVM heap. (Correct me if I'm wrong.)
I have autoscaling enabled, so I also wonder why it failed to scale up here. Any thoughts?