My current Elasticsearch (7.17.4) deployment is 2 nodes across 2 availability zones, each with 240 GB disk, 8 GB RAM, and 2 CPUs.
The cluster holds about 4,000 shards in total. That count pushed the hot/data nodes into an unhealthy state with high JVM memory pressure, even though disk usage was low (roughly 10% per node).
Each index has exactly 1 primary shard and 1 replica shard.
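For context, this is roughly how I confirmed the shard count and the per-index layout (a minimal sketch with the official Python client; the connection details below are placeholders, not my actual Elastic Cloud endpoint):

```python
from collections import Counter

from elasticsearch import Elasticsearch

# Placeholder endpoint; on Elastic Cloud you would pass cloud_id /
# credentials instead of a local URL.
es = Elasticsearch("http://localhost:9200")

# _cat/shards lists every shard copy (primaries and replicas) in the cluster.
shards = es.cat.shards(format="json")
print(f"Total shard copies: {len(shards)}")

# Group by index to confirm the 1-primary + 1-replica layout per index.
per_index = Counter(s["index"] for s in shards)
for index, count in per_index.most_common(10):
    print(index, count)
```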
I could restore the cluster to a healthy state only after deleting the unused indices. Could someone explain this behavior and the limits on how many indices/shards a cluster (or a single node) can safely hold?
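For reference, the cleanup that brought the cluster back was essentially this (the index pattern is illustrative, not my real naming scheme):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Deleting unused indices releases the per-shard heap overhead.
# "old-logs-*" is an illustrative pattern only.
es.indices.delete(index="old-logs-*")

# Cluster status and active shard count should recover shortly after.
health = es.cluster.health()
print(health["status"], health["active_shards"])
```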
On the monitoring side, is there a better way to identify these issues before they take the cluster down? Are there any Elastic Cloud metrics I should keep an eye on?
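In case it helps to make the question concrete, this is the kind of check I have in mind (a sketch that polls per-node JVM heap via the nodes stats API; the 75% warning threshold is my own assumption, not an official limit):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

# Per-node JVM heap usage; sustained high values tend to precede the
# memory-pressure state I hit. The 75% cutoff here is my own choice.
stats = es.nodes.stats(metric="jvm")
for node_id, node in stats["nodes"].items():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    flag = "WARN" if heap_pct >= 75 else "ok"
    print(f"{node['name']}: heap {heap_pct}% [{flag}]")
```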