JVM Pressure on Cluster Nodes with too many indices

My current ES (7.17.4) deployment consists of
2 nodes across 2 zones, each with 240 GB disk, 8 GB RAM, and 2 CPUs.
There are about 4000 shards in total in the cluster.

This shard count put the hot/data nodes into an unhealthy state due to JVM pressure, even though disk space was barely used (roughly 10% per node).

Each index has exactly 1 primary shard and 1 replica shard.

I could only restore the cluster to a healthy state after deleting unused indices. Could someone explain this behavior and the limitations around creating too many indices per cluster / per node?

On monitoring: is there a better way to identify these issues beforehand? Are there any Elastic Cloud metrics to keep an eye on?

Which version of Elasticsearch are you using? I would recommend reading the following blog post on the topic.

Hello @Christian_Dahlqvist, thank you for sharing the doc.

After reading the document: since in this scenario only the number of indices (not their size or the number of primary shards) was causing the problem, I infer that my cluster state grew too large for the JVM heap to handle. (Correct me if I'm wrong.)
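Back-of-the-envelope, assuming a ~4 GB heap (the usual ~50% of the node's 8 GB RAM) and the commonly cited rule of thumb of at most ~20 shards per GB of heap for this version range:

```python
# Rough capacity check; heap size and the 20-shards-per-GB guideline
# are assumptions, not values read from the actual cluster.
total_shards = 4000                 # primaries + replicas across the cluster
nodes = 2
heap_gb = 4                         # assumed heap per 8 GB RAM node
shards_per_gb_guideline = 20        # common pre-8.x rule of thumb

shards_per_node = total_shards // nodes              # 2000
recommended_max = heap_gb * shards_per_gb_guideline  # 80

print(shards_per_node, recommended_max)  # 2000 vs. a suggested ceiling of ~80
```

So each node was carrying roughly 25x the suggested shard count, which would explain the heap pressure even with mostly empty disks.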

I have autoscaling enabled, so I wonder why the cluster failed to scale. Any thoughts?

It looks like you have far too many small shards for the version you are using, which is inefficient. I am not familiar with how autoscaling works, so I cannot help there.
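If it helps for next time, a few `_cat` requests (available in 7.x) show heap pressure and shard distribution at a glance; the host is a placeholder for your own endpoint:

```shell
# Heap usage per node -- sustained values above ~75% suggest JVM pressure
curl -s "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,node.role"

# Shard count and disk usage per node
curl -s "localhost:9200/_cat/allocation?v&h=node,shards,disk.percent"

# Total shard count and overall cluster status
curl -s "localhost:9200/_cat/health?v"
```

Watching the shard count from `_cat/allocation` alongside `heap.percent` over time would have flagged this well before the nodes turned unhealthy.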