We occasionally see ES cluster instability caused by unoptimized queries. Our developers are working on optimizing them. When poorly performing queries are sent to ES, GC time spikes and CPU utilization climbs to 90%. As a result, the affected data nodes fail to respond to the master's ping requests, so a few data nodes leave the cluster and rejoin it later.
After a data node leaves the cluster, replica shard reallocation takes 45 minutes to 1 hour before the cluster becomes 'GREEN' again.
Currently, 'discovery.zen.ping.timeout' is set to its default (30 seconds), and 'index.unassigned.node_left.delayed_timeout' is also set to its default (1 minute).
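For reference, here is a minimal sketch of how I understand 'index.unassigned.node_left.delayed_timeout' could be raised at runtime through the index settings API. The host, the '10m' value, and the helper names are assumptions for illustration only; we have not applied this to our cluster.

```python
import json
import urllib.request

# Assumption: placeholder address, not our real cluster endpoint.
ES_URL = "http://localhost:9200"


def build_delayed_timeout_request(timeout, index="_all", base_url=ES_URL):
    """Build (without sending) a PUT request that updates the
    delayed allocation timeout for the given index pattern."""
    body = {
        "settings": {
            "index.unassigned.node_left.delayed_timeout": timeout,
        }
    }
    return urllib.request.Request(
        url="{}/{}/_settings".format(base_url, index),
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
    )


def apply_request(request):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical value: wait 10 minutes before reallocating replicas
    # of a node that has left, instead of the default 1 minute.
    req = build_delayed_timeout_request("10m")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # apply_request(req)  # uncomment to actually send it to the cluster
```

The idea would be that a node which drops out briefly during a GC spike can rejoin before its replicas start being rebuilt elsewhere, but I am not sure whether this is the right setting to change.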
My question is: until we have optimized all the queries, which of the above settings should I tweak so that the cluster stays stable, or at least turns 'GREEN' sooner?
ES version: 1.7