Settings to make cluster stable

mario12 · October 6, 2017, 8:11pm

We occasionally see instability in our ES cluster caused by unoptimized queries. Our developers are working on optimizing them. When a poorly performing query reaches ES, GC time spikes and CPU utilization climbs to 90%. The affected data nodes then fail to respond to the master's ping requests, so a few data nodes leave the cluster and rejoin it later.

It takes 45 minutes to 1 hour for the cluster to return to 'GREEN', because replica shards are reallocated after a data node leaves the cluster.

Currently 'discovery.zen.ping.timeout' is at its default (30 seconds), and 'index.unassigned.node_left.delayed_timeout' is also at its default (1 minute).
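For readers who want to experiment with the second setting: 'index.unassigned.node_left.delayed_timeout' is an index-level setting that can be changed through the update index settings API without a restart. Below is a minimal sketch that only builds such a request and sends nothing. The function name, the host `http://localhost:9200`, and the `10m` value are assumptions for illustration, not recommendations from this thread.

```python
import json


def build_delayed_timeout_request(timeout="10m", index="_all",
                                  host="http://localhost:9200"):
    """Return (method, url, body) for an index settings update.

    Hypothetical helper: host, index, and timeout are illustrative defaults.
    Nothing is sent; pass the result to the HTTP client of your choice.
    """
    url = f"{host}/{index}/_settings"
    body = json.dumps(
        {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    )
    return "PUT", url, body


method, url, body = build_delayed_timeout_request()
print(method, url)
print(body)
```

A longer delay gives a node that briefly dropped out more time to rejoin before its replicas are reallocated elsewhere. The trade-off is that the affected indices have fewer replica copies available while the delay runs.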

My question is: until we optimize all the queries, which of the above settings should I tweak to keep the cluster stable, or at least get it back to 'GREEN' sooner?

ES version: 1.7

warkolm · October 6, 2017, 8:57pm

The first thing you can do is upgrade. There are a number of improvements around circuit breakers that will prevent these bad queries from even running.

The second and third things you can do are also to upgrade.

mario12 · October 6, 2017, 9:00pm

We are planning to upgrade ES to a higher version. We have set the circuit breaker. These are the settings:

indices.fielddata.cache.size: 13.6GB
indices.fielddata.breaker.limit: 15.0GB
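One sanity check on the two values above: the fielddata breaker limit is generally expected to sit above the fielddata cache size, so that cache eviction can act before the breaker trips. The sketch below checks that relationship for size strings in the form shown. The helper names and the unit table are assumptions for illustration and are not part of Elasticsearch.

```python
import re

# Binary multipliers for the size suffixes handled by this sketch.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def parse_size(text):
    """Convert a size string such as '13.6GB' to a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]+)\s*", text)
    if not match:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit in size: {text!r}")
    return int(float(number) * _UNITS[unit])


def cache_fits_under_breaker(cache_size, breaker_limit):
    """True when the cache size is strictly below the breaker limit."""
    return parse_size(cache_size) < parse_size(breaker_limit)


# Values quoted in this post.
print(cache_fits_under_breaker("13.6GB", "15.0GB"))
```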

Are there any other circuit breaker settings besides the ones above?

Also, will it help to increase 'discovery.zen.ping.timeout' from 30 seconds to 1 or 2 minutes? Changing this setting requires an ES service restart, correct?

system · November 3, 2017, 9:01pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
