Restart on a daily basis

Hello everyone,

Recently we ran into circuit_breaking_exception errors on multiple clusters, right after upgrading from 6.8 to 7.4. We found the problem could be worked around by tuning JVM options. However, because of other issues in our history with ES, one of our ES specialists recommended doing a rolling restart of all ES clusters every day, to proactively head off any possible memory problems in the future.

Does anyone do such restarts in production? What negative impact could we expect from implementing daily restarts?

A daily restart should not be necessary at all. This sounds like it would cause more problems than it solves, since it will force nodes to constantly transfer shard data between themselves as nodes leave and rejoin the cluster. That in turn can cause memory pressure of its own, eat valuable disk IO, and put extra load on the master node through more cluster state updates.
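If you do end up restarting nodes, at least disable shard allocation while each node is down so the cluster doesn't immediately start relocating shards. Here is a minimal sketch of that step, assuming a node reachable at http://localhost:9200 and the Python `requests` library (adjust host and auth for your environment):

```python
import requests

ES = "http://localhost:9200"

# 1. Stop allocation of replica shards so the cluster does not start
#    rebalancing the moment the node drops out.
requests.put(f"{ES}/_cluster/settings", json={
    "transient": {"cluster.routing.allocation.enable": "primaries"}
}).raise_for_status()

# ... restart the node here ...

# 2. Reset the setting (null restores the default "all") and wait for green.
requests.put(f"{ES}/_cluster/settings", json={
    "transient": {"cluster.routing.allocation.enable": None}
}).raise_for_status()
requests.get(f"{ES}/_cluster/health?wait_for_status=green&timeout=60s")
```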

All a restart will do is clear out some temporary heap garbage; otherwise the node will soon grow back to its "steady state" heap usage. E.g. if you were at 75% heap usage before a restart, you'll quickly get back to 75% after the restart, because that's what the node "needs" in your environment. Similarly, JVM tuning is rarely effective because it treats the symptom, not the cause.
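If you want to see that steady state for yourself, per-node heap usage is easy to pull from the nodes stats API before and after a restart. A small sketch, assuming http://localhost:9200 and the Python `requests` library:

```python
import requests

# Print current heap usage for every node in the cluster.
stats = requests.get("http://localhost:9200/_nodes/stats/jvm").json()
for node in stats["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"], "%")
```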

Circuit breakers trip when you are attempting to do something that is "too large" for the cluster. So either your requests need to be optimized, or your cluster is at its capacity given the size of the data and the types of requests you are asking of it. In that case you just need to expand the cluster. There are several different types of circuit breakers, so it's hard for me to offer a solution, but rolling restarts on a daily basis are not a normal scenario :slight_smile:
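To figure out which breaker is actually tripping, the nodes stats API reports each breaker's limit, current estimate, and trip count. A rough sketch, again assuming http://localhost:9200 and the Python `requests` library:

```python
import requests

# List every circuit breaker on every node, with its configured limit,
# current estimated usage, and how many times it has tripped.
stats = requests.get("http://localhost:9200/_nodes/stats/breaker").json()
for node in stats["nodes"].values():
    for name, breaker in node["breakers"].items():
        print(node["name"], name,
              "limit:", breaker["limit_size"],
              "estimated:", breaker["estimated_size"],
              "tripped:", breaker["tripped"])
```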


I agree, daily restarts look abnormal, even if they can help with some unknown memory-leak situations. We hit an issue with the total breaker limit configured at 90%, where a simple request to _cat/recovery?active_only=true started to return a circuit_breaking_exception. The bad thing here is that a cluster restart let it run for a few days without problems. Maybe it was a coincidence, but now it is being used as one more argument for doing daily restarts everywhere.
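For context, a quick way to check how the breaker limit is actually configured and to reproduce the failing call, sketched assuming http://localhost:9200 and the Python `requests` library (the 90% value and host are specific to our setup):

```python
import requests

ES = "http://localhost:9200"

# Show the circuit breaker settings, including defaults, so the effective
# indices.breaker.total.limit value is visible.
settings = requests.get(
    f"{ES}/_cluster/settings?include_defaults=true&filter_path=**.breaker.**"
).json()
print(settings)

# The call that was returning circuit_breaking_exception.
resp = requests.get(f"{ES}/_cat/recovery?active_only=true")
print(resp.status_code, resp.text[:500])
```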

Do you have Monitoring enabled?

Yep, we monitor cluster state, heap usage, single replicas, and frozen queues, and we recently added monitoring of circuit_breaking_exception. During periods when this exception is active, we see a kind of plateau in heap usage around 80%, sometimes 90%, that normalizes after a restart. But sometimes it returns to the same plateau very quickly while recovering replicas, as polyfractal described.
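For completeness, a tiny polling sketch of the kind of per-node heap sampling that shows such a plateau, assuming http://localhost:9200, the Python `requests` library, and a 60-second interval:

```python
import time
import requests

# Sample heap.percent for every node once a minute; a sustained flat line
# near the breaker limit is the plateau described above.
while True:
    nodes = requests.get(
        "http://localhost:9200/_cat/nodes?format=json&h=name,heap.percent"
    ).json()
    for n in nodes:
        print(time.strftime("%H:%M:%S"), n["name"], n["heap.percent"])
    time.sleep(60)
```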

Is that external to Elasticsearch, or do you use our included monitoring?

We use Zabbix and its agents + X-Pack.
