Garbage Collection not happening, leads to circuit breakers tripping (ES 7.5)

Jonathan_Mendenhall · April 16, 2020, 5:34pm

What we are seeing is our ES 7.5 data nodes tripping "PERMANENT" circuit breakers after running a few fairly simple queries that the exact same index on ES 6.4 handles without issue.

I've included the information we've collected below, but as far as I can tell, GC is simply not triggering, even though we have already applied the recommended G1GC settings (as described in pull requests and referenced in many topics on this forum).

The worst part about these circuit breakers tripping is that 9 times out of 10 we would probably be better off if the node in question simply crashed. In the past, when a circuit breaker tripped during shard recovery, the cluster silently left us with a missing replica until someone noticed and manually triggered a retry. Currently, when these queries trip a circuit breaker, the affected nodes are taken out of service for several hours (I'm not sure exactly how, but after this happened yesterday, the nodes eventually came back in). This not only caused Kibana to crash on startup (with logs mentioning the circuit breakers), but also meant that, while our queries could still run against the nodes that were still up, the results were partial because nodes were missing.

ES 7.5
OpenJDK 11
OS: Windows
Configured max memory: 30 GB

Java settings: https://gist.githubusercontent.com/Ultraseamus/d97f274db3039c55b8e8c2614ff462df/raw/c65fa872bf859f520c11d75d85d9e214bd5662dc/java%20settings

Node Stats: https://gist.githubusercontent.com/Ultraseamus/d97f274db3039c55b8e8c2614ff462df/raw/c65fa872bf859f520c11d75d85d9e214bd5662dc/Node%20Stats

gc.log: https://gist.githubusercontent.com/Ultraseamus/d97f274db3039c55b8e8c2614ff462df/raw/c65fa872bf859f520c11d75d85d9e214bd5662dc/gc.log%20(GMT)

Node logs: https://gist.githubusercontent.com/Ultraseamus/d97f274db3039c55b8e8c2614ff462df/raw/c65fa872bf859f520c11d75d85d9e214bd5662dc/node%20logs
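
For anyone who wants to cross-check the same numbers against their own cluster, here is a rough Python sketch that pulls the heap usage, parent breaker estimate, and GC collection counts out of one node's entry in a node stats response. The field names are the ones I'd expect in the `jvm` and `breakers` sections of that output; the helper name `summarize_node` and the sample values are made up for illustration and are not taken from the gists above.

```python
def summarize_node(node_stats):
    """Summarize heap, parent breaker, and GC counters for one node entry.

    `node_stats` is assumed to be a single value from the "nodes" map of a
    parsed node stats response (hypothetical helper, not an ES client API).
    """
    jvm = node_stats["jvm"]
    parent = node_stats["breakers"]["parent"]
    collectors = jvm["gc"]["collectors"]
    estimated = parent["estimated_size_in_bytes"]
    limit = parent["limit_size_in_bytes"]
    return {
        "heap_used_percent": jvm["mem"]["heap_used_percent"],
        "parent_estimated_bytes": estimated,
        "parent_limit_bytes": limit,
        # How close the parent breaker estimate is to its limit (1.0 = at limit).
        "parent_ratio": estimated / limit,
        "parent_tripped": parent["tripped"],
        # Collection counts per collector, e.g. "young" and "old".
        "gc_counts": {name: c["collection_count"] for name, c in collectors.items()},
    }


# Illustrative values only (invented for this sketch, not real measurements).
sample_node = {
    "jvm": {
        "mem": {"heap_used_percent": 96},
        "gc": {
            "collectors": {
                "young": {"collection_count": 120},
                "old": {"collection_count": 0},
            }
        },
    },
    "breakers": {
        "parent": {
            "estimated_size_in_bytes": 30_000_000_000,
            "limit_size_in_bytes": 30_601_641_984,
            "tripped": 7,
        }
    },
}

summary = summarize_node(sample_node)
print(summary)
```

If the parent ratio sits near 1.0 while the old collection count barely moves between two samples taken a few minutes apart, that would at least be consistent with what I'm describing here: heap filling up without a collection reclaiming it.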

system · May 14, 2020, 5:34pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
