We've started getting frequent circuit breaker exceptions in our cluster, and I need some advice on how to tackle the situation. I've read up on circuit breakers and gone through lots of posts on the topic, but I'm still not sure what we need to do.
We're running a 9-node Elasticsearch 7.3.1 cluster; each node has 64 GB of OS-level RAM and a 30 GB heap. All nodes run as Docker containers with no custom GC settings.
Here's an example error message from a basic reindex operation:
elasticsearch.exceptions.TransportError: TransportError(429, '{"took":1025273,"timed_out":false,"total":482123,"updated":469000,"created":0,"deleted":0,"batches":469,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"shard":-1,"reason":{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<transport_request>] would be [30609748696/28.5gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30609747960/28.5gb], new bytes reserved: [736/736b], usages [request=0/0b, fielddata=35860/35kb, in_flight_requests=736/736b, accounting=524016769/499.7mb]","bytes_wanted":30609748696,"bytes_limit":30601641984,"durability":"PERMANENT"}}]}')
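For context, the reindex itself is nothing exotic. A minimal sketch of the kind of call that triggers this, using the official Python client (hosts and index names are hypothetical; the real job is a plain source-to-dest reindex with the default batch size of 1,000 docs, which matches the 469 batches / 469,000 docs in the error above):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-node-1:9200"])  # hypothetical host

es.reindex(
    body={
        "source": {"index": "events-v1"},  # hypothetical index names
        "dest": {"index": "events-v2"},
    },
    wait_for_completion=True,  # the 429 surfaces here as TransportError
    request_timeout=3600,
)
```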
I've started monitoring the breakers, and our problem seems to be with the parent breaker: individual nodes regularly climb right up to the parent limit and trip it.
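In case it's useful, this is roughly how I'm polling the breaker stats, via the nodes stats API (a minimal sketch with the Python client; the node address is hypothetical):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-node-1:9200"])  # hypothetical host

# _nodes/stats/breaker reports estimated usage vs. limit per breaker.
stats = es.nodes.stats(metric="breaker")
for node_id, node in stats["nodes"].items():
    parent = node["breakers"]["parent"]
    print(
        node["name"],
        f'{parent["estimated_size_in_bytes"]}/{parent["limit_size_in_bytes"]}',
        f'tripped={parent["tripped"]}',
    )
```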
Some questions:
- Would our cluster benefit from reducing the max heap size on each node? All nodes currently use zero-based compressed oops (see the sketch after this list for how we check that).
- Should we add more nodes to the cluster?
- Should we use a more aggressive GC?
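On the heap question: here's roughly how we confirm the compressed-oops status, via the nodes info API (again a minimal sketch with the Python client; the host is hypothetical). Note the API only reports whether compressed oops are in use; confirming they are zero-based specifically requires checking the JVM startup logs.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["es-node-1:9200"])  # hypothetical host

# The JVM section of _nodes info shows each node's max heap and
# whether it is small enough for compressed ordinary object pointers.
info = es.nodes.info(metric="jvm")
for node_id, node in info["nodes"].items():
    jvm = node["jvm"]
    print(
        node["name"],
        f'heap_max_bytes={jvm["mem"]["heap_max_in_bytes"]}',
        f'compressed_oops={jvm["using_compressed_ordinary_object_pointers"]}',
    )
```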
Any tips on how to fix this would be very helpful.