Parent circuit breaker calculation seems to be wrong with version 7.x

Hi,
After upgrading to ES 7.x we started to notice that search or bulk requests sometimes fail with:
[parent] Data too large, data for [<transport_request>] would be [3070458198/2.8gb], which is larger than the limit of [3060164198/2.8gb], real usage: [3070457280/2.8gb], new bytes reserved: [918/918b]

Querying our lab node during idle time:
curl 'cmw07:9201/_nodes/ZGJlnzm3Qn-PEcHV9bGz9Q/stats/breaker?human&pretty'
yields:
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "cm-well-dev1",
  "nodes" : {
    "ZGJlnzm3Qn-PEcHV9bGz9Q" : {
      "timestamp" : 1559204491660,
      "name" : "cmw07",
      "transport_address" : "10.11.11.170:9301",
      "host" : "cmw07",
      "ip" : "10.11.11.170:9301",
      "roles" : [
        "data",
        "ingest"
      ],
      "attributes" : {
        "rack_id" : "cmw07"
      },
      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 1932735283,
          "limit_size" : "1.7gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 1288490188,
          "limit_size" : "1.1gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 3221225472,
          "limit_size" : "3gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 3221225472,
          "limit_size" : "3gb",
          "estimated_size_in_bytes" : 649278,
          "estimated_size" : "634kb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 3060164198,
          "limit_size" : "2.8gb",
          "estimated_size_in_bytes" : 2793145280,
          "estimated_size" : "2.6gb",
          "overhead" : 1.0,
          "tripped" : 2666
        }
      }
    }
  }
}

As you can see, the estimated_size of the parent breaker is huge and isn't the sum of all the other circuit breakers. It seems that the system considers the memory to be almost full even though that isn't the case.
This doesn't happen with ES 6.8, where the parent total precisely matches the sum of all the others.
(BTW, it can be reproduced with the Docker version: just issue the same curl and you'll see about 200mb taken in the parent breaker while all the others are 0.)


Hi,

since 7.0.0 we use a different strategy for the parent circuit breaker. Please see our blog and the docs for more details.
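
In short, the parent breaker now looks at the real heap usage of the JVM rather than only at the sum of the child breakers. The behaviour is governed by two settings; here is a minimal sketch showing the 7.x defaults (these are the defaults, not recommended changes):

# elasticsearch.yml
indices.breaker.total.use_real_memory: true   # parent breaker tracks actual heap usage
indices.breaker.total.limit: 95%              # default parent limit when real memory accounting is on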

Daniel

Hi! I faced the same problem. ES 7.1.0.

I just try to index 1-4 docs in a few tests, but after a couple of tests I start getting errors like this:
ServerError: Type: circuit_breaking_exception Reason: "[parent] Data too large, data for [<http_request>] would be [2073408660/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2073404752/1.9gb], new bytes reserved: [3908/3.8kb]

Really? I just indexed dozens of docs.

Hi,

that's an interesting data point. We run benchmarks with Elasticsearch every night where we index millions of documents with different settings (among them also on a 1GB heap) and we have never run into that issue. Could you share your Elasticsearch configuration (config/jvm.options and config/elasticsearch.yml)? Also, do you maybe run memory-intensive aggregations concurrently?

Daniel

Hi Daniel,
thanks for your answer.
I can share my experience:
The parent circuit breaker is configured to 95% (real memory).
I am using G1GC with:

  1. 10-:-XX:InitiatingHeapOccupancyPercent=75
  2. the bundled JDK, so it's JDK 12, and this config is the one that is actually used

I looked at the code and the memory calculation uses getHeapMemoryUsage(), so everything looks OK.
The circuit breaker effectively says: "reject the request if it cannot be served without doing GC". GC should start at 75% and the threshold is 95%, so theoretically it's good.
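
To make that concrete, a quick check of the numbers against the 3GB heap from the stats above:

3,221,225,472 bytes * 0.75 = 2,415,919,104 bytes (~2.25GB) -> where G1 should start a concurrent cycle
3,221,225,472 bytes * 0.95 ≈ 3,060,164,198 bytes (2.8GB)   -> parent breaker limit, matching limit_size_in_bytes above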

The issue is that I can see the heap of the process at 2.5GB out of 3GB (more than 75%...) for long stretches of idle time. When the memory goes above 2.8GB, requests are rejected.

I have a feeling that after sending some requests to the cluster the memory is released (GC) and everything goes back to normal. But it's not automatic, and I could see the GC being triggered only 30 seconds after the request was rejected (the system wasn't loaded at that time; memory use shouldn't be near 3GB).
Why isn't the GC triggered? Do you want me to do some special checks?

Thanks!
Eli

Hi Eli,

The garbage collector is triggered when heap usage reaches 75% (-XX:InitiatingHeapOccupancyPercent=75). However, the garbage collector runs concurrently with the application, so it can (and will) happen that the application allocates additional heap memory while the heap is being cleaned up. The default garbage collector for Elasticsearch, concurrent mark-sweep (CMS), also runs concurrently with the application, but the two collectors have different runtime characteristics. If the concurrent phase takes longer for G1, it will report the higher heap usage until the end of its concurrent cycle, and thus the chance is higher that the real memory circuit breaker is tripped. Once the GC cycle finishes, heap usage as seen by the real memory circuit breaker should drop. So you can do a couple of things:

  • Switch to the CMS garbage collector.
  • Stay with G1 but trigger the concurrent cycle sooner (i.e. reducing -XX:InitiatingHeapOccupancyPercent from 75 to a lower value, e.g. 65). Note that this means the garbage collector will run sooner and thus you will see more GC events which might negatively impact performance. It's best that you thoroughly test this change with a representative benchmark before doing this in production.
  • Increase the threshold at which the real memory circuit breaker trips by raising indices.breaker.total.limit from 95% to a higher value. In our experiments we've seen, though, that if we set the threshold too high, we end up in situations where the circuit breaker is ineffective because it triggers too late.
  • Turn off the real memory circuit breaker by setting indices.breaker.total.use_real_memory to false. I'd advise against this because it increases your risk of running into out-of-memory errors by overloading the node. The real memory circuit breaker enables the node to push back without falling over when too much traffic hits it. (A sketch of where these settings live follows below.)
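
To show where each of these knobs lives, here is a minimal config sketch; the concrete values (65, 98%) are illustrative examples, not recommendations:

# jvm.options – start the G1 concurrent cycle earlier
10-:-XX:InitiatingHeapOccupancyPercent=65

# elasticsearch.yml – raise the parent breaker threshold
# indices.breaker.total.limit: 98%

# elasticsearch.yml – disable the real memory circuit breaker entirely (not recommended)
# indices.breaker.total.use_real_memory: false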

Daniel


Hi,

In my previous response I missed that you've mentioned this. This is a bit odd but I suggest you look at the output of the garbage collection log (logs/gc.log). This file contains a lot of details about GC behavior (when was it triggered? how much memory has been reclaimed? how long did it run? etc.) and I recommend this as a starting point for GC analysis in general.
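
If GC logging isn't already enabled, the bundled jvm.options normally turns it on out of the box; the relevant flags look roughly like this (exact path and rotation settings vary by install, and the JDK 8 and JDK 9+ syntaxes differ):

8:-Xloggc:logs/gc.log
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m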

Daniel

I am so relieved someone else ran into this. I thought I was somehow seeing that bug from 5.6 reappear. I was pretty confused when I ran into this and ended up using that workaround from the bug where I set the parent limit_size to, uh, 8 exabytes :scream_cat:. While this did not feel great, the problem went away and I was able to resume indexing documents.

I am also using JDK 12 and G1GC with the defaults on an 8GB heap. I'm only bulk indexing docs in a single thread with some occasional mgets. No searches or aggs. GCs are very effective when they run, taking the heap from 75% down to 25%.

It seems like sending a bulk insert when G1GC's heap is near 75% temporarily puts it over the edge, which I think is what Daniel is suggesting above.

I will try reverting to CMS first, and then try setting indices.breaker.total.use_real_memory to false if it's still an issue.
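
A sketch of the two changes I mean, assuming the stock jvm.options layout (the G1 lines would be removed or commented out):

# jvm.options – the stock CMS flags
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

# elasticsearch.yml – fallback, only if CMS alone doesn't help
# indices.breaker.total.use_real_memory: false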

      "breakers" : {
        "request" : {
          "limit_size_in_bytes" : 4986608025,
          "limit_size" : "4.6gb",
          "estimated_size_in_bytes" : 0,
          "estimated_size" : "0b",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "fielddata" : {
          "limit_size_in_bytes" : 3324405350,
          "limit_size" : "3gb",
          "estimated_size_in_bytes" : 3440,
          "estimated_size" : "3.3kb",
          "overhead" : 1.03,
          "tripped" : 0
        },
        "in_flight_requests" : {
          "limit_size_in_bytes" : 8311013376,
          "limit_size" : "7.7gb",
          "estimated_size_in_bytes" : 1860,
          "estimated_size" : "1.8kb",
          "overhead" : 2.0,
          "tripped" : 0
        },
        "accounting" : {
          "limit_size_in_bytes" : 8311013376,
          "limit_size" : "7.7gb",
          "estimated_size_in_bytes" : 51488284,
          "estimated_size" : "49.1mb",
          "overhead" : 1.0,
          "tripped" : 0
        },
        "parent" : {
          "limit_size_in_bytes" : 9223372036854775807,
          "limit_size" : "8192pb",
          "estimated_size_in_bytes" : 6715789312,
          "estimated_size" : "6.2gb",
          "overhead" : 1.0,
          "tripped" : 0
        }
      }

Hello @e-orz

I ran into the same issue as you, but I had forgotten to increase the heap size.
After I increased the heap size (-Xms/-Xmx) in jvm.options, the issue was resolved.
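
(For reference, the heap settings I mean are these two lines in config/jvm.options; the 4g value is just an example, not necessarily the size you need:)

-Xms4g
-Xmx4g
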
Good luck to you.

Refer: I got issue about "Data too large"

Hi @thachnv92,
What garbage collector are you using? I have issues only with G1GC and not with CMS (the default config is CMS, so if you didn't change it, you are on CMS).
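
(If you're not sure which collector a node actually ended up with, one way to check is the nodes info API; a quick sketch, adjust host/port to your setup:)

curl 'localhost:9200/_nodes/jvm?pretty&filter_path=nodes.*.jvm.gc_collectors'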

Hi, @danielmitterdorfer,

Sorry for the late reply, it took me some time to run the tests.

  • Switching to CMS solved the issue. The heap size never got big and the parent circuit breaker was never triggered.

Back to G1GC:
I ran the scenario again, waited for the parent circuit breaker to be triggered, and watched the gc.log file. It took minutes for the node to come back to life, and during that time there were no GCs in the gc.log file! It seems that the GC had already freed the memory and somehow the process thinks it's still taken.

I got the feeling you are running your tests against JDK 11 (see this discussion), so I replaced the bundled JDK with JDK 11. Things were much better: even though the memory was only cleaned after passing 75%, it was still cleaned before running out, and the circuit breaker wasn't triggered. Even so, after several tries it was triggered, but only for a short period of time, and then the memory was cleaned and the process could serve requests again.

So my questions are:

  1. Are you using CMS in your internal tests?
  2. Is JDK 12 also used during tests or only JDK 11?
  3. On a slightly different topic: is there a reason why the stats API is also affected by the circuit breaker? It makes debugging the issue harder.

Eli


Hi,
I had the same problem.
ES version: 7.1.0
Using G1GC.
Here is my configuration:

-Xms31g
-Xmx31g
10-:-XX:-UseConcMarkSweepGC
10-:-XX:-UseCMSInitiatingOccupancyOnly
10-:-XX:+UseG1GC
10-:-XX:InitiatingHeapOccupancyPercent=75
-Des.networkaddress.cache.negative.ttl=10
-XX:+AlwaysPreTouch
-Xss1m
-Djava.awt.headless=true
-Dfile.encoding=UTF-8
-Djna.nosys=true
-XX:-OmitStackTraceInFastThrow
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Djava.io.tmpdir=${ES_TMPDIR}
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=data
-XX:ErrorFile=/var/log/elasticsearch/logs/hs_err_pid%p.log
8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:/export/Logs/elasticsearch/logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=10
8:-XX:GCLogFileSize=64m
9-:-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/logs/gc.log:utctime,pid,tags:filecount=10,filesize=64m


Hi @e-orz and @liuhaoran

I think the options below, instead of the default -XX:InitiatingHeapOccupancyPercent=75, should solve the problem when using G1 GC:

10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30

I hope you are willing to try this out and report back the results here?

Hi, @HenningAndersen,
Thanks for your answer. I hope to get to it early next week.

Eli

We have been having the same problem, and it is exceedingly frustrating. We have a cluster with 26 data nodes that, with ES 6.8, was able to index well over 10k docs a second. Now a bulk index of about 1k docs per second trips circuit breakers on nodes that have plenty of heap and for which GC runs easily.

We applied these settings last night, and this morning, with an almost idle cluster, one thread pushing a couple thousand documents a second was enough to make nodes trip circuit breakers.

Hi @Jeff_Bolle,

thanks for reporting back here. To dig deeper, could you share your elasticsearch.yml, jvm.options, and GC log files? Feel free to send them to me in a PM.

@HenningAndersen
We are working on pulling all of those files for you now. I will send them via PM.

We just reverted back to the system JVM:

java --version
openjdk 11.0.4 2019-07-16
OpenJDK Runtime Environment (build 11.0.4+11-post-Ubuntu-116.04.1)
OpenJDK 64-Bit Server VM (build 11.0.4+11-post-Ubuntu-116.04.1, mixed mode, sharing)
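
(In case it helps others: on a deb install, one way to do this is to point JAVA_HOME at the system JDK and restart the node; the exact path below is an assumption for a stock Ubuntu OpenJDK 11 package.)

# /etc/default/elasticsearch
JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

sudo systemctl restart elasticsearch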

And things are running like a dream. Heaps are staying in the mid 60% range at maximum on our most active data nodes, no circuit breakers have tripped, and our objects are flowing out of the backlog.

Hi @Jeff_Bolle,

thanks for the follow-up. It would be interesting to know which version of Java was in use previously (the bundled one?). Your ES version is always good information to have as well.

It's good to hear that you have it resolved, but I'd certainly like to dig one step deeper on this one.