CircuitBreakingException: [parent] Data too large in ES 7.x

LoadingZhang · July 30, 2019, 1:44am

The cluster keeps throwing CircuitBreakingException after the upgrade to ES 7.x, especially while running recovery tasks or indexing large amounts of data, for actions such as [internal:index/shard/recovery/start_recovery] or [cluster:monitor/nodes/info[n]]. The affected node then leaves the cluster.
Here are the log and the node stats.
After I disabled indices.breaker.total.use_real_memory, the exception does not seem to appear again.
Is this question related to this issue?

ywelsch · July 31, 2019, 4:59pm

Yes, the linked issue is related. We're looking into the conditions under which the breaker might trip even though the node could theoretically handle the extra load. This seems to be mostly related to the workload. In your case, it is best to disable the real memory breaker.

LoadingZhang · August 5, 2019, 10:59am

It happens again even with the real memory breaker disabled: [parent] Data too large, data for [<http_request>].
It looks like the real memory breaker isn't the root cause.

ywelsch · August 5, 2019, 11:36am

Can you provide the full message? It includes information about the individual child breakers, which helps explain where the memory is used.

LoadingZhang · August 6, 2019, 9:47am

ElasticsearchStatusException[Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [30799676956/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30760015112/28.6gb], new bytes reserved: [39661844/37.8mb]]]
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
    at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2053)
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2030)
    at org.elasticsearch.client.RestHighLevelClient$1.onFailure(RestHighLevelClient.java:1947)
    at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:857)
    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:560)
    at org.elasticsearch.client.RestClient$1.completed(RestClient.java:537)
    at shaded.org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119)
    at shaded.org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177)
    at shaded.org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:412)
    at shaded.org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:305)
    at shaded.org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:267)
    at shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81)
    at shaded.org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39)
    at shaded.org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:116)
    at shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:164)
    at shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:339)
    at shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:317)
    at shaded.org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:278)
    at shaded.org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:106)
    at shaded.org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:590)
    at java.lang.Thread.run(Thread.java:748)
    Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://node:9200], URI [/_bulk?timeout=3m], status line [HTTP/1.1 429 Too Many Requests]
{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [30799676956/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30760015112/28.6gb], new bytes reserved: [39661844/37.8mb]","bytes_wanted":30799676956,"bytes_limit":30601641984,"durability":"TRANSIENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [30799676956/28.6gb], which is larger than the limit of [30601641984/28.5gb], real usage: [30760015112/28.6gb], new bytes reserved: [39661844/37.8mb]","bytes_wanted":30799676956,"bytes_limit":30601641984,"durability":"TRANSIENT"},"status":429}
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:552)
        ... 16 more

Here are the node stats: https://del.dog/ibaruginif

ywelsch · August 6, 2019, 10:11am

The error shows that you're still using the real memory circuit breaker (see real usage: [30760015112/28.6gb]), whereas you say you're not?
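
The figures in that message can be checked by hand. The following is a minimal Python sketch, assuming the message format shown in the exception above; the regular expression and the helper name `parse_parent_breaker` are illustrative and not part of any Elasticsearch client. It extracts the byte counts and checks that the "would be" value is the real heap usage plus the newly reserved bytes. The last check assumes a 30 GiB heap, which the thread does not state.

```python
import re

# Message copied from the exception above.
MESSAGE = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[30799676956/28.6gb], which is larger than the limit of "
    "[30601641984/28.5gb], real usage: [30760015112/28.6gb], "
    "new bytes reserved: [39661844/37.8mb]"
)

# Illustrative pattern: captures the raw byte count from each bracket pair.
PATTERN = re.compile(
    r"would be \[(?P<would_be>\d+)/[^\]]+\].*?"
    r"limit of \[(?P<limit>\d+)/[^\]]+\].*?"
    r"real usage: \[(?P<real_usage>\d+)/[^\]]+\].*?"
    r"new bytes reserved: \[(?P<new_bytes>\d+)/[^\]]+\]"
)


def parse_parent_breaker(message):
    """Hypothetical helper: return the four byte counts as integers."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a real-memory parent breaker message")
    return {name: int(value) for name, value in match.groupdict().items()}


stats = parse_parent_breaker(MESSAGE)

# Heap already in use plus the size of the incoming request.
projected = stats["real_usage"] + stats["new_bytes"]
print(projected == stats["would_be"])

# The heap alone is already above the limit, before the request is counted.
print(stats["real_usage"] > stats["limit"])

# Assumption: a 30 GiB heap. The limit then equals exactly 95% of the heap.
print(stats["limit"] * 100 == 95 * 30 * 2**30)
```

All three checks print `True`. Since 95% is the default for indices.breaker.total.limit when the real memory breaker is enabled (the default is 70% otherwise), the limit in this message is consistent with a node on which the breaker is still active, provided the 30 GiB assumption holds.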

LoadingZhang · August 6, 2019, 3:02pm

I confirm that I have disabled the real memory circuit breaker:

GET problem_node:9200/_cluster/settings?include_defaults&flat_settings&local&filter_path=defaults.indices*
{
  "defaults": {
    "indices.analysis.hunspell.dictionary.ignore_case": "false",
    "indices.analysis.hunspell.dictionary.lazy": "false",
    "indices.breaker.accounting.limit": "100%",
    "indices.breaker.accounting.overhead": "1.0",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.fielddata.overhead": "1.03",
    "indices.breaker.fielddata.type": "memory",
    "indices.breaker.request.limit": "60%",
    "indices.breaker.request.overhead": "1.0",
    "indices.breaker.request.type": "memory",
    "indices.breaker.total.limit": "70%",
    "indices.breaker.total.use_real_memory": "false",
    "indices.breaker.type": "hierarchy",
    "indices.cache.cleanup_interval": "1m",
    "indices.fielddata.cache.size": "-1b",
    "indices.lifecycle.poll_interval": "10m",
    "indices.mapping.dynamic_timeout": "30s",
    "indices.memory.index_buffer_size": "20%",
    "indices.memory.interval": "5s",
    "indices.memory.max_index_buffer_size": "6g",
    "indices.memory.min_index_buffer_size": "48mb",
    "indices.memory.shard_inactive_time": "5m",
    "indices.queries.cache.all_segments": "false",
    "indices.queries.cache.count": "10000",
    "indices.queries.cache.size": "10%",
    "indices.query.bool.max_clause_count": "1024",
    "indices.query.query_string.allowLeadingWildcard": "true",
    "indices.query.query_string.analyze_wildcard": "false",
    "indices.recovery.internal_action_long_timeout": "1800000ms",
    "indices.recovery.internal_action_timeout": "15m",
    "indices.recovery.max_bytes_per_sec": "1024m",
    "indices.recovery.max_concurrent_file_chunks": "2",
    "indices.recovery.recovery_activity_timeout": "1800000ms",
    "indices.recovery.retry_delay_network": "5s",
    "indices.recovery.retry_delay_state_sync": "500ms",
    "indices.requests.cache.expire": "0ms",
    "indices.requests.cache.size": "1%",
    "indices.store.delete.shard.timeout": "30s"
  }
}

ywelsch · August 7, 2019, 7:25am

How did you disable the real memory circuit breaker? Did you put indices.breaker.total.use_real_memory : false into elasticsearch.yml of all the nodes and restart?

Also, why are you showing the defaults in the settings API call? The default for indices.breaker.total.use_real_memory should be true. The setting needs to be explicitly disabled.

LoadingZhang · August 7, 2019, 7:49am

I have disabled the real memory circuit breaker on all data nodes, but not on the master-only nodes (because the master nodes do not get this exception).
The default value of indices.breaker.total.use_real_memory shows as false because the setting is set in elasticsearch.yml.
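
One way to confirm which nodes carry the override is to look at the per-node settings rather than at one node's defaults. The following is a minimal sketch, assuming the response of `GET _nodes/settings?flat_settings=true` has already been parsed into a dict. The sample data and the helper name `nodes_with_real_memory_breaker` are made up for illustration.

```python
SETTING = "indices.breaker.total.use_real_memory"


def nodes_with_real_memory_breaker(nodes_info):
    """Return the names of nodes where the real memory breaker is still active."""
    active = []
    for node in nodes_info["nodes"].values():
        # A missing key means the node runs with the 7.x default, which is true.
        value = node.get("settings", {}).get(SETTING, "true")
        if str(value).lower() != "false":
            active.append(node["name"])
    return sorted(active)


# Hypothetical response: two data nodes with the override, one master without.
sample = {
    "nodes": {
        "id-1": {"name": "data-1", "settings": {SETTING: "false"}},
        "id-2": {"name": "data-2", "settings": {SETTING: "false"}},
        "id-3": {"name": "master-1", "settings": {}},
    }
}

print(nodes_with_real_memory_breaker(sample))  # ['master-1']
```

Circuit breakers are evaluated per node, so a request handled by a node without the override is still checked against that node's real memory breaker.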

HenningAndersen · August 30, 2019, 8:43am

Hi @LoadingZhang,

are you using the default CMS GC or did you switch to G1 GC?

LoadingZhang · August 31, 2019, 4:40am

Yes, I have been using G1GC since ES 5.x.

HenningAndersen · September 2, 2019, 1:16pm

Hi @LoadingZhang,

if you can spare the time to try it out, it could be good to check if re-enabling real memory circuit breaker works if you change jvm.options to have:

10-:-XX:G1ReservePercent=25
10-:-XX:InitiatingHeapOccupancyPercent=30

instead of:

10-:-XX:InitiatingHeapOccupancyPercent=75

I would be very interested in knowing the outcome.
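
As a rough illustration of what these percentages mean in bytes, here is a small Python sketch. The 30 GiB heap is an assumption (the thread does not state the heap size), and the 95% figure is the default parent breaker limit when the real memory breaker is enabled. G1 adapts its initiating occupancy over time, so the numbers only describe the starting point.

```python
GIB = 2**30
heap_bytes = 30 * GIB  # assumption: the thread does not state the heap size


def percent_of_heap(percent):
    # Integer arithmetic keeps the byte counts exact.
    return heap_bytes * percent // 100


thresholds = {
    "G1ReservePercent=25 (heap kept free as reserve)": 25,
    "InitiatingHeapOccupancyPercent=30 (suggested)": 30,
    "InitiatingHeapOccupancyPercent=75 (previous)": 75,
    "parent breaker limit with real memory (95%)": 95,
}

for label, percent in thresholds.items():
    print(f"{label}: {percent_of_heap(percent) / GIB:.1f} GiB")
```

Under these assumptions, the previous setting starts concurrent marking at 22.5 GiB of occupancy, only 6 GiB below the 28.5 GiB breaker limit, while the suggested setting starts at 9 GiB and keeps 7.5 GiB in reserve.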

LoadingZhang · September 4, 2019, 1:38am

The nodes have not thrown a CircuitBreakingException in the last 24 hours. I will keep testing for longer, thanks.
BTW, I'm testing ZGC in the meantime, and it works well with the real memory circuit breaker disabled.
I guess -XX:SoftMaxHeapSize in JDK 13 would help when enabling the real memory circuit breaker.

HenningAndersen · September 4, 2019, 6:47am

Hi @LoadingZhang,

thanks for reporting back on this. Unfortunately, I made a mistake in my original post in that InitiatingHeapOccupancyPercent should really have been set to 30. I have edited my post above to avoid confusion if others read this post.

The JVM should auto-tune this parameter after a while, it only uses the original IHOP value until it has a better estimate of what it should be itself. So your test is certainly still valuable, confirming that the G1ReservePercent does reserve enough heap to avoid circuit breaking in your case.

I have not looked too much into ZGC yet, since it still has experimental status. You will need something like the SoftMaxHeapSize option to make it compatible with the real memory circuit breaker. Also, note that ZGC does not support compressed oops, meaning you will likely need more heap, since all references need 64 bits rather than 32 bits. This will also in itself lead to some performance degradation (reduced CPU cache efficiency and more data to fetch from RAM).

LoadingZhang · September 9, 2019, 7:50am

The CircuitBreakingException has appeared again, which is bad news.
It is not that frequent, though.

HenningAndersen · September 9, 2019, 8:14am

Hi @LoadingZhang,

Circuit breaking exceptions can occur for legitimate reasons too. Was the cluster/node heavily loaded at the time? In what situation did it occur (recovery, indexing, search, etc.)?

I hope you can share your ES and GC log files (feel free to PM them to me).

LoadingZhang · September 9, 2019, 10:31am

Yes, the cluster was indexing a large amount of data, but it is fine when the real memory circuit breaker is disabled.
I have sent you the logs by PM. If you need more logs, I will send them too.

Yakir_Gibraltar · September 17, 2019, 9:15am

@HenningAndersen We have the same issue with Zing JDK and ES 7.3.2 (latest).
JVM version (java -version):

java version "11.0.3.0.101" 2019-07-24 LTS
Zing Runtime Environment for Java Applications 19.07.0.0+3 (product build 11.0.3.0.101+12-LTS) Zing 64-Bit Tiered VM 19.07.0.0+3 (product build 11.0.3.0.101-zing_19.07.0.0-b4-product-azlinuxM-X86_64, mixed mode)

ES log:

Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [49671046666/46.2gb], which is larger than the limit of [47173546803/43.9gb],
real usage: [49671045120/46.2gb], new bytes reserved: [1546/1.5kb], usages [request=0/0b, fielddata=8478/8.2kb, in_flight_requests=1546/1.5kb, accounting=7745839/7.3mb]
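
The `usages [...]` part of this message lists what each child breaker is tracking. The following is a minimal Python sketch, assuming the message format shown in the log line above (the regular expressions are illustrative). It sums the child breakers and compares the total with the reported real usage.

```python
import re

# Log line copied from the post above, joined into one string.
LOG = (
    "[parent] Data too large, data for [<transport_request>] would be "
    "[49671046666/46.2gb], which is larger than the limit of "
    "[47173546803/43.9gb], real usage: [49671045120/46.2gb], "
    "new bytes reserved: [1546/1.5kb], usages [request=0/0b, "
    "fielddata=8478/8.2kb, in_flight_requests=1546/1.5kb, "
    "accounting=7745839/7.3mb]"
)

real_usage = int(re.search(r"real usage: \[(\d+)/", LOG).group(1))
usages_text = re.search(r"usages \[(.*)\]", LOG).group(1)

# Map each child breaker name to the bytes it is tracking.
children = {
    name: int(value)
    for name, value in re.findall(r"(\w+)=(\d+)/", usages_text)
}
tracked = sum(children.values())

print(children)
print(f"tracked by child breakers: {tracked} bytes")
print(f"share of real usage: {tracked / real_usage:.6%}")
```

The child breakers account for 7755863 bytes (about 7.4 MiB) of the 46.2 GB reported as real usage. The remainder is heap that the child breakers do not track, which may include garbage that has not been collected yet.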

elasticsearch.yml

cluster.name: dba
discovery.seed_hosts:
- es001.tab.com
- es002.tab.com
- es003.tab.com
network.bind_host: 0.0.0.0
network.host: 0.0.0.0
network.publish_host: es001.tab.com
node.name: es001.tab.com
path.data: "/var/lib/elasticsearch/data/dba"
path.logs: "/var/log/elasticsearch/dba"
xpack.ml.enabled: false
xpack.security.enabled: false
xpack.watcher.enabled: false

jvm.options:

-Dfile.encoding=UTF-8
-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseCMSInitiatingOccupancyOnly
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=75
-Xloggc:/var/log/elasticsearch/dba/gc.log
-Xms50g
-Xmx50g
-Xss1m
-server
-verbose:gc

We are unable to test:

-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30

These options are not supported by Zing, because the Zing JDK does not use G1GC.

Our Zing conf:

pmem enabled
fundmemory 64G 64G
fund Grant 4G 4G
fund PausePrevention 4G 4G
nodemask	0xFFFFFFFF

Yakir_Gibraltar · October 10, 2019, 11:26am

Just to update: we solved the issue on the Azul Zing JDK with -XX:GPGCTargetPeakHeapOccupancyPercent=95.
Our jvm.conf right now:

-Dio.netty.noKeySetOptimization=true
-Dio.netty.noUnsafe=true
-Dio.netty.recycler.maxCapacityPerThread=0
-Djava.awt.headless=true
-Djna.nosys=true
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-XX:+AlwaysPreTouch
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:GPGCTargetPeakHeapOccupancyPercent=95
-Xloggc:/var/log/elasticsearch/dba/gc.log
-Xms32g
-Xmx32g
-Xss1m
-server
-verbose:gc

system · November 7, 2019, 11:26am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
