Hi Guys,
I have been facing the issue below for a few days now:
"Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [24876638648/23.1gb], which is larger than the limit of [24481313587/22.7gb], real usage: [24864230864/23.1gb], new bytes reserved: [12407784/11.8mb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=12407784/11.8mb, accounting=25462489/24.2mb]"
I had this issue earlier as well, so I looked into the ES discussion forum and found that I needed to scale up the memory. I was previously running with a 12 GB heap; I have now doubled it to 24 GB, but I am still getting this error.
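In case it helps with diagnosis, this is roughly how I have been checking the parent breaker usage and limits on the nodes (a minimal check, assuming the cluster answers on localhost:9200; the host/port will differ per environment):

curl -s "http://localhost:9200/_nodes/stats/breaker?pretty"
curl -s "http://localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty" | grep breaker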
Here is the configuration for Elasticsearch:
version: '3.4'
services:
  elasticsearch:
    image: ${REGISTRY}/elastic/elasticsearch:7.3.2-431
    environment:
      - cluster.name=mgmt-elasticsearch-cluster
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=3
      - cluster.initial_master_nodes=msql07,msql08,msql09,msql10,msql11,msql12
      - SERVICE_NAME=elasticsearch
      - TAKE_FILE_OWNERSHIP=true
      - ES_JAVA_OPTS=-Xms24g -Xmx24g -XX:-UseConcMarkSweepGC -XX:-UseCMSInitiatingOccupancyOnly -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=75
      - HOSTNAME_COMMAND=curl -H Metadata:true -s http://169.254.169.254/metadata/instance?api-version=2019-06-04 | jq -r '.compute.name'
    labels:
      com.bnsf.mp.description: "ElasticSearch database"
      com.bnsf.mp.department: "XF"
    logging:
      driver: "json-file"
    networks:
      - logging
    volumes:
      - type: bind
        source: /opt/data/elasticsearch
        target: /usr/share/elasticsearch/data
      - type: tmpfs
        target: /usr/share/elasticsearch/logs
    deploy:
      labels:
        traefik.enable: "true"
        traefik.port: "9200"
        traefik.frontend.rule: "Host:kibana.xyz.com;PathPrefixStrip:/elasticsearch/"
        traefik.frontend.entryPoints: "https"
        traefik.docker.network: "logging"
      mode: global
      endpoint_mode: dnsrr
      placement:
        constraints: [node.labels.type == sql]
      resources:
        limits:
          cpus: '4.0'
          memory: 48G
        reservations:
          cpus: '2.0'
          memory: 24G
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      update_config:
        parallelism: 1
        delay: 60s
        failure_action: rollback
        monitor: 180s
        max_failure_ratio: 0.25
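For what it's worth, since bootstrap.memory_lock=true is set above, a quick way to confirm the lock actually took effect inside the containers (again assuming localhost:9200; mlockall should report true on every node) is:

curl -s "http://localhost:9200/_nodes?filter_path=**.mlockall&pretty"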
Due to this circuit breaker issue, Kibana also dies after a few hours, and ES keeps unassigning shards, so the cluster status turns yellow.
Attaching a few monitoring screenshots from Grafana.
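For reference, this is how I have been looking at why shards end up unassigned (same localhost:9200 assumption; the second call explains the first unassigned shard it finds):

curl -s "http://localhost:9200/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED
curl -s "http://localhost:9200/_cluster/allocation/explain?pretty"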
My jvm.options file is below, but I set the heap and a few other settings through the environment instead (ES_JAVA_OPTS=-Xms24g -Xmx24g -XX:-UseConcMarkSweepGC -XX:-UseCMSInitiatingOccupancyOnly -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=75):
## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms1g
-Xmx1g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## G1GC Configuration
# NOTE: G1GC is only supported on JDK version 10 or later.
# To use G1GC uncomment the lines below.
# 10-:-XX:-UseConcMarkSweepGC
# 10-:-XX:-UseCMSInitiatingOccupancyOnly
# 10-:-XX:+UseG1GC
# 10-:-XX:InitiatingHeapOccupancyPercent=75

## DNS cache policy
# cache ttl in seconds for positive DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.ttl; set to -1 to cache forever
-Des.networkaddress.cache.ttl=60
# cache ttl in seconds for negative DNS lookups noting that this overrides the
# JDK security property networkaddress.cache.negative ttl; set to -1 to cache
# forever
-Des.networkaddress.cache.negative.ttl=10

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

-Djava.io.tmpdir=${ES_TMPDIR}

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps; ensure the directory exists and
# has sufficient space
-XX:HeapDumpPath=data

# specify an alternative path for JVM fatal error logs
-XX:ErrorFile=logs/hs_err_pid%p.log

## JDK 8 GC logging

8:-XX:+PrintGCDetails
8:-XX:+PrintGCDateStamps
8:-XX:+PrintTenuringDistribution
8:-XX:+PrintGCApplicationStoppedTime
8:-Xloggc:logs/gc.log
8:-XX:+UseGCLogFileRotation
8:-XX:NumberOfGCLogFiles=32
8:-XX:GCLogFileSize=64m

# JDK 9+ GC logging
9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m

# due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
# time/date parsing will break in an incompatible way for some date patterns and locales
9-:-Djava.locale.providers=COMPAT
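To rule out the ES_JAVA_OPTS override not being picked up (the file above still has the 1 GB defaults), this is a rough check that the nodes really start with a 24 GB heap and G1GC (again assuming localhost:9200):

curl -s "http://localhost:9200/_nodes/jvm?pretty" | grep -E 'heap_max|UseG1GC|Xmx'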
Can someone help me resolve this issue?