CircuitBreakingException: [parent] Data too large in ES (7.2.0)

We are seeing frequent occurrences of CircuitBreakingException in our ES cluster (7.2.0). The stack trace of the exception is as follows:

[2019-10-16T00:34:30,850][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [es-data-565858079-3-701791534] failed to execute on node [6R0-yLYrSPi6l2az_0LQ9g]
org.elasticsearch.transport.RemoteTransportException: [es-data-565858085-2-701791749][10.118.18.234:9300][cluster:monitor/nodes/info[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<transport_request>] would be [20396991666/18.9gb], which is larger than the limit of [20293386240/18.8gb], real usage: [20396986048/18.9gb], new bytes reserved: [5618/5.4kb]
        at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:343) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) [elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) [elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.2.0.jar:7.2.0]
        at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.2.0.jar:7.2.0]

The node on which this exception occurred then left the cluster, and the cluster status reported by localhost:9200/_cluster/health?format=json&pretty turned red.
Could you please help us find the root cause and how to avoid it?
Max JVM heap of the failed data node: 20 GB
RAM of the failed data node: 110 GB
Index size on the failed data node: 51.8 GB
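
For context on the numbers in the log: in 7.x the parent breaker accounts for real heap usage (indices.breaker.total.use_real_memory defaults to true), and its limit defaults to 95% of the heap, which matches the 18.8gb limit on a 20 GB heap shown above. A minimal sketch for watching per-node breaker usage, assuming the Python elasticsearch client and a client object named es (the address is illustrative):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # illustrative address

    # Fetch circuit-breaker statistics for every node.
    stats = es.nodes.stats(metric="breaker")

    for node_id, node in stats["nodes"].items():
        parent = node["breakers"]["parent"]
        print(
            node["name"],
            "estimated:", parent["estimated_size"],
            "limit:", parent["limit_size"],
            "tripped:", parent["tripped"],
        )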

Welcome!

As you have much more RAM available, you can first increase the heap size allocated to Elasticsearch, up to around 30 GB (staying below the threshold where the JVM loses compressed object pointers).
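
For example, a 30 GB heap would be configured on each data node via jvm.options (values illustrative; keep -Xms and -Xmx equal):

    # config/jvm.options
    -Xms30g
    -Xmx30g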

Then maybe tell us more about your setup: number of nodes, shards, documents, and the typical queries you are running?

Hi,

Thanks.
The information about our cluster is as follows:
Number_of_nodes: 18
Number_of_data_nodes: 12
Active_primary_shards: 5 (including .kibana_1)
Replication factor: 3
Active_shards: 14 (with a replication factor of 3 and 4 primary data shards, that is 12 data shards, plus the remaining 2 shards corresponding to .kibana_1)
Documents on one data shard: 19 million (48 GB)

The queries we are running are as follows:

  1. Write query: to update documents, we use the Update API with Painless scripts (a sketch follows this list).
  2. The BulkProcessor class is used for bulk operations; currently, we execute the bulk after every 10 requests.
  3. Read query: a range query that returns documents whose date-type field lies within the given range.
  4. We also use an index template.
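
As referenced in point 1, a minimal sketch of the update-by-script pattern, assuming the Python elasticsearch client; the index name, document id, field, and value are all hypothetical:

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["http://localhost:9200"])  # illustrative address

    # Update one document in place with a Painless script.
    es.update(
        index="my-index",   # hypothetical index name
        id="doc-1",         # hypothetical document id
        body={
            "script": {
                "lang": "painless",
                "source": "ctx._source.m1_val = params.new_val",
                "params": {"new_val": 42},
            }
        },
    )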

Please tell me if you need more information.

Best,
Karan

I guess that this exception is happening while running the "read query", right?
Could you share the query you are running?

Thank you for replying.

Following is the read query:

    es.search(
        index=index_name,
        scroll="2m",
        body={
            "_source": ["m1_id", "m2_id", "m1_val", "m2_val"],
            "query": {
                "bool": {
                    "should": [
                        {"range": {"m1_update_tstamp": {"gte": "now-1h"}}},
                        {"range": {"m2_update_tstamp": {"gte": "now-1h"}}},
                    ]
                }
            }
        },
        request_timeout=300,
    )
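
One thing worth checking with this pattern: es.search only returns the first page, and the scroll context stays open on the server (here for 2m) until it is drained or cleared. A sketch of consuming and then releasing the scroll, under the same assumptions as above (query_body stands for the body shown in the call, and process is a hypothetical per-record handler):

    resp = es.search(index=index_name, scroll="2m", body=query_body,
                     request_timeout=300)
    scroll_id = resp["_scroll_id"]
    hits = resp["hits"]["hits"]

    while hits:
        for hit in hits:
            process(hit["_source"])  # hypothetical handler
        resp = es.scroll(scroll_id=scroll_id, scroll="2m")
        scroll_id = resp["_scroll_id"]
        hits = resp["hits"]["hits"]

    # Release the server-side scroll context as soon as we are done.
    es.clear_scroll(scroll_id=scroll_id)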

Are you working on the same project?

I'm surprised you are getting this message with this request. Are you sure it's related?

Yes, we are working on the same project.

Looking at the query you posted it seems you have not specified any minimum_should_match parameter for your should clause. Is this query giving the expected results? What is the purpose of this query? What is the average size of your documents?

Hi Christian, thank you for replying to this thread. The query is working as expected, and it returns the same results with and without the "minimum_should_match" parameter. The document size is less than 5 KB. The purpose of this query is to fetch records updated in the last hour, hence the filter "gte": "now-1h".
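
Side note on why the results do not change: when a bool query contains only should clauses (no must or filter), minimum_should_match already defaults to 1, so making it explicit is a no-op here. Spelled out, with the field names from the query above:

    query_body = {
        "query": {
            "bool": {
                "should": [
                    {"range": {"m1_update_tstamp": {"gte": "now-1h"}}},
                    {"range": {"m2_update_tstamp": {"gte": "now-1h"}}},
                ],
                # Already the default when there are no must/filter clauses.
                "minimum_should_match": 1,
            }
        }
    }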

How much data does the query typically return? How often is it run?

The query returns 10-15K records and runs once every hour.

Hello folks, any advice on how to debug and find the root cause of this error?
