Cause and how to avoid "Data too large <http_request>" exception

klimapet · November 12, 2018, 11:12am

Hi,

I've run into trouble with the following exception:

    org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<http_request>] would be [32136353583/29.9gb], which is larger than the limit of [32127221760/29.9gb]
    	at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:230) ~[elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:232) [elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336) [elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-6.2.4.jar:6.2.4]

...
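
To put numbers on the message above, here is a minimal Python sketch that pulls the two byte counts out of the exception text and reports how far over the limit the request would have gone. The regular expression and the function name `parse_breaker_message` are assumptions made for illustration, written against the one message quoted in this post; other Elasticsearch versions may word the message differently.

```python
import re

# The exception text quoted above, without the Java class prefix.
MESSAGE = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[32136353583/29.9gb], which is larger than the limit of "
    "[32127221760/29.9gb]"
)

# Assumed message layout, derived only from the single example above.
PATTERN = re.compile(
    r"\[(?P<breaker>[^\]]+)\] Data too large, "
    r"data for \[(?P<label>[^\]]+)\] would be "
    r"\[(?P<would_be>\d+)/[^\]]+\], "
    r"which is larger than the limit of \[(?P<limit>\d+)/[^\]]+\]"
)


def parse_breaker_message(message):
    """Extract breaker name, label and byte counts from the message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not look like a breaker exception")
    would_be = int(match.group("would_be"))
    limit = int(match.group("limit"))
    return {
        "breaker": match.group("breaker"),
        "label": match.group("label"),
        "would_be": would_be,
        "limit": limit,
        "overshoot": would_be - limit,
    }


if __name__ == "__main__":
    info = parse_breaker_message(MESSAGE)
    print(f"{info['breaker']} breaker tripped while accounting for {info['label']}")
    print(f"over the limit by {info['overshoot']} bytes")
```

For this message the overshoot is about 9.1 million bytes against a limit of roughly 30 GB, which suggests the tracked memory was already very close to the limit before the request arrived.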

I'm sure our hardware setup is quite tight for the amount of data we want to index, but I'd still like to understand what exactly causes this type of error and whether there is a way to avoid it.

In the past, there was one very large index (10+ TB) with shards several times bigger than the recommended maximum of 50 GB per shard. The new setup has many more shards, and the index is split into several smaller indices. At first it was fine, but as the amount of indexed data grew, the very same error started to appear again. There is no cluster, just one Elasticsearch instance handling all indexing and querying.

Adjusting the circuit breaker limits seems to push the error a bit further out, but it still occurs, and with more data it is inevitable.
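
Before changing any limits, it can help to see which breaker is actually holding the memory. The sketch below summarizes the JSON returned by `GET /_nodes/stats/breaker`, listing each breaker's estimated size against its limit. The function name `summarize_breakers` and the `SAMPLE` payload are hypothetical and use made-up numbers, not values from this cluster; only the field names `limit_size_in_bytes` and `estimated_size_in_bytes` come from the node stats response.

```python
def summarize_breakers(stats):
    """Return (node, breaker, used, limit, fraction) rows, largest first.

    `stats` is the parsed JSON body of GET /_nodes/stats/breaker.
    """
    rows = []
    for node_id, node in stats.get("nodes", {}).items():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker["limit_size_in_bytes"]
            used = breaker["estimated_size_in_bytes"]
            # Guard against a zero or negative limit (breaker disabled).
            fraction = used / limit if limit > 0 else 0.0
            rows.append((node_id, name, used, limit, fraction))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


# Hypothetical payload with made-up numbers, for illustration only.
SAMPLE = {
    "nodes": {
        "node-1": {
            "breakers": {
                "request": {"limit_size_in_bytes": 600, "estimated_size_in_bytes": 0},
                "fielddata": {"limit_size_in_bytes": 600, "estimated_size_in_bytes": 50},
                "in_flight_requests": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 10},
                "accounting": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 900},
                "parent": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 960},
            }
        }
    }
}

if __name__ == "__main__":
    for node_id, name, used, limit, fraction in summarize_breakers(SAMPLE):
        print(f"{node_id} {name:20s} {used:6d} / {limit:6d} ({fraction:.0%})")
```

If one child breaker dominates in the real output, that is the place to look for savings; raising the parent limit alone only moves the point at which the error appears.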

Questions:

What exactly is causing the "Data too large" error on bigger data sets?
What are the possible ways of avoiding or preventing it, other than adding more nodes or more powerful hardware?

Thanks in advance.

Petr K.

Christian_Dahlqvist · November 12, 2018, 11:20am

Have a look at this webinar for guidance on how to optimise your data in order to get the most out of your cluster.

klimapet · November 13, 2018, 8:31am

Thank you for answering. I've watched the video and, as I understood it, the main benefit for storage density comes from making indices read-only and force merging them down to one segment. Unfortunately, this is not possible in our setup, as the indices are never read-only: data is continuously being deleted from them. That is also the reason why we ended up with so many shards, in order to keep each of them under 50 GB.

Another recommendation was not to use parent/child relations. However, parent/child was the only way I was able to solve the problem of indexing very large documents. Individual documents are so big that it isn't possible to handle them as one unit, so they need to be split into a parent (metadata) document and many child (content) documents.

There is only one high-cardinality keyword field in the whole mapping, with approximately 1 M values.

Christian_Dahlqvist · November 13, 2018, 9:16am

Then you probably need to scale out as you need more heap.

klimapet · November 13, 2018, 10:31am

Yes, I thought so.

Anyway, just because I'm curious: this "Data too large" message mentions <http_request>. Why? All the REST requests we send to ES are fairly lightweight, and the error only starts to appear once a fairly large amount of data is stored in the indices. I'd understand if it said "Fielddata too large", for example. Thank you.
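
One possible reading of the stack trace in the first post: `RestController.dispatchRequest` asks a child breaker to account for the incoming request, and that child then calls `checkParentLimit`, which looks at the total across all child breakers. Under that reading, the label in the message names whoever asked last, not whoever holds most of the memory. The Python sketch below is a simplified model of that idea, not the Elasticsearch implementation; the class `BreakerHierarchy`, the `<segments>` label and all numbers are hypothetical.

```python
class ParentLimitExceeded(Exception):
    """Raised when the sum over all child breakers exceeds the parent limit."""


class BreakerHierarchy:
    """Toy model: one parent limit shared by several child breakers."""

    def __init__(self, parent_limit):
        self.parent_limit = parent_limit
        self.children = {}

    def add_estimate(self, child, nbytes, label):
        # Account for the new bytes on the child first.
        self.children[child] = self.children.get(child, 0) + nbytes
        total = sum(self.children.values())
        if total > self.parent_limit:
            # Roll back, then report using the label of the current caller.
            self.children[child] -= nbytes
            raise ParentLimitExceeded(
                f"[parent] Data too large, data for [{label}] would be "
                f"[{total}], which is larger than the limit of "
                f"[{self.parent_limit}]"
            )


def demo():
    hierarchy = BreakerHierarchy(parent_limit=1000)
    # Memory that grows with the amount of indexed data (made-up number).
    hierarchy.add_estimate("accounting", 990, "<segments>")
    try:
        # A small REST request is the one that crosses the line.
        hierarchy.add_estimate("in_flight_requests", 20, "<http_request>")
    except ParentLimitExceeded as exc:
        return str(exc), hierarchy
    return None, hierarchy


if __name__ == "__main__":
    message, _ = demo()
    print(message)
```

In this model a 20-byte request gets the blame even though 990 of the 1000 bytes belong to another breaker, which would be consistent with lightweight requests failing only once a lot of data is stored.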

system · December 11, 2018, 10:31am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
