Cause of the "Data too large <http_request>" exception and how to avoid it


#1

Hi,

I've run into trouble with the following exception:

    org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [<http_request>] would be [32136353583/29.9gb], which is larger than the limit of [32127221760/29.9gb]
    	at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:230) ~[elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:232) [elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.rest.RestController.tryAllHandlers(RestController.java:336) [elasticsearch-6.2.4.jar:6.2.4]
    	at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:174) [elasticsearch-6.2.4.jar:6.2.4]

...

I know our hardware is quite tight for the amount of data we want to index, but I'd still like to understand what exactly causes this type of error and whether there is a way to avoid it.

Previously we had one huge index (10+ TB) whose shards were several times larger than the recommended maximum of 50 GB per shard. The new setup has many more shards, and the data is split across several smaller indices. At first this worked, but as the amount of indexed data grew, the very same error started to appear again. There is no cluster, just one Elasticsearch instance handling all indexing and querying.

Adjusting the circuit breaker limits seems to push the error back a bit, but it does not go away, and with more data it becomes inevitable.
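Before tuning limits further, it can help to see which child breaker actually holds the memory that the parent breaker is counting. A minimal sketch, assuming the node is reachable at http://localhost:9200 without authentication or TLS: fetch `GET _nodes/stats/breaker` and list each child breaker's `estimated_size_in_bytes` against the parent limit, largest first. `SAMPLE_STATS` is a made-up response for illustration (only the parent limit is taken from the exception above); `fetch_breaker_stats` and `summarize_breakers` are hypothetical helper names.

```python
import json
import urllib.request

GIB = 1024 ** 3

# Made-up excerpt of a GET _nodes/stats/breaker response (illustration only).
SAMPLE_STATS = {
    "nodes": {
        "node-1": {
            "breakers": {
                "request": {"limit_size_in_bytes": 27_500_000_000, "estimated_size_in_bytes": 1_200_000_000},
                "fielddata": {"limit_size_in_bytes": 27_500_000_000, "estimated_size_in_bytes": 24_100_000_000},
                "in_flight_requests": {"limit_size_in_bytes": 45_900_000_000, "estimated_size_in_bytes": 20_000},
                "accounting": {"limit_size_in_bytes": 45_900_000_000, "estimated_size_in_bytes": 6_800_000_000},
                "parent": {"limit_size_in_bytes": 32_127_221_760, "estimated_size_in_bytes": 32_100_020_000},
            }
        }
    }
}


def fetch_breaker_stats(base_url: str = "http://localhost:9200") -> dict:
    """Fetch breaker stats from the node (assumes no auth/TLS)."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/breaker", timeout=10) as resp:
        return json.load(resp)


def summarize_breakers(stats: dict) -> list:
    """Return one header line per node, then its child breakers, largest first."""
    lines = []
    for node_id, node in stats["nodes"].items():
        breakers = node["breakers"]
        parent = breakers["parent"]
        parent_limit = parent["limit_size_in_bytes"]
        lines.append(
            f"{node_id}: parent {parent['estimated_size_in_bytes'] / GIB:.1f} "
            f"of {parent_limit / GIB:.1f} GiB"
        )
        # Every breaker except "parent" is a child whose usage counts toward the parent.
        children = [(name, b) for name, b in breakers.items() if name != "parent"]
        children.sort(key=lambda item: item[1]["estimated_size_in_bytes"], reverse=True)
        for name, b in children:
            used = b["estimated_size_in_bytes"]
            lines.append(f"  {name:<20} {used / GIB:6.1f} GiB ({used / parent_limit:.0%} of parent limit)")
    return lines


if __name__ == "__main__":
    # Against a real node: summarize_breakers(fetch_breaker_stats())
    for line in summarize_breakers(SAMPLE_STATS):
        print(line)
```

If `fielddata` dominates on the real node, long-lived structures such as global ordinals (used by the parent/child join and by aggregations on high-cardinality keyword fields) are a likely suspect; if `accounting` dominates, it is memory held for the open segments themselves. Raising the parent limit only moves the threshold, which matches the behaviour described above.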

Questions:

  • What exactly is causing the "Data too large" error on bigger data sets?
  • What are the possible ways of avoiding or preventing it, other than adding nodes or more powerful hardware?

Thanks in advance.

Petr K.


(Christian Dahlqvist) #2

Have a look at this webinar for guidance on how to optimise your data in order to get the most out of your cluster.


#3

Thank you for answering. I've watched the video and, as I understood it, the main storage-density gain comes from making indices read-only and force merging them down to a single segment. Unfortunately this is not possible in our setup, as the indices are never read-only: data is continuously being deleted from them. That is also the reason we ended up with so many shards, to keep each of them under 50 GB.

Another point was avoiding parent/child relations. Unfortunately, that was the only way I was able to solve the problem of indexing very large documents. Individual documents are so big that they cannot be handled as one unit, so they have to be split into a parent (metadata) document and many child (content) documents.

There is only one high-cardinality keyword field in the whole mapping, with approximately 1M distinct values.


(Christian Dahlqvist) #4

Then you probably need to scale out, as you need more heap.


#5

Yes, I thought so.

Anyway, just out of curiosity: this "Data too large" error mentions <http_request>. Why? All the REST requests we send to ES are fairly lightweight, and the error only starts to appear once a fairly large amount of data is stored in the indices. I'd understand it if it said "Fielddata too large", for example. Thank you.
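The stack trace hints at the answer: `RestController.dispatchRequest` reserves bytes for the incoming request through a child breaker (`ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak`), which then calls `HierarchyCircuitBreakerService.checkParentLimit`. In 6.x the parent limit applies to the combined usage of all child breakers (fielddata, request, in_flight_requests, accounting), so the label in the message is simply the reservation that pushed the total over the limit, not the consumer holding most of the memory. The model below is a simplified sketch of that idea, not Elasticsearch source: it ignores overhead multipliers and thread safety, and the class names, limits and sizes are hypothetical.

```python
class CircuitBreakingError(Exception):
    """Raised when a reservation would exceed a breaker limit."""


class ParentBreaker:
    """Simplified parent breaker: its usage is the sum of all child breakers."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.children = {}

    def add_child(self, name: str, limit_bytes: int) -> "ChildBreaker":
        child = ChildBreaker(name, limit_bytes, self)
        self.children[name] = child
        return child

    def check_parent_limit(self, label: str) -> None:
        # The parent only sees the total; the label is whatever triggered this check.
        total = sum(child.used for child in self.children.values())
        if total > self.limit_bytes:
            raise CircuitBreakingError(
                f"[parent] Data too large, data for [{label}] would be [{total}], "
                f"which is larger than the limit of [{self.limit_bytes}]"
            )


class ChildBreaker:
    def __init__(self, name: str, limit_bytes: int, parent: ParentBreaker) -> None:
        self.name = name
        self.limit_bytes = limit_bytes
        self.parent = parent
        self.used = 0

    def add_estimate_bytes_and_maybe_break(self, num_bytes: int, label: str) -> None:
        # Reserve the bytes, then check this child's own limit and the parent's total.
        self.used += num_bytes
        try:
            if self.used > self.limit_bytes:
                raise CircuitBreakingError(
                    f"[{self.name}] Data too large, data for [{label}] would be "
                    f"[{self.used}], which is larger than the limit of [{self.limit_bytes}]"
                )
            self.parent.check_parent_limit(label)
        except CircuitBreakingError:
            self.used -= num_bytes  # roll back the refused reservation
            raise


def demo() -> str:
    # Arbitrary units; all limits and sizes are made-up illustration values.
    parent = ParentBreaker(limit_bytes=100)
    fielddata = parent.add_child("fielddata", limit_bytes=60)
    accounting = parent.add_child("accounting", limit_bytes=100)
    in_flight = parent.add_child("in_flight_requests", limit_bytes=100)

    fielddata.add_estimate_bytes_and_maybe_break(58, "my_keyword_field")  # long-lived usage
    accounting.add_estimate_bytes_and_maybe_break(40, "segment memory")   # long-lived usage
    try:
        # A tiny request is the reservation that tips the total over the parent limit.
        in_flight.add_estimate_bytes_and_maybe_break(3, "<http_request>")
    except CircuitBreakingError as err:
        return str(err)
    return "no breaker tripped"


if __name__ == "__main__":
    print(demo())
```

Running it prints `[parent] Data too large, data for [<http_request>] would be [101], which is larger than the limit of [100]`: a 3-unit request gets the blame even though fielddata and accounting hold 98 of the 100 units, which is why the error only shows up once enough data is resident on the node.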