Python Elasticsearch API: Error 'data too large' when iterating

I am using the Python Elasticsearch API to interact with my Elastic cluster, and I'm getting an error when I perform several searches in a for loop. In short, I iterate over a list of values; in each iteration I take one value from the list, build a query with it to retrieve a specific set of documents, and run a search (`client.search(index=..., query=..., aggs=...)`) to get a terms aggregation. After some number of iterations, I receive a 'data too large' error. I'm confident that no single aggregation returns an excessively large number of terms.
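
For reference, here is a minimal sketch of the loop, assuming elasticsearch-py 8.x; the endpoint, index name, field names, and values list are placeholders, not my actual ones:

```python
# Minimal sketch of the loop pattern described above.
# Endpoint, index, fields, and values are placeholders.
from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")  # placeholder endpoint

values = ["value-1", "value-2", "value-3"]  # placeholder list of filter values

for value in values:
    response = client.search(
        index="my-index",                     # placeholder index name
        query={"term": {"my_field": value}},  # restrict docs to this value
        aggs={
            "my_terms": {                     # one terms aggregation per iteration
                "terms": {"field": "my_keyword_field", "size": 100}
            }
        },
        size=0,  # only the aggregation result is needed, no hits
    )
    buckets = response["aggregations"]["my_terms"]["buckets"]
    print(value, len(buckets))
```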

I assume there is some kind of buffer that is not emptied after each iteration. I'd appreciate any insight into this matter.

Here is the complete error I get:

BadRequestError(400, 'search_phase_execution_exception', 'task cancelled [Fatal failure during search: failed to merge result [[parent] Data too large, data for [<reduce_aggs>] would be [999285563/952.9mb], which is larger than the limit of [996147200/950mb], real usage: [999285528/952.9mb], new bytes reserved: [35/35b], usages [inflight_requests=952/952b, request=106981446/102mb, fielddata=299829275/285.9mb, eql_sequence=0/0b, model_inference=0/0b]]]')

I've gotten the same error before when using for loops. In those cases I partially worked around the issue by creating the client inside the loop; however, I can't do that in the case described above (and I don't think it's desirable anyway). I also sometimes get an ApiError(429).
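
For context, the earlier partial workaround looked roughly like this (again with placeholder names):

```python
# Rough sketch of the earlier partial workaround: recreating the client on
# every iteration. All names are placeholders.
from elasticsearch import Elasticsearch

values = ["value-1", "value-2", "value-3"]  # placeholder filter values

for value in values:
    client = Elasticsearch("http://localhost:9200")  # new client each iteration
    client.search(
        index="my-index",
        query={"term": {"my_field": value}},
        aggs={"my_terms": {"terms": {"field": "my_keyword_field"}}},
        size=0,
    )
    client.close()  # release the underlying HTTP connections
```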
