ES Spark Connector - Circuit Breaker error

I'm using ES 7.1.1 and Spark 2.4.2. The ES cluster is on Google Kubernetes Engine and the Spark cluster is on Google Dataproc.

Big jobs are failing with the following error, often several hours into the job:

19/06/25 08:46:15 WARN org.apache.spark.scheduler.TaskSetManager: Lost task 5.0 in stage 2.0 (TID 556, cluster.name, executor 3): org.apache.spark.util.TaskCompletionListenerException: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [5127135952/4.7gb], which is larger than the limit of [5067151769/4.7gb], real usage: [5127135952/4.7gb], new bytes reserved: [0/0b]

It then prints the batch request, which is very large.

Any ideas on how to prevent this kind of error? It looks to me like ES is not keeping up with the rate of requests, so memory usage is increasing until requests are rejected.

In this case, it would be nice if Spark slowed down. It looks like retries are enabled, but the back-off time doesn't appear to increase.
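For reference, the write looks roughly like this (host, index name, and exact values are simplified here; df is the DataFrame we are indexing):

import org.elasticsearch.spark.sql._

df.saveToEs("logs-index", Map(
  "es.nodes"                   -> "elasticsearch-host",  // placeholder
  "es.batch.write.retry.count" -> "3",                   // connector default
  "es.batch.write.retry.wait"  -> "10s"                  // connector default; a fixed wait, not an increasing back-off
))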

Any tips on resolving this problem?

Hi Daniel,

This sort of exception can be caused by a number of things. Taking a look at the message:

circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [5127135952/4.7gb], which is larger than the limit of [5067151769/4.7gb], real usage: [5127135952/4.7gb], new bytes reserved: [0/0b]

So in this case the "parent" breaker was tripped. The parent breaker tracks the sum of all the other breakers, so the first thing to do is check the nodes stats API with:

GET /_nodes/stats/breaker?human&pretty

This returns the breaker statistics for each node, so you can see whether any of the other breakers are contributing to the usage that is causing the parent breaker to trip.

Next, since this is 7.1, the real-memory circuit breaker samples the actual heap usage of Elasticsearch to try to prevent an OutOfMemoryError. If the individual breakers don't tell you where the memory is being used, it's worth checking how large the requests you are sending to ES are.
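If the requests do turn out to be large, you can also make the connector send smaller bulk requests and wait longer between retries, along these lines (the exact values depend on your documents; the option names are the standard es-hadoop settings):

import org.elasticsearch.spark.sql._

df.saveToEs("your-index", Map(
  "es.batch.size.entries"      -> "500",  // docs per bulk request, default 1000
  "es.batch.size.bytes"        -> "1mb",  // bytes per bulk request, default 1mb; lower this if documents are large
  "es.batch.write.retry.count" -> "6",    // default 3
  "es.batch.write.retry.wait"  -> "60s"   // default 10s
))

Keep in mind these limits apply per Spark task, so the total load on the cluster scales with the number of concurrent tasks.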

Thanks for your response.

We figured out the issue: our mappings unnecessarily included a "completion" field, which keeps its suggester data structures in JVM heap memory and uses a lot of it for large indices. Removing it resolved the problem.
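In case it helps anyone else hitting the same thing: the per-node memory used by completion fields shows up in the node stats, e.g.:

GET /_nodes/stats/indices/completion?human&pretty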
