I am using Spark Structured Streaming to dump data from Kafka to Elasticsearch (ES), and I keep getting the following error:
org.apache.spark.sql.streaming.StreamingQueryException: Job aborted due to stage failure: Task 6 in stage 4888.0 failed 4 times, most recent failure: Lost task 6.3 in stage 4888.0 (TID 58565, 10.139.64.27, executor 256): org.apache.spark.util.TaskCompletionListenerException: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: circuit_breaking_exception: [parent] Data too large, data for [<http_request>] would be [6104859024/5.6gb], which is larger than the limit of [6103767449/5.6gb], real usage: [6104859024/5.6gb], new bytes reserved: [0/0b]
Could someone suggest which parameters I should adjust to avoid this?
Here is the configuration I use to write to ES:
val esURL = "xxxx"

serviceLogDfForES.writeStream
  .outputMode("append")
  .format("org.elasticsearch.spark.sql")
  // connection and authentication
  .option("es.nodes", esURL)
  .option("es.port", "9200")
  .option("es.nodes.wan.only", "true")
  .option("es.net.http.auth.user", "xxx")
  .option("es.net.http.auth.pass", "xxx")
  .option("es.net.ssl", "true")
  .option("es.net.ssl.cert.allow.self.signed", "true")
  // mapping and target index (one index per day, based on the date field)
  .option("es.mapping.date.rich", "true")
  .option("es.resource.write", "service-log-{date}")
  // streaming checkpoint
  .option("checkpointLocation", "/mnt/xxxx/_checkpoint1")
  .start().awaitTermination()