Catching exceptions from saveToEs (elasticsearch-spark)

nvitucci · June 23, 2016, 7:53am

Hello,

I am writing an RDD to Elasticsearch using the saveToEs method from elasticsearch-spark. The RDD might contain documents that Elasticsearch rejects with an org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest exception, and I would like to catch the exception(s) so that I can simply ignore such malformed documents and the job does not get interrupted. How can I do this?

james.baiera · June 30, 2016, 7:31pm

Right now there's no great way to do this in es-hadoop. I do recommend using the functionality provided by Spark's RDDs to transform or filter out any invalid documents before executing the final saveToEs. It's unlikely that we would provide options to filter out data when those options are already present in these frameworks.
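A minimal sketch of that suggestion: a predicate that decides whether a document is safe to index, applied before the write. The document shape (`Map[String, Any]`) and the field names `id` and `timestamp` are hypothetical; the checks you actually need depend on your mapping. The logic is plain Scala so it can be tested without a cluster. In a Spark job the same predicate would be passed to `rdd.filter(...)` before calling `saveToEs`.

```scala
import java.time.LocalDate
import java.time.format.DateTimeParseException

object PreFilter {
  // A document as it might look before indexing (hypothetical shape).
  type Doc = Map[String, Any]

  // Hypothetical check: "timestamp" must be an ISO-8601 date (yyyy-MM-dd).
  // LocalDate.parse uses a strict resolver, so "2017-02-30" is rejected.
  def hasValidDate(doc: Doc): Boolean = doc.get("timestamp") match {
    case Some(s: String) =>
      try { LocalDate.parse(s); true }
      catch { case _: DateTimeParseException => false }
    case _ => false
  }

  // Hypothetical check: a non-empty string "id" field is required.
  def hasId(doc: Doc): Boolean = doc.get("id") match {
    case Some(s: String) => s.nonEmpty
    case _               => false
  }

  // All checks combined into one predicate suitable for RDD.filter.
  def isIndexable(doc: Doc): Boolean = hasId(doc) && hasValidDate(doc)

  def main(args: Array[String]): Unit = {
    val docs: Seq[Doc] = Seq(
      Map("id" -> "1", "timestamp" -> "2017-03-08"),
      Map("id" -> "2", "timestamp" -> "2017-02-30"), // not a real date
      Map("timestamp" -> "2017-03-08")               // missing id
    )
    // In a Spark job (after `import org.elasticsearch.spark._`) this would
    // roughly be: rdd.filter(isIndexable).saveToEs("myindex/doc"),
    // where "myindex/doc" is a hypothetical index/type.
    println(docs.filter(isIndexable).map(_("id"))) // prints List(1)
  }
}
```

Keeping each check in its own small function makes it easy to add rules as new rejection causes show up in the Elasticsearch logs.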

nvitucci · July 6, 2016, 1:04pm

Well, if there's a failure on the Elasticsearch side, I'd like to be able to fail gracefully rather than have my whole job fail. In my specific case there is no simple way to do the checks beforehand, so handling the exception would be easier. Do you see any solution to this? Maybe a saveToEs parameter to "ignore" the exceptions and log them somewhere?
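Without a per-document option, one thing that can be done is to catch the failure at the driver, so the application logs it and carries on instead of crashing. This is a sketch, not an es-hadoop feature: the helper name `attempt` is hypothetical, the failing action is simulated, and it does not skip individual bad documents. When a bulk request is rejected the Spark job is still aborted, and any documents already written before the failure remain in the index.

```scala
import scala.util.control.NonFatal

object GracefulSave {
  // Runs an action (e.g. a call to rdd.saveToEs) and turns a non-fatal
  // failure into a Left instead of letting it terminate the driver.
  // It does NOT make the Spark job skip bad documents.
  def attempt(action: => Unit): Either[Throwable, Unit] =
    try Right(action)
    catch { case NonFatal(e) => Left(e) }

  def main(args: Array[String]): Unit = {
    // Stand-in for rdd.saveToEs("myindex/doc"); here it simply throws.
    val result = attempt(throw new RuntimeException("simulated bulk rejection"))
    result match {
      case Left(e)  => println(s"Indexing failed, continuing: ${e.getMessage}")
      case Right(_) => println("Indexing succeeded")
    }
  }
}
```

Matching on `NonFatal` rather than `Throwable` avoids swallowing errors such as `OutOfMemoryError` that should still bring the application down.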

coding2012 · March 8, 2017, 5:17pm

I have also run into this. The JSON was fine; in my case the problem was an invalid date, and I had to look at the Elasticsearch logs to find that out. I need a way to run the job and store the errors in a different place so they can be re-run later.
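One way to approximate that today is to tag each document with a rejection reason before indexing, index only the clean ones, and write the rest to a separate location for a later re-run. The sketch below assumes the problem field is called `date` and holds an ISO-8601 (yyyy-MM-dd) value; both are hypothetical. A client-side check like this only catches the failures you anticipate, not every mapping error Elasticsearch might raise.

```scala
import java.time.LocalDate
import java.time.format.DateTimeParseException

object SplitForRetry {
  type Doc = Map[String, String]

  // Some(reason) if the document would likely be rejected, None otherwise.
  // Only the hypothetical "date" field is checked here.
  def rejectionReason(doc: Doc): Option[String] = doc.get("date") match {
    case None => Some("missing date")
    case Some(d) =>
      try { LocalDate.parse(d); None }
      catch { case _: DateTimeParseException => Some(s"invalid date '$d'") }
  }

  // Split documents into those to index and those to park, with a reason.
  def split(docs: Seq[Doc]): (Seq[Doc], Seq[(Doc, String)]) = {
    val tagged = docs.map(d => (d, rejectionReason(d)))
    val good = tagged.collect { case (d, None) => d }
    val bad = tagged.collect { case (d, Some(r)) => (d, r) }
    (good, bad)
  }

  def main(args: Array[String]): Unit = {
    val docs: Seq[Doc] = Seq(
      Map("id" -> "a", "date" -> "2017-03-08"),
      Map("id" -> "b", "date" -> "08/03/2017")
    )
    val (good, bad) = split(docs)
    println(good.map(_("id"))) // prints List(a)
    bad.foreach { case (d, r) => println(s"${d("id")}: $r") } // b: invalid date '08/03/2017'
  }
}
```

In Spark, the tagged RDD could be cached and filtered twice: the clean documents go to `saveToEs`, and the rejected ones (with their reasons) to something like `saveAsTextFile` at a path of your choosing, from which they can be fixed and re-run.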

larghir · March 31, 2017, 8:32am

I'd also be interested in this feature. The current behavior is quite limiting, because a single failure fails the entire job.

Related topics

Handling failures on saveToES (Elasticsearch, es-hadoop): 2 replies, 899 views, last activity February 8, 2018
How to handle data that causes failure while indexing from spark to ES (Elasticsearch, es-hadoop): 2 replies, 1980 views, last activity October 10, 2017
Writing spark Dataframe/Dataset to Elasticsearch (Elasticsearch, es-hadoop): 2 replies, 1771 views, last activity June 27, 2018
Elasticsearch Spark EsHadoopNoNodesLeftException in cluster Mode (Elasticsearch): 7 replies, 7425 views, last activity July 5, 2017
Elasticsearch and spark (Elasticsearch): 7 replies, 1151 views, last activity July 6, 2017