BulkWriteErrorHandler (6.2) - write failed documents to HDFS

dr_work · March 30, 2018, 8:43pm

Version 6.2 added ErrorHandler functionality, which lets you plug in custom bulk error handlers.
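For context, a custom handler is registered through connector settings. The following is a minimal sketch that builds those settings as a plain map. It assumes the es-hadoop 6.x key names es.write.rest.error.handlers and es.write.rest.error.handler.<name>. The handler name hdfsOut, the class com.example.HdfsErrorHandler, and the .path sub-key are hypothetical placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: settings that register a custom bulk write error handler.
 * The two "es.write.rest.error.handler*" key patterns follow the es-hadoop 6.x docs;
 * the handler name "hdfsOut", the class name, and the ".path" sub-key are hypothetical.
 */
public class ErrorHandlerSettings {

    static Map<String, String> build(String handlerClass, String outputDir) {
        Map<String, String> cfg = new HashMap<>();
        // Comma-separated list of handler names, consulted in order for each failed bulk entry.
        cfg.put("es.write.rest.error.handlers", "hdfsOut");
        // Maps the handler name to the class that implements it.
        cfg.put("es.write.rest.error.handler.hdfsOut", handlerClass);
        // Handler-specific setting (hypothetical key); only our own handler reads it.
        cfg.put("es.write.rest.error.handler.hdfsOut.path", outputDir);
        return cfg;
    }

    public static void main(String[] args) {
        Map<String, String> cfg = build("com.example.HdfsErrorHandler", "hdfs:///tmp/es-failures");
        cfg.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The resulting map can be passed as the configuration argument of JavaEsSpark.saveToEs(rdd, resource, cfg). As far as I can tell from the 6.2 documentation, keys under es.write.rest.error.handler.hdfsOut. are handed to the handler's init(Properties) with that prefix removed. In that case the handler would read the directory with getProperty("path"). Please check this against the docs for your version.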

In the documentation, the example shows how to write failed documents to a file using BufferedWriter/OutputStream.

I need to write the failed documents to HDFS instead. Is it possible to write to HDFS from a subclass of BulkWriteErrorHandler?
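Here is a minimal sketch of the handler's shape, assuming the file-writing pattern from the documentation example. The handler opens one stream per handler instance, copies each failed entry into it, reports the entry as handled, and closes the stream on shutdown. To keep the sketch compilable without es-hadoop, HandlerResult and Failure are hypothetical stand-ins for es-hadoop's HandlerResult and BulkWriteFailure. FailedDocWriter is also not the real BulkWriteErrorHandler signature, which in 6.2 also receives an error collector.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FailedDocWriterSketch {

    // Hypothetical stand-in for es-hadoop's HandlerResult.
    enum HandlerResult { HANDLED, PASS, ABORT }

    // Hypothetical stand-in for a failed bulk entry (the real class is BulkWriteFailure).
    static final class Failure {
        private final byte[] contents;

        Failure(byte[] contents) { this.contents = contents; }

        InputStream getEntryContents() { return new ByteArrayInputStream(contents); }
    }

    // Copies each failed entry into one stream that stays open for the handler's lifetime.
    static final class FailedDocWriter implements AutoCloseable {
        private final OutputStream out;

        FailedDocWriter(OutputStream out) { this.out = out; }

        HandlerResult onError(Failure entry) {
            byte[] buf = new byte[8192];
            try (InputStream in = entry.getEntryContents()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                out.write('\n'); // raw entry bytes followed by a newline
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            // Report the failure as dealt with; a real handler could return PASS to defer to the next handler.
            return HandlerResult.HANDLED;
        }

        @Override
        public void close() {
            try {
                out.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FailedDocWriter writer = new FailedDocWriter(sink)) {
            HandlerResult r = writer.onError(new Failure("{\"id\":1}".getBytes(StandardCharsets.UTF_8)));
            System.out.println(r + ": " + new String(sink.toByteArray(), StandardCharsets.UTF_8).trim());
        }
    }
}
```

Because the writer depends only on java.io.OutputStream, the same class could be given the stream returned by Hadoop's FileSystem.create(Path) instead of a local file stream.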

james.baiera · March 30, 2018, 8:49pm

I don't see why not!

Just an FYI, we are planning on adding more default error handlers as the feature leads up to GA status. An error handler that writes error events to HDFS is one of the ideas we have!

dr_work · March 30, 2018, 9:05pm

Hi James,

I am new to Spark, so I would appreciate your help. I have been using JavaRDD.saveAsTextFile() to write text out to HDFS.

To convert the failed documents to a JavaRDD, I would need access to the SparkContext. How can I get access to the SparkContext from my custom BulkWriteErrorHandler?

Maybe there is a better way to accomplish this?
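One likely reason the SparkContext is hard to reach is that it exists only on the driver, while the error handler runs inside tasks on the executors. There is therefore no SparkContext available in the handler, and no JavaRDD is needed. The usual alternative is to write through Hadoop's FileSystem API directly, calling FileSystem.get(URI, Configuration) and then fs.create(Path). Several tasks may record failures at the same time, and an HDFS file accepts only one writer, so each handler instance should get its own file. The sketch below shows that naming scheme, with java.nio as a local stand-in for HDFS. The file-name pattern is hypothetical. In a real job, the partition id could come from Spark's TaskContext.get().partitionId().

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

public class PerTaskFailureFile {

    // Hypothetical naming scheme: partition id plus a random UUID, so two
    // handler instances (or a retried task) never open the same file.
    static String uniqueFileName(int partitionId) {
        return "failures-part-" + partitionId + "-" + UUID.randomUUID() + ".txt";
    }

    // Local-disk stand-in. With Hadoop on the classpath the equivalent would be:
    //   FileSystem fs = FileSystem.get(URI.create(dir), new Configuration());
    //   OutputStream out = fs.create(new org.apache.hadoop.fs.Path(dir, name));
    static OutputStream open(String dir, String name) throws IOException {
        Path base = Paths.get(dir);
        Files.createDirectories(base);
        // Default options: CREATE, TRUNCATE_EXISTING, WRITE.
        return Files.newOutputStream(base.resolve(name));
    }

    public static void main(String[] args) throws IOException {
        String dir = Files.createTempDirectory("es-failures").toString();
        String name = uniqueFileName(3);
        try (OutputStream out = open(dir, name)) {
            out.write("{\"id\":1}\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("wrote " + Paths.get(dir, name));
    }
}
```

Opening the stream once in the handler's init and closing it in close keeps HDFS round trips to one file creation per handler instance, rather than one per failed document.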

Thanks for your help!

dr_work · April 2, 2018, 9:01pm

Hi James,

Could you provide any pointers on how to write to HDFS from BulkWriteErrorHandler? Is it possible to access SparkContext/SparkSession from BulkWriteErrorHandler?

Thanks

dr_work · April 3, 2018, 6:48pm

The following Stack Overflow post helped me out:

Thanks.

system · May 1, 2018, 6:48pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
No meaningful logs from bulk write operation failure	Elasticsearch / es-hadoop	2	2739	July 30, 2018
Can es-hadoop write bulk files to disk?	Elasticsearch / es-hadoop	2	759	July 6, 2017
Elastic Spark ErrorHandler	Elasticsearch / es-hadoop	2	812	February 24, 2020
Handling failures on saveToES	Elasticsearch / es-hadoop	2	919	February 8, 2018
Load data into HDFS using ES-Spark	Elasticsearch	2	589	July 6, 2017