BulkWriteErrorHandler (6.2) - write failed documents to HDFS

Version 6.2 added error handler functionality, which lets you plug in custom bulk write error handlers.

The example in the documentation shows how to write failed documents to a file using a BufferedWriter/OutputStream.

I need to write the failed documents to HDFS. Is it possible to write to HDFS from a subclass of BulkWriteErrorHandler?

I don't see why not!

Just an FYI: we are planning to add more default error handlers as the feature moves toward GA. An error handler that writes error events to HDFS is one of the ideas we have!
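Here is a minimal sketch of the idea. It assumes the handler opens the HDFS file itself. Hadoop's `FileSystem.create(Path)` returns an `FSDataOutputStream`, which is an ordinary `java.io.OutputStream`, so the record-writing logic of the documentation's file example carries over unchanged.

The `FailureRecordWriter` class below is hypothetical and uses only JDK types, so it can be tried without a cluster. In a real `BulkWriteErrorHandler` subclass you would:

1. Create it in `init(...)` around the HDFS stream.
2. Call `write(...)` from `onError(...)` with the failure's response code, exception message, attempt count and entry contents.
3. Return `HandlerResult.HANDLED` from `onError(...)`.
4. Close it in `close()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of ES-Hadoop): writes one failed bulk entry
// as a plain-text record to any OutputStream. In a real handler the stream
// would be the FSDataOutputStream returned by Hadoop's FileSystem.create(...).
final class FailureRecordWriter implements AutoCloseable {
    private final OutputStream out;

    FailureRecordWriter(OutputStream out) {
        this.out = out;
    }

    // Writes a single line terminated by '\n', encoded as UTF-8.
    private void writeLine(String line) throws IOException {
        out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
    }

    // Records a subset of the fields the documentation's file example writes:
    // response code, error message, attempt count, then the raw entry bytes.
    void write(int responseCode, String errorMessage, int attempts, InputStream entry)
            throws IOException {
        writeLine("Code: " + responseCode);
        writeLine("Error: " + errorMessage);
        writeLine("Attempts: " + attempts);
        writeLine("Entry:");
        // Copy the failed document's bytes verbatim.
        byte[] buffer = new byte[8192];
        int n;
        while ((n = entry.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        writeLine("");
        // Push any buffered bytes to the underlying stream after each record.
        out.flush();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```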

Hi James,

I am new to Spark, so I would appreciate your help. I have been using JavaRDD.saveAsTextFile() to write text out to HDFS.

To convert the documents to a JavaRDD, I would need access to the SparkContext. How can I get access to the SparkContext from my custom BulkWriteErrorHandler?

Maybe there is a better way to accomplish this?

Thanks for your help!

Hi James,

Could you provide any pointers on how to write to HDFS from a BulkWriteErrorHandler? Is it possible to access the SparkContext/SparkSession from a BulkWriteErrorHandler?

Thanks
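The handler's `onError` runs on the executors, inside the task that writes to Elasticsearch. The `SparkContext` and `SparkSession`, by contrast, can only be used on the driver, so `saveAsTextFile()` is not available from within a handler.

One way around this, sketched under that assumption, is to skip RDDs entirely and open an HDFS file directly from each task. An HDFS file accepts only one writer at a time, so each task, and each retry of a task, needs its own path.

The hypothetical `FailedDocsPath` helper below builds such a path. In a handler, the ids could come from Spark's `TaskContext.get().partitionId()` and `TaskContext.get().attemptNumber()`. The resulting path would then be passed to Hadoop's `FileSystem.create(new Path(...), false)`.

```java
import java.util.Locale;

// Hypothetical helper: builds a per-task file path for failed documents.
// HDFS allows a single writer per file, so every task attempt gets its own file.
final class FailedDocsPath {
    private FailedDocsPath() {}

    static String forTask(String baseDir, int partitionId, int attemptNumber) {
        if (partitionId < 0 || attemptNumber < 0) {
            throw new IllegalArgumentException("ids must be non-negative");
        }
        // Drop a trailing slash so the result never contains "//" before the file name.
        String dir = baseDir.endsWith("/") ? baseDir.substring(0, baseDir.length() - 1) : baseDir;
        // Locale.ROOT keeps the zero-padded digits ASCII regardless of the JVM locale.
        return String.format(Locale.ROOT, "%s/failed-part-%05d-attempt-%d.txt",
                dir, partitionId, attemptNumber);
    }
}
```

Including the attempt number as well as the partition id keeps a retried task from colliding with a file left behind by its failed attempt. Passing `false` as the overwrite flag makes `create` fail rather than silently replace an existing file. If several jobs share the same base directory, a job-specific subdirectory would also be needed.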

The following StackOverflow posting helped me out:

Thanks.
