We are developing an application that ingests and bulk-indexes time-series data using the native Java client.
Every document/event is important to us; we can't afford to miss a single one.
We use BulkProcessor to index the data. If indexing fails because of malformed JSON, bulk queue unavailability, or any other reason, is there a way to track the failed documents/events from the BulkResponse?
I tried iterating through the BulkResponse in the afterBulk() method, but couldn't find the actual document/event.
Our plan is to index all such failed documents/events into a separate index (say, unprocessed) that does not enforce any mappings.
In the afterBulk method, you should be able to check BulkResponse.hasFailures(). If it returns true, you can iterate over the response items and index the failed ones into your unprocessed index.
I already tried your proposal, but couldn't find a way to get the actual document (in this case, the failed document) from a BulkItemResponse. That object only carries the id, index and type, not the document itself.
Oh I see. Something useful: the item at index i in the BulkResponse corresponds to the request at index i in the BulkRequest. So you can get a reference to the ActionRequest that failed, cast it to an IndexRequest (if you know it is an IndexRequest), and get the document using the .source() method.
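To make the index-matching idea concrete, here is a minimal sketch in plain Java. It does not use the Elasticsearch classes: Request, ItemResponse, FailedDoc and collectFailed are hypothetical stand-ins I made up so the example runs on its own. In a real listener you would apply the same loop inside afterBulk(long executionId, BulkRequest request, BulkResponse response), reading request.requests().get(i) and response.getItems()[i], and checking isFailed() and getFailureMessage() on each item.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BulkFailureSketch {

    // Hypothetical stand-in for IndexRequest: just an id and the raw JSON source.
    static final class Request {
        final String id;
        final String source;
        Request(String id, String source) { this.id = id; this.source = source; }
    }

    // Hypothetical stand-in for BulkItemResponse: failure flag plus message.
    static final class ItemResponse {
        final boolean failed;
        final String failureMessage;
        ItemResponse(boolean failed, String failureMessage) {
            this.failed = failed;
            this.failureMessage = failureMessage;
        }
    }

    // What we would send to the "unprocessed" index.
    static final class FailedDoc {
        final String id;
        final String source;
        final String reason;
        FailedDoc(String id, String source, String reason) {
            this.id = id;
            this.source = source;
            this.reason = reason;
        }
    }

    // Item i of the response corresponds to request i, so a failed item
    // leads straight back to the original document by position.
    static List<FailedDoc> collectFailed(List<Request> requests, List<ItemResponse> items) {
        List<FailedDoc> failed = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            ItemResponse item = items.get(i);
            if (item.failed) {
                Request original = requests.get(i);
                failed.add(new FailedDoc(original.id, original.source, item.failureMessage));
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<Request> requests = Arrays.asList(
                new Request("evt-1", "{\"value\": 1}"),
                new Request("evt-2", "{\"value\": "));   // malformed JSON
        List<ItemResponse> items = Arrays.asList(
                new ItemResponse(false, null),
                new ItemResponse(true, "failed to parse"));

        for (FailedDoc doc : collectFailed(requests, items)) {
            System.out.println(doc.id + " -> " + doc.source + " (" + doc.reason + ")");
        }
    }
}
```

Note that this only covers per-item failures. The listener also has an afterBulk(long, BulkRequest, Throwable) overload that is called when the whole bulk request fails; in that case there is no BulkResponse to walk, so every request in the BulkRequest would need to be treated as unprocessed.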