TransportBulkAction sets the bulkRequest to null if the index is closed

Phil_Scala · February 26, 2016, 3:37pm

Hello,
I was working on a failure handling scenario, picture a stream of events being written to elasticsearch. If there is a failure, I take the failure and the document and store it in a safe place.

To do this I was using the BulkResponse to get any items that were marked as failed. Then gathering the original document that was attempted form the BulkRequest requests collection. All works fine for our main use case, for a MappingParseException. However if the index was closed when trying to write to it, then I am seeing code in the TransportBulkAction class that sets the request object to null:

private boolean addFailureIfIndexIsUnavailable(DocumentRequest request, BulkRequest bulkRequest,  .... )
....
if(unavailableException != null) {
        Failure failure1 = new Failure(request.index(), request.type(), request.id(), unavailableException);
        String operationType = "unknown";
        if(request instanceof IndexRequest) {
            operationType = "index";
        } else if(request instanceof DeleteRequest) {
            operationType = "delete";
        } else if(request instanceof UpdateRequest) {
            operationType = "update";
        }

        BulkItemResponse bulkItemResponse = new BulkItemResponse(idx, operationType, failure1);
        responses.set(idx, bulkItemResponse);
        // make sure the request gets never processed again
        bulkRequest.requests.set(idx, (Object)null);
        return true;
    } else {
        return false;
    }

This of course essentially wiped out the original document I was trying to write. I have worked around this by holding a map of the documents when building the batch. But I was curious as to why, seems to me you want to retain the original request and mark it attempted or link the failure response to the request (so that its not processed again)

I do see this similar pattern when trying to create an index as well.

Lasty, this is Elasticsearch 2.1.1

Thanks

danielmitterdorfer · March 1, 2016, 9:29am

Hi,

you can find more details in the related Github ticket.

I think it is kind of a gray area: On the one hand, the request is modified internally, on the other hand I think client code should only rely on the response. Even in the success case I'd not reuse the request object in order to get the original document from the id. So I think your Map-based approach is reasonable.

I hope that answered your question.

Daniel

Phil_Scala · March 1, 2016, 5:28pm

Thanks Daniel, this is indeed an interesting gray area, my original test case was against cases where we had a mapping exception (i.e. string trying to be stored in a numeric field) - that though did not set the request instance to null. So there are some inconsistencies, but as you commented (and thanks for that), the map solution has been working fine and will always be reliable.

Topic		Replies	Views
Find failed indexquest in bulk load Elasticsearch	3	724	July 5, 2017
Identify, save and resend failed requests in BulkProcessor Elasticsearch	2	648	April 20, 2017
TransportBulkAction - NullPointerException Elasticsearch	1	340	July 6, 2017
How to get failed data from request when using BulkProcessor? Elasticsearch	1	386	June 19, 2018
Way to re-index failed documents using BulkProcessor Elasticsearch	5	1241	July 5, 2017

TransportBulkAction sets the bulkRequest to null if the index is closed

Related topics