TransportBulkAction sets the bulkRequest to null if the index is closed


(Phil Scala) #1

Hello,
I was working on a failure handling scenario, picture a stream of events being written to elasticsearch. If there is a failure, I take the failure and the document and store it in a safe place.

To do this I was using the BulkResponse to get any items that were marked as failed. Then gathering the original document that was attempted form the BulkRequest requests collection. All works fine for our main use case, for a MappingParseException. However if the index was closed when trying to write to it, then I am seeing code in the TransportBulkAction class that sets the request object to null:

private boolean addFailureIfIndexIsUnavailable(DocumentRequest request, BulkRequest bulkRequest,  .... )
....
if(unavailableException != null) {
        Failure failure1 = new Failure(request.index(), request.type(), request.id(), unavailableException);
        String operationType = "unknown";
        if(request instanceof IndexRequest) {
            operationType = "index";
        } else if(request instanceof DeleteRequest) {
            operationType = "delete";
        } else if(request instanceof UpdateRequest) {
            operationType = "update";
        }

        BulkItemResponse bulkItemResponse = new BulkItemResponse(idx, operationType, failure1);
        responses.set(idx, bulkItemResponse);
        // make sure the request gets never processed again
        bulkRequest.requests.set(idx, (Object)null);
        return true;
    } else {
        return false;
    }

This of course essentially wiped out the original document I was trying to write. I have worked around this by holding a map of the documents when building the batch. But I was curious as to why, seems to me you want to retain the original request and mark it attempted or link the failure response to the request (so that its not processed again)

I do see this similar pattern when trying to create an index as well.

Lasty, this is Elasticsearch 2.1.1

Thanks


(Daniel Mitterdorfer) #2

Hi,

you can find more details in the related Github ticket.

I think it is kind of a gray area: On the one hand, the request is modified internally, on the other hand I think client code should only rely on the response. Even in the success case I'd not reuse the request object in order to get the original document from the id. So I think your Map-based approach is reasonable.

I hope that answered your question.

Daniel


(Phil Scala) #3

Thanks Daniel, this is indeed an interesting gray area, my original test case was against cases where we had a mapping exception (i.e. string trying to be stored in a numeric field) - that though did not set the request instance to null. So there are some inconsistencies, but as you commented (and thanks for that), the map solution has been working fine and will always be reliable.


(system) #4