ESRejectedExecution followed by Version Conflict Error for Bulk Requests

ES Version: 2.3.2
Nest Version: 2.3.2
OS: Windows Core 2012 R2

Hi,

We're sending bulk index requests with the default thread pool settings and queue size. In the client code we check ItemsWithErrors and retry the bulk operation. Surprisingly, the items that were rejected with a queue capacity error threw a version conflict error on the second try.

Our understanding is that if the bulk request is rejected, it never reached ES and hence the version is not incremented. After digging into the code a little, I found BackOffPolicy.java, which is set within the BulkRequestHandler class. The back off policy takes maxNumberOfRetries as one of its constructor arguments, the other being a time value.

We also call the MaximumRetries(5) method on ConnectionSettings in Elasticsearch NEST. I was wondering whether this setting results in internal retry logic for bulk requests?

If this is the case, would the initial bulk documents that threw ESRejectedException have gone through via that retry? If so, it makes sense to get a version conflict error when sending the same failed items to ES again.

Please clarify.

Hi @sumithub,

Within the client code we check for ItemsWithErrors and retry the bulk operation.

Do you retry the whole bulk request or just the items that have failed? You should only retry the items that have failed and also check the failure reason.
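To illustrate what checking the failure reason could look like, here is a minimal sketch in Python rather than NEST. It assumes the parsed JSON shape of a bulk response, where every entry in "items" is keyed by its action type and carries a "status". The helper name split_bulk_failures and the sample data are hypothetical.

```python
# Hypothetical helper; the sample response below is illustrative, not real output.
RETRYABLE_STATUS = {429}  # 429: rejected because the bulk queue was full


def split_bulk_failures(response):
    """Split failed bulk items into (retryable, non_retryable) lists."""
    retryable, non_retryable = [], []
    for position, item in enumerate(response.get("items", [])):
        # Each item has a single key naming its action: index, create, update or delete.
        action, result = next(iter(item.items()))
        status = result.get("status", 0)
        if status < 400:
            continue  # this item succeeded
        entry = {
            "position": position,  # index of the item in the original request
            "action": action,
            "id": result.get("_id"),
            "status": status,
            "error": result.get("error"),
        }
        if status in RETRYABLE_STATUS:
            retryable.append(entry)
        else:
            non_retryable.append(entry)
    return retryable, non_retryable


sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 429,
                   "error": {"type": "es_rejected_execution_exception"}}},
        {"index": {"_id": "3", "status": 409,
                   "error": {"type": "version_conflict_engine_exception"}}},
    ],
}

retryable, non_retryable = split_bulk_failures(sample_response)
```

With this split, only the rejected item (status 429) would be resent. A version conflict (status 409) is reported instead of being retried blindly.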

I found BackOffPolicy.java that is set within BulkRequestHandler class

This is part of the Java client API which does not affect you since you use the .NET client.

We also do set MaximumRetries(5) method on ConnectionSettings in
Elasticsearch NEST. I was just wondering whether this setting results in
internal retry logic while doing bulk requests?

No. MaximumRetries applies only to connection issues. I also spoke to the team that implements the .NET client, and they told me that the 5.x and 2.5 clients will have a bulk helper with a separate back off for retriable failures, so such a feature is coming.
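Until such a helper is available, the back off has to live in application code. The following is a rough sketch of that idea in Python, not the planned .NET bulk helper. It assumes a caller-supplied send_bulk(docs) callable (hypothetical) that returns one HTTP status per document, in request order. The function name bulk_with_backoff and its parameters are also assumptions.

```python
import time


def bulk_with_backoff(send_bulk, docs, max_retries=5, base_delay=0.1, sleep=time.sleep):
    """Send docs, resending only the items rejected with 429, with exponential back off.

    Returns a list of (doc, status) pairs for the items that ultimately failed.
    """
    pending = list(docs)
    failed = []
    for attempt in range(max_retries + 1):
        statuses = send_bulk(pending)
        rejected = []
        for doc, status in zip(pending, statuses):
            if status == 429:
                rejected.append(doc)  # queue was full: safe to resend later
            elif status >= 400:
                failed.append((doc, status))  # e.g. 409 version conflict: do not resend
        if not rejected:
            return failed
        pending = rejected
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # wait 0.1s, 0.2s, 0.4s, ...
    # Retries exhausted: report whatever is still being rejected.
    failed.extend((doc, 429) for doc in pending)
    return failed
```

The sleep parameter is injectable so the loop can be exercised in tests without actually waiting.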

Daniel

Thanks @danielmitterdorfer for the clarification. Yes, we reprocess only the items with errors. I probably need to double-check our retry logic and run the test again. The information provided is quite helpful for further debugging.

Cheers,
Sumit