BulkProcessor deadlock on cluster failure

BulkProcessor runs into a deadlock when the ES (5.5.1) cluster fails during bulk processing.

If the bulk response is lost because the cluster fails, the async bulk request handler gets stuck waiting for the processing semaphore (concurrent-requests), which is never released.
As a result, every thread that accesses the bulk processor blocks indefinitely.
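
The mechanism can be reproduced in isolation. The following is a minimal sketch that uses only java.util.concurrent, not the Elasticsearch classes; LostResponseDemo and its methods are hypothetical stand-ins. It assumes the permit is normally released by the response callback, so a lost response leaves it held, and it contrasts that with a bounded tryAcquire, which is roughly what a timeout would look like at this level.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class LostResponseDemo {
    // Stand-in for the processing semaphore sized by concurrent-requests.
    private final Semaphore permits;

    LostResponseDemo(int concurrentRequests) {
        this.permits = new Semaphore(concurrentRequests);
    }

    // Simulates a bulk request whose response never arrives: the permit is
    // taken, but the callback that would release it is never invoked.
    void sendBulkWithLostResponse() throws InterruptedException {
        permits.acquire();
        // Intentionally no permits.release() here.
    }

    // Bounded wait for the next flush: returns false after the timeout
    // instead of parking forever like a plain acquire() would.
    boolean tryNextFlush(long timeoutMillis) throws InterruptedException {
        return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        LostResponseDemo demo = new LostResponseDemo(1);
        demo.sendBulkWithLostResponse();
        boolean acquired = demo.tryNextFlush(200);
        System.out.println("second flush acquired permit: " + acquired); // prints false
    }
}
```

With a single permit, the second flush gives up after the timeout. A plain acquire() at the same point would park forever, which matches the WAITING flush thread in the thread dump below.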

According to the topic "Bulkprocessor Indexing Timeout", there is no timeout in that case.

Is there any feasible workaround for getting a timeout to work?

See this extract from the thread dump:

"elasticsearch[vsldocker03_client][bulk_processor][T#1]" - Thread t@17579
java.lang.Thread.State: WAITING
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <52f1d565> (a java.util.concurrent.Semaphore$NonfairSync)
    at org.elasticsearch.action.bulk.BulkRequestHandler$AsyncBulkRequestHandler.execute(BulkRequestHandler.java:121)
    at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:380)
    - locked <***51ea5713***> (a org.elasticsearch.action.bulk.BulkProcessor)

"XXX_Worker-7" - Thread t@290
java.lang.Thread.State: BLOCKED
    at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:287)
    - waiting to lock <***51ea5713***> (a org.elasticsearch.action.bulk.BulkProcessor) owned by "elasticsearch[vsldocker03_client][bulk_processor][T#1]" t@17579
    at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:272)
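
One caller-side direction, sketched under assumptions and not tested against Elasticsearch: run the blocking call on a helper thread and wait for it with a timeout, so that worker threads at least return instead of hanging. GuardedAdd and runWithTimeout are hypothetical names; in real use the action passed in would be the call to BulkProcessor.add. This does not unblock the processor itself. A helper thread that is BLOCKED on the BulkProcessor monitor, like the worker in the dump above, does not react to interruption, so such helpers stay stuck until the lock is freed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class GuardedAdd {
    // Daemon threads, so helpers that stay stuck do not keep the JVM alive.
    private final ExecutorService executor = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "guarded-add");
        t.setDaemon(true);
        return t;
    });

    // Runs the action on a helper thread and waits at most timeoutMillis.
    // Returns true if the action finished in time, false on timeout.
    boolean runWithTimeout(Runnable action, long timeoutMillis) throws InterruptedException {
        Future<?> future = executor.submit(action);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupts the helper; this only helps if it is in an interruptible wait.
            future.cancel(true);
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("guarded action failed", e.getCause());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedAdd guard = new GuardedAdd();
        CountDownLatch neverReleased = new CountDownLatch(1);
        // Stand-in for an add() call that never returns.
        boolean done = guard.runWithTimeout(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 200);
        System.out.println("add completed in time: " + done); // prints false
    }
}
```

On a timeout the caller can log the failure, stop feeding the processor, or build a new one. The cost is that every timed-out call may leave one stuck helper thread behind, so this is damage limitation rather than a fix.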