BulkProcessor deadlocks when the Elasticsearch (5.5.1) cluster fails during bulk processing.
If the bulk response is lost because the cluster fails, the async bulk request handler gets stuck waiting for a permit on the processing semaphore (concurrent requests), and that permit is never released.
As a result, every thread that accesses the BulkProcessor is blocked indefinitely.
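To make the mechanism concrete, here is a minimal plain-Java model of it, not the Elasticsearch source: every request takes one permit from a `Semaphore`, and only the response callback returns it. `PermitGatedSender`, `send` and the `transport` callback are hypothetical names. The sketch assumes the real handler waits with an untimed `acquire()`; the model uses a timed `tryAcquire` only so it can report the stall instead of hanging.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical model of the stall described above; NOT the Elasticsearch source.
// Each request takes one permit, and only the response callback gives it back.
class PermitGatedSender {
    private final Semaphore permits;

    PermitGatedSender(int concurrentRequests) {
        this.permits = new Semaphore(concurrentRequests);
    }

    // 'transport' stands in for the async client call: it receives the callback
    // that returns the permit and may never invoke it (e.g. the response is lost).
    // Returns false if no permit became free within waitMillis.
    boolean send(Consumer<Runnable> transport, long waitMillis) {
        try {
            // Assumption: the real handler uses an untimed acquire() and would park
            // forever here; the timed wait only lets this demo report the stall.
            if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        transport.accept(permits::release); // permit comes back only if the callback runs
        return true;
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

With one permit and a transport that drops the callback (`cb -> {}`), the first `send` succeeds and leaves zero permits; every later `send` times out, which is where the real handler would park forever.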
According to the topic "Bulkprocessor Indexing Timeout", there is no timeout in that case.
Is there a feasible workaround to enforce a timeout?
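One possible workaround, sketched here under the assumption that you control the code path that sends the bulk request and receives its callback (per the topic above, BulkProcessor itself offers no timeout here): return the permit on whichever comes first, the response or a watchdog deadline. `DeadlineGatedSender`, `send`, `onTimeout` and the timeout parameters are hypothetical names, not Elasticsearch API. An `AtomicBoolean` ensures the permit is released exactly once. Note that a timed-out request may still be applied by the cluster later, so the caller should treat it as "outcome unknown" rather than "failed".

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical wrapper, NOT part of the Elasticsearch API: each in-flight request
// holds a permit that is returned by the response callback or by a watchdog
// timer, whichever fires first, so a lost response cannot leak the permit.
class DeadlineGatedSender implements AutoCloseable {
    private final Semaphore permits;
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long responseTimeoutMillis;

    DeadlineGatedSender(int concurrentRequests, long responseTimeoutMillis) {
        this.permits = new Semaphore(concurrentRequests);
        this.responseTimeoutMillis = responseTimeoutMillis;
    }

    // Returns false if no permit could be obtained within acquireTimeoutMillis.
    boolean send(Consumer<Runnable> transport, Runnable onTimeout, long acquireTimeoutMillis) {
        try {
            if (!permits.tryAcquire(acquireTimeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        AtomicBoolean done = new AtomicBoolean(false);
        // Watchdog: no response in time, so report it, then give the permit back.
        ScheduledFuture<?> watchdog = timer.schedule(() -> {
            if (done.compareAndSet(false, true)) {
                onTimeout.run();
                permits.release();
            }
        }, responseTimeoutMillis, TimeUnit.MILLISECONDS);
        // Normal path: the response callback cancels the watchdog and releases once.
        transport.accept(() -> {
            if (done.compareAndSet(false, true)) {
                watchdog.cancel(false);
                permits.release();
            }
        });
        return true;
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

If a response is lost, the next caller waits at most roughly `responseTimeoutMillis` instead of forever. The cost is one scheduled task per request.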
See this extract from the thread dump:
"elasticsearch[vsldocker03_client][bulk_processor][T#1]" - Thread t@17579
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <52f1d565> (a java.util.concurrent.Semaphore$NonfairSync)
...
at org.elasticsearch.action.bulk.BulkRequestHandler$AsyncBulkRequestHandler.execute(BulkRequestHandler.java:121)
...
at org.elasticsearch.action.bulk.BulkProcessor$Flush.run(BulkProcessor.java:380)
- locked <51ea5713> (a org.elasticsearch.action.bulk.BulkProcessor)
"XXX_Worker-7" - Thread t@290
java.lang.Thread.State: BLOCKED
at org.elasticsearch.action.bulk.BulkProcessor.internalAdd(BulkProcessor.java:287)
- waiting to lock <51ea5713> (a org.elasticsearch.action.bulk.BulkProcessor) owned by "elasticsearch[vsldocker03_client][bulk_processor][T#1]" t@17579
at org.elasticsearch.action.bulk.BulkProcessor.add(BulkProcessor.java:272)
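The dump shows a two-link chain: the bulk_processor thread T#1 holds the BulkProcessor monitor (51ea5713) inside `Flush.run` while parked on the semaphore, and XXX_Worker-7 is BLOCKED in `internalAdd` waiting for that same monitor. This appears to be why a single lost response stalls every caller of `add`. The sketch below reproduces that shape with plain JDK classes only; `LockChainDemo`, `flusher`, `worker` and the other names are hypothetical stand-ins, not Elasticsearch code.

```java
import java.util.concurrent.Semaphore;

// Hypothetical reproduction of the lock chain in the dump with plain JDK classes:
// "flusher" plays BulkProcessor$Flush.run (holds the processor monitor, then parks
// on the semaphore); "worker" plays a caller of BulkProcessor.add (needs the monitor).
class LockChainDemo {
    final Object processorMonitor = new Object();
    final Semaphore permits = new Semaphore(0); // 0 permits: the only one was leaked
    final Thread flusher;
    final Thread worker;

    LockChainDemo() {
        flusher = new Thread(() -> {
            synchronized (processorMonitor) {
                permits.acquireUninterruptibly(); // parks while holding the monitor: WAITING
                permits.release();                // hand the permit back once done
            }
        }, "flusher");
        worker = new Thread(() -> {
            synchronized (processorMonitor) {     // cannot enter while flusher holds it: BLOCKED
                // would append a document to the current bulk request here
            }
        }, "worker");
        flusher.setDaemon(true);
        worker.setDaemon(true);
    }

    // Starts the flusher, waits until it is parked, then starts the worker.
    void start() {
        flusher.start();
        awaitState(flusher, Thread.State.WAITING);
        worker.start();
        awaitState(worker, Thread.State.BLOCKED);
    }

    // Simulates the lost response arriving after all: frees the permit.
    void releasePermit() {
        permits.release();
    }

    // True if both threads terminate within the given time.
    boolean finishWithin(long millis) {
        try {
            flusher.join(millis);
            worker.join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !flusher.isAlive() && !worker.isAlive();
    }

    // Spins (for at most 2 s) until the thread reaches the target state.
    private static void awaitState(Thread t, Thread.State target) {
        long deadline = System.currentTimeMillis() + 2000;
        while (t.getState() != target && System.currentTimeMillis() < deadline) {
            Thread.yield();
        }
    }
}
```

After `start()`, `flusher` reports `WAITING` and `worker` reports `BLOCKED`, matching the two threads in the dump; only returning the permit lets both finish. This suggests any fix has to free the permit (or avoid holding the monitor while waiting for it), not just interrupt the callers of `add`.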