BulkIngester not reliably executing afterBulk handlers

We're having difficulties adjusting to the new listener thread pool (see pull request #830) with elasticsearch-java version 8.15.0 on Java 17. We're relying on afterBulk handlers for statistics gathering and error handling.

To control the scheduler shutdown ourselves (rather than having it stop actively executing tasks and discard waiting tasks via shutdownNow()), we already pass an external scheduler to the ingester. However, there appears to be a window in which our shutdown() call (after which the scheduler no longer accepts new tasks) happens before the remaining afterBulk tasks have been passed to submit(). Waiting for termination always completes without timing out, yet the handler is occasionally not executed.
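A minimal, library-free sketch of the suspected race, assuming the ingester hands afterBulk work to the external scheduler via submit(). The class name `ShutdownRaceSketch` and the printed handler are hypothetical stand-ins, not elasticsearch-java API. With the JDK's default rejection policy, a task submitted after shutdown() is refused with a `RejectedExecutionException`. Because that task was never accepted, `awaitTermination()` still returns promptly, which would match the "terminates without timing out, handler not executed" symptom. Whether the ingester surfaces or swallows that rejection is not stated in this report.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ShutdownRaceSketch {

    /** Returns true if an afterBulk-style task submitted after shutdown() was rejected. */
    static boolean lateSubmitIsRejected() {
        // Stands in for the external scheduler passed to the ingester (hypothetical setup).
        ScheduledExecutorService listenerScheduler = Executors.newSingleThreadScheduledExecutor();

        // The application decides it is done and stops accepting new tasks...
        listenerScheduler.shutdown();
        try {
            // ...but a late afterBulk callback still tries to get onto the scheduler.
            listenerScheduler.submit(() -> System.out.println("afterBulk"));
            return false;
        } catch (RejectedExecutionException e) {
            // Default AbortPolicy: the late task is dropped, never executed.
            return true;
        } finally {
            try {
                // Completes immediately: nothing was accepted, so nothing is pending.
                listenerScheduler.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("late task rejected: " + lateSubmitIsRejected());
    }
}
```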

How can we ensure an orderly shutdown here that waits for all expected afterBulk tasks to finish? See commit hbz/limetrans@ca0723f for our latest attempt, in which we now have to keep track of in-flight bulks via the synchronous beforeBulk (beforeBulk was changed back from asynchronous to synchronous in pull request #837).
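A minimal sketch of the in-flight tracking idea, assuming (as the report states) that beforeBulk runs synchronously before each bulk is sent. The `InFlightTracker` helper, the `io` pool standing in for the HTTP client, and the delays are all hypothetical; this simulates the ordering and does not call elasticsearch-java. In a real listener, `begin()` would go in beforeBulk and `end()` at the end of both afterBulk variants (success and failure). The owner would presumably close the ingester first so the final flush passes through beforeBulk, then wait for the tracker to drain, and only then call shutdown() on the external scheduler.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class InFlightShutdownSketch {

    /** Hypothetical helper: counts bulks between beforeBulk and the end of afterBulk. */
    static final class InFlightTracker {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition idle = lock.newCondition();
        private int inFlight;

        void begin() { // call from the synchronous beforeBulk
            lock.lock();
            try { inFlight++; } finally { lock.unlock(); }
        }

        void end() { // call at the end of every afterBulk (success or failure)
            lock.lock();
            try {
                if (--inFlight == 0) idle.signalAll();
            } finally { lock.unlock(); }
        }

        /** Blocks until no bulk is in flight; returns false on timeout. */
        boolean awaitIdle(long timeout, TimeUnit unit) throws InterruptedException {
            long nanos = unit.toNanos(timeout);
            lock.lock();
            try {
                while (inFlight > 0) {
                    if (nanos <= 0) return false;
                    nanos = idle.awaitNanos(nanos);
                }
                return true;
            } finally { lock.unlock(); }
        }
    }

    /** Simulates the given number of bulks and returns how many afterBulk handlers ran. */
    static int run(int bulks) {
        InFlightTracker tracker = new InFlightTracker();
        AtomicInteger handled = new AtomicInteger();
        ExecutorService io = Executors.newFixedThreadPool(4); // stand-in for the HTTP client
        ScheduledExecutorService listenerScheduler = Executors.newScheduledThreadPool(2); // external scheduler
        try {
            for (int i = 0; i < bulks; i++) {
                tracker.begin(); // synchronous beforeBulk: counted before the request leaves
                long delayMs = 10L * i;
                io.submit(() -> {
                    // The response arrives later; afterBulk is then handed to the scheduler.
                    listenerScheduler.schedule(() -> {
                        try {
                            handled.incrementAndGet(); // afterBulk body (statistics, error handling)
                        } finally {
                            tracker.end();
                        }
                    }, delayMs, TimeUnit.MILLISECONDS);
                });
            }
            // Only stop accepting tasks once every started bulk has completed its afterBulk.
            if (!tracker.awaitIdle(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("bulks still in flight after timeout");
            }
            listenerScheduler.shutdown();
            listenerScheduler.awaitTermination(10, TimeUnit.SECONDS);
            io.shutdown();
            io.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Harmless after an orderly shutdown; prevents leaked threads on the error paths.
            listenerScheduler.shutdownNow();
            io.shutdownNow();
        }
        return handled.get();
    }

    public static void main(String[] args) {
        System.out.println("afterBulk handlers executed: " + run(5));
    }
}
```

The counter is decremented inside the scheduled task itself, so by the time it reaches zero every afterBulk has already been accepted by the scheduler, and shutdown() can no longer race ahead of a late submit().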

See also issue #559 for a similar report; same objective, albeit with the previous non-threaded implementation (with which we didn't have any issues).

Hello and welcome.
This is probably related to an issue another user found, where the scheduler starts shutting down before it has accepted the last tasks. That report centers on the internal scheduler, but the same logic probably applies here. We're working on the problem and hope to have a fix before the next patch release. Thank you!
