BulkIngester not reliably executing afterBulk handlers

We're having difficulties adjusting to the new listener thread pool (see pull request #830) with elasticsearch-java version 8.15.0 on Java 17. We're relying on afterBulk handlers for statistics gathering and error handling.

To control the scheduler shutdown ourselves, we already pass an external scheduler to the ingester, so that we can call shutdown() instead of shutdownNow(), which stops actively executing tasks and halts the processing of waiting tasks. However, there appears to be a window in which our shutdown() is called (so the scheduler no longer accepts new tasks) before the remaining afterBulk tasks have been submitted via submit(). Termination always finishes without timing out, yet the handler is occasionally not executed.

How can we ensure an orderly shutdown here that waits for all expected afterBulk tasks to be finished? See commit hbz/limetrans@ca0723f for our latest attempt where we now have to keep track of in-flight bulks via synchronous beforeBulk (which was reverted from async in pull request #837).
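For readers following along, the in-flight tracking idea can be sketched roughly as follows. This is not the code from the linked commit and it does not use the elasticsearch-java API: `InFlightBulks`, `ShutdownDemo` and their methods are hypothetical names, and two plain JDK executors stand in for the client's transport and for the external scheduler. The assumption is that `started()` would be called from the synchronous beforeBulk and `finished()` at the end of every afterBulk.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: counts bulks that have started but not yet been handled. */
final class InFlightBulks {
    private int inFlight = 0;

    /** Assumed to be called from the synchronous beforeBulk callback. */
    synchronized void started() {
        inFlight++;
    }

    /** Assumed to be called as the last step of afterBulk (success or failure). */
    synchronized void finished() {
        inFlight--;
        if (inFlight <= 0) {
            notifyAll();
        }
    }

    /** Blocks until no bulk is in flight; returns false if the timeout elapsed first. */
    synchronized boolean awaitDrained(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (inFlight > 0) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            TimeUnit.NANOSECONDS.timedWait(this, remaining);
        }
        return true;
    }
}

/** Simulation only: handler tasks reach the scheduler late, from another thread. */
final class ShutdownDemo {
    static int run(int bulks) {
        InFlightBulks tracker = new InFlightBulks();
        AtomicInteger handled = new AtomicInteger();
        ExecutorService scheduler = Executors.newFixedThreadPool(2);   // stand-in for the external scheduler
        ExecutorService transport = Executors.newSingleThreadExecutor(); // stand-in for response threads
        try {
            for (int i = 0; i < bulks; i++) {
                tracker.started(); // what beforeBulk would do
                // The handler task is handed to the scheduler only once the "response" arrives.
                transport.execute(() -> scheduler.execute(() -> {
                    handled.incrementAndGet(); // e.g. statistics gathering
                    tracker.finished();        // what afterBulk would do last
                }));
            }
            transport.shutdown();
            // Wait for all handlers first, and only then stop accepting new tasks.
            boolean drained = tracker.awaitDrained(10, TimeUnit.SECONDS);
            scheduler.shutdown();
            boolean terminated = scheduler.awaitTermination(10, TimeUnit.SECONDS);
            return drained && terminated ? handled.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

The point of the ordering is that shutdown() is only called once the counter has dropped to zero, so a late handler submission can no longer be rejected. The counter only works if beforeBulk runs synchronously, before the corresponding request is sent.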

See also issue #559 for a similar report; same objective, albeit with the previous non-threaded implementation (with which we didn't have any issues).

Hello and welcome.
It's probably related to this issue that a user found, where the scheduler starts the shutdown before accepting the last tasks. That report is more centered on the internal scheduler, but the same logic probably applies. We're working on the problem and hopefully we'll find a solution before the next patch. Thank you!

Unfortunately, our issue seems to persist with the 8.15.1 release. Without the countermeasures outlined above, our tests keep failing.

Sorry, that issue took a bit longer to test, and we're just about to merge the fix. It will be available in 8.15.2, which should be out in a couple of weeks.

Thanks! The issue appears to be fixed with the 8.15.2 release.

Unfortunately, the fix is not included in the 8.16.0 release.