I got into a situation where long running queries on cluster made CPU utilization to go high on all nodes. And cluster went into Red.
We have an application which constantly Indexes documents in Elasticsearch using BulkProcessor. BulkProcessor listners afterBulk method handles failures if any and retries those documents.
During the cluster red scenario, we found that the BulkProcessor listner's afterBulk methods (afterBulk(executionId: Long, request: BulkRequest, failure: Throwable) and afterBulk(executionId: Long, request: BulkRequest, response: BulkResponse)) are not invoked. And hence no retry happened. Ultimately causing data loss.
My question is how to ensure listener's afterBulk is always called even is cluster is in Red state.
Or How bulk processor can know cluster failure and stop excecution