It has been a while since the thread "Catching exceptions from saveToEs (elasticsearch-spark)" was posted, but apparently there is still nothing new for handling these failures.
Ideally, we would like to fall back to sending record by record instead of a batch of documents to ES, and log those that fail.
Failing the entire job is not acceptable.
Could you direct me to the code base of this connector?
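To make the idea concrete, here is a minimal sketch of the fallback pattern described above, written against hypothetical stand-ins (`writeBulk`, `writeOne`, `BulkWriteException` are illustrative names, not the connector's API): try the bulk write first, and if it fails, retry document by document, collecting the bad records for logging instead of failing the whole job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FallbackWriter {
    // Hypothetical stand-in for the connector's bulk failure.
    static class BulkWriteException extends RuntimeException {
        BulkWriteException(String msg) { super(msg); }
    }

    // writeBulk and writeOne are stand-ins for a batch index call and a
    // single-document index call, respectively.
    static List<String> writeWithFallback(List<String> batch,
                                          Consumer<List<String>> writeBulk,
                                          Consumer<String> writeOne) {
        List<String> failed = new ArrayList<>();
        try {
            writeBulk.accept(batch);          // fast path: whole batch at once
        } catch (BulkWriteException e) {
            for (String doc : batch) {        // slow path: one document at a time
                try {
                    writeOne.accept(doc);
                } catch (RuntimeException perDoc) {
                    failed.add(doc);          // log/queue instead of failing the job
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "bad", "c");
        List<String> failed = writeWithFallback(
            docs,
            b -> { if (b.contains("bad")) throw new BulkWriteException("bulk rejected"); },
            d -> { if (d.equals("bad")) throw new RuntimeException("mapping error"); });
        System.out.println(failed);  // prints [bad]
    }
}
```

Note the trade-off: the per-record path is much slower, so it only runs after a bulk failure rather than on every batch.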
Wanted to pop in here and mention that we are in the process of rewriting how documents are retried in the event of a bulk failure. This is a very involved rewrite, as the retry logic was previously heavily optimized to retry only on specific HTTP codes, with no possibility of the record being changed.
Going forward, we'd prefer to have an interface that notifies users of write failures as they occur and lets them either modify the document being written, or acknowledge the failure and continue, possibly persisting the record in a queue or on DFS.
You can find the issue for adding these error handlers here. Feel free to leave any feedback you might have!
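As a rough illustration of the kind of interface being described, here is a hypothetical sketch: every name in it (`WriteFailureHandler`, `HandlerResult`, `RetryCollector`, `DeadLetterHandler`) is made up for this example and is not the connector's actual or final API. The handler is called per failed document and can ask for a retry (possibly with a modified document), or acknowledge the failure and let the job continue.

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorHandlerSketch {
    enum HandlerResult { HANDLED, RETRY, ABORT }

    // Lets a handler resubmit a (possibly rewritten) document for retry.
    interface RetryCollector { void retry(String possiblyModifiedDocument); }

    // Called once per document that failed in a bulk response.
    interface WriteFailureHandler {
        HandlerResult onFailure(String document, int httpStatus, RetryCollector collector);
    }

    // Example handler: retry on 429 (queue full), otherwise record the failure
    // in a dead-letter list and let the job continue instead of aborting.
    static class DeadLetterHandler implements WriteFailureHandler {
        final List<String> deadLetters = new ArrayList<>();

        public HandlerResult onFailure(String doc, int status, RetryCollector collector) {
            if (status == 429) {
                collector.retry(doc);
                return HandlerResult.RETRY;
            }
            deadLetters.add(doc);   // e.g. persist to a queue or DFS instead
            return HandlerResult.HANDLED;
        }
    }
}
```

The point of the `ABORT` option is that failing the job remains possible, but becomes an explicit choice by the handler rather than the only behavior.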
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.