It has been a while since the thread "Catching exceptions from saveToEs (elasticsearch-spark)" was posted, but apparently there is still nothing new for handling these failures.
Ideally, we would like to fall back to sending record by record instead of a batch of documents to ES, and log those that fail.
Failing the entire job is not acceptable.
Could you direct me to the code base of this connector?
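To make the idea concrete, here is a minimal sketch of the fallback pattern described above, written against hypothetical stand-ins (`writeBulk`, `writeOne`, `BulkWriteException` are illustrative names, not the connector's API): try the bulk write first, and if it fails, retry document by document, collecting the bad records for logging instead of failing the whole job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class FallbackWriter {
    // Hypothetical stand-in for the connector's bulk failure.
    static class BulkWriteException extends RuntimeException {
        BulkWriteException(String msg) { super(msg); }
    }

    // writeBulk and writeOne are stand-ins for a batch index call and a
    // single-document index call, respectively.
    static List<String> writeWithFallback(List<String> batch,
                                          Consumer<List<String>> writeBulk,
                                          Consumer<String> writeOne) {
        List<String> failed = new ArrayList<>();
        try {
            writeBulk.accept(batch);          // fast path: whole batch at once
        } catch (BulkWriteException e) {
            for (String doc : batch) {        // slow path: one document at a time
                try {
                    writeOne.accept(doc);
                } catch (RuntimeException perDoc) {
                    failed.add(doc);          // log/queue instead of failing the job
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "bad", "c");
        List<String> failed = writeWithFallback(
            docs,
            b -> { if (b.contains("bad")) throw new BulkWriteException("bulk rejected"); },
            d -> { if (d.equals("bad")) throw new RuntimeException("mapping error"); });
        System.out.println(failed);  // prints [bad]
    }
}
```

Note the trade-off: the per-record path is much slower, so it only runs after a bulk failure rather than on every batch.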
Wanted to pop in here and mention that we are in the process of rewriting how documents are retried in the event of a bulk failure. This is a very involved rewrite, as the retry logic was previously heavily optimized to retry only on specific HTTP codes, with no possibility of the record being changed.
Going forward, we'd prefer to have an interface that notifies users of write failures as they occur and lets them either modify the document being written, or acknowledge the failure and continue, possibly persisting the record in a queue or on DFS.
You can find the issue for adding these error handlers here. Feel free to leave any feedback you might have!
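As a rough illustration of the kind of interface being described, here is a hypothetical sketch: every name in it (`WriteFailureHandler`, `HandlerResult`, `RetryCollector`, `DeadLetterHandler`) is made up for this example and is not the connector's actual or final API. The handler is called per failed document and can ask for a retry (possibly with a modified document), or acknowledge the failure and let the job continue.

```java
import java.util.ArrayList;
import java.util.List;

public class ErrorHandlerSketch {
    enum HandlerResult { HANDLED, RETRY, ABORT }

    // Lets a handler resubmit a (possibly rewritten) document for retry.
    interface RetryCollector { void retry(String possiblyModifiedDocument); }

    // Called once per document that failed in a bulk response.
    interface WriteFailureHandler {
        HandlerResult onFailure(String document, int httpStatus, RetryCollector collector);
    }

    // Example handler: retry on 429 (queue full), otherwise record the failure
    // in a dead-letter list and let the job continue instead of aborting.
    static class DeadLetterHandler implements WriteFailureHandler {
        final List<String> deadLetters = new ArrayList<>();

        public HandlerResult onFailure(String doc, int status, RetryCollector collector) {
            if (status == 429) {
                collector.retry(doc);
                return HandlerResult.RETRY;
            }
            deadLetters.add(doc);   // e.g. persist to a queue or DFS instead
            return HandlerResult.HANDLED;
        }
    }
}
```

The point of the `ABORT` option is that failing the job remains possible, but becomes an explicit choice by the handler rather than the only behavior.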
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.