Behaviour of the Java Spark API : JavaEsSpark.saveToEs (background or not)

sebastienf · February 20, 2020, 3:26pm

Hi,
In a Spark Streaming job, I am using JavaEsSpark.saveToEs to bulk-index documents into ES indices.
So far so good, but I am wondering about its behaviour.
I noticed that the code after the JavaEsSpark call seems to be executed even though the save has not finished...

    if (...) {
      JavaEsSpark.saveToEs(message, index_pattern+"_{target_index}/{target_type}");
    }

    // at last, we commit the offset ranges
    ((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges);

As you can see, I commit the Kafka offsets after saveToEs, but sometimes saveToEs fails (for whatever reason) and the offsets are still committed!

So, if anyone knows precisely how this API behaves, I would be grateful.

Thanks a lot.

james.baiera · February 27, 2020, 4:28pm

That's pretty peculiar. The saveToEs method simply runs the Spark job using the writer implementation to save the data to Elasticsearch. If the RDD fails to complete, the job runner should throw an org.apache.spark.SparkException with information about what caused the run to fail. Can you share what kind of failure you're seeing that still leads to the offsets being committed? Are you sure that the line of code here that commits the offsets is the only place where the commit happens?
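To make the expected ordering concrete, here is a minimal sketch of the "commit only if the save did not throw" pattern. It assumes, as described above, that a failed save surfaces as an exception in the calling code. The names CommitAfterSave, Step and saveThenCommit are hypothetical and exist only for this illustration. In the real job, the save step would wrap the JavaEsSpark.saveToEs(...) call and the commit step would wrap the commitAsync(offsetRanges) call. Both are replaced by plain lambdas here so the example runs without Spark or Kafka.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CommitAfterSave {

    /** Hypothetical stand-in for a step that may throw, such as the saveToEs call. */
    @FunctionalInterface
    interface Step {
        void run() throws Exception;
    }

    /**
     * Runs the save step, then runs the commit step only if the save
     * returned without throwing. Returns true if the commit step ran.
     */
    static boolean saveThenCommit(Step save, Runnable commit) {
        try {
            save.run();
        } catch (Exception e) {
            // The save failed: skip the commit so the same offsets can be retried.
            System.err.println("save failed, offsets not committed: " + e.getMessage());
            return false;
        }
        commit.run();
        return true;
    }

    public static void main(String[] args) {
        // Counts how many times the (simulated) offset commit was invoked.
        AtomicInteger commits = new AtomicInteger();

        // Simulated successful save: the commit step is expected to run.
        boolean first = saveThenCommit(() -> { }, commits::incrementAndGet);

        // Simulated failing save: the commit step is expected to be skipped.
        boolean second = saveThenCommit(
                () -> { throw new IllegalStateException("simulated bulk failure"); },
                commits::incrementAndGet);

        System.out.println("first committed: " + first);
        System.out.println("second committed: " + second);
        System.out.println("total commits: " + commits.get());
    }
}
```

If offsets still end up committed after a failed save with this structure in place, that would suggest either that the failure is not reaching the driver as an exception, or that the commit is also happening somewhere else.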

system · March 26, 2020, 4:28pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
