Behaviour of the Java Spark API: JavaEsSpark.saveToEs (blocking or background?)

Hi,
In a Spark Streaming job, I am using JavaEsSpark.saveToEs to bulk-index documents into ES indices.
So far so good, but I am wondering about its behaviour.
I noticed that the code after the JavaEsSpark call seems to run even though the save has not finished...

    if (...) {
               
      JavaEsSpark.saveToEs(message, index_pattern+"_{target_index}/{target_type}");
      
    }

    // at last, we commit the offset ranges
    ((CanCommitOffsets) messages.inputDStream()).commitAsync(offsetRanges);

As you can see, I commit the Kafka offsets after saveToEs, but sometimes saveToEs fails (for whatever reason) and the offsets are still committed!

So, if anyone knows precisely how this API behaves, I would be grateful :)

Thanks a lot.

That's pretty peculiar. The saveToEs method simply runs the Spark job, using the writer implementation to save the data to Elasticsearch. If the RDD fails to complete, the job runner should throw an org.apache.spark.SparkException with information about what caused the run to fail. Can you share what kind of failure you're seeing that still leads to the offsets being committed? Are you sure that the line of code here that commits the offsets is the only place where the commit happens?
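To make the expected ordering concrete, here is a minimal sketch with no Spark or Elasticsearch dependency. `Sink` and `OffsetCommitter` are hypothetical stand-ins for the saveToEs call and the commitAsync call, and `saveThenCommit` is an illustrative name, not part of any library. It assumes, as described above, that a failed save surfaces as an exception in the calling code; if that holds, the commit line is never reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CommitAfterSave {

    /** Hypothetical stand-in for the blocking write, e.g. JavaEsSpark.saveToEs(...). */
    interface Sink {
        void save(List<String> batch) throws Exception;
    }

    /** Hypothetical stand-in for the offset commit, e.g. commitAsync(offsetRanges). */
    interface OffsetCommitter {
        void commit(long untilOffset);
    }

    /**
     * Saves the batch, then commits only if the save returned normally.
     * Returns true when the offsets were committed.
     */
    static boolean saveThenCommit(Sink sink, OffsetCommitter committer,
                                  List<String> batch, long untilOffset) {
        try {
            sink.save(batch);
        } catch (Exception e) {
            // Save failed: leave the offsets uncommitted so the batch can be replayed.
            System.err.println("save failed, offsets not committed: " + e.getMessage());
            return false;
        }
        committer.commit(untilOffset);
        return true;
    }

    public static void main(String[] args) {
        List<Long> committed = new ArrayList<>();
        List<String> batch = Arrays.asList("doc1", "doc2");

        // First batch: the save succeeds, so offset 42 is committed.
        boolean ok = saveThenCommit(b -> { }, offset -> committed.add(offset), batch, 42L);

        // Second batch: the save throws, so offset 84 must not be committed.
        boolean failed = saveThenCommit(
                b -> { throw new IllegalStateException("bulk rejected"); },
                offset -> committed.add(offset), batch, 84L);

        System.out.println(ok + " " + failed + " " + committed); // prints: true false [42]
    }
}
```

If offsets are still committed after a failed save in the real job, that would suggest either that the exception is being caught somewhere before the commit, or that a commit also happens elsewhere.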
