Maximizing Java write efficiency

Hi all --

I couldn't find this information in the documentation and wondered whether anyone here has a sense of it. I'm going to be writing to Elasticsearch from a Spark Java application.

I would think the top 3 options are:

  1. JavaEsSparkStreaming class
  2. JavaEsSpark class
  3. Bulk API

My guess is that JavaEsSparkStreaming would outperform JavaEsSpark for writes: with the former I define my stream and simply hand it over to the API, whereas with the latter I take each RDD and pass them in one at a time (for each Spark executor). Is that wrong, or is the difference insignificant?

In either case, would calling the bulk API directly actually give faster writes?
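To make clear what I mean by using the bulk API directly, here is a minimal sketch of building the newline-delimited JSON body that a `POST /_bulk` request expects: one action line followed by one source line per document, with a trailing newline. The class name `BulkBodySketch`, the method `buildBulkBody`, and the index name `events` are made up for illustration, and the sketch assumes each document is already a single-line JSON string and that the index name needs no escaping. It only builds the body; actually sending it over HTTP is left out.

```java
import java.util.List;

final class BulkBodySketch {

    // Builds an NDJSON body for POST /_bulk.
    // Each document contributes an "index" action line and then its source line.
    // Assumption: every element of jsonDocs is a valid single-line JSON object.
    static String buildBulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(doc).append('\n'); // every line, including the last, ends with a newline
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical documents, just to show the shape of the output.
        String body = buildBulkBody("events", List.of("{\"user\":\"a\"}", "{\"user\":\"b\"}"));
        System.out.print(body);
    }
}
```

With this approach I would be responsible for batching, retries, and spreading requests across nodes myself, which is part of why I'm unsure it would beat the connector classes.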

Thanks for any help.
