Is it possible to perform bulk insert from Spark to ElasticSearch?


(diplomaticguru) #1

Is it possible to perform a bulk insert from Spark to Elasticsearch?

At the moment, I'm using the 'saveToEsWithMeta' method to upsert the data (a JavaPairRDD). Is there a way to bulk insert using the _bulk API? Are there any examples I could take a look at?


(Costin Leau) #2

All writes in Elasticsearch-Hadoop (including Spark) are done using the
bulk API underneath (through the REST protocol, and thus the _bulk
endpoint). Whether you are saving 1, 100 or 10K documents, the procedure is
the same. Btw, I recommend spending some time reading the whole reference
documentation, as it covers the architecture pretty well and provides plenty
of examples.
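
To make the point above concrete, here is a minimal sketch of the kind of newline-delimited body a _bulk request carries: one action line followed by one source line per document. It is only an illustration, not the connector's actual code. The class name `BulkBodySketch`, the helper `buildBulkBody`, the index name `my-index` and the sample documents are all made up for this example, and it assumes the IDs and index name need no JSON escaping. With es-hadoop you never build this yourself; the connector batches the documents of each task into such requests, with the batch size governed by its documented `es.batch.size.entries` and `es.batch.size.bytes` settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkBodySketch {

    // Hypothetical helper: builds a newline-delimited _bulk body with an
    // "index" action line followed by the document source, for each entry.
    // Assumes the index name and IDs contain no characters that need escaping.
    static String buildBulkBody(String index, Map<String, String> jsonDocsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : jsonDocsById.entrySet()) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(e.getKey()).append("\"}}\n");
            // The document source is passed in as an already-serialized JSON string.
            body.append(e.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"name\":\"alpha\"}");
        docs.put("2", "{\"name\":\"beta\"}");
        System.out.print(buildBulkBody("my-index", docs));
    }
}
```

So whether the RDD holds one document or ten thousand, the write path is the same; only the number of action/source pairs per request changes.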


(diplomaticguru) #3

Thank you @costin for your reply. I'll check out the documentation, but were you referring to this: https://github.com/elastic/elasticsearch-hadoop/blob/master/docs/src/reference/asciidoc/core/spark.adoc


(Costin Leau) #4

@diplomaticguru Why are you looking at the source and not the official, rendered doc, which is available here? The docs are mentioned in the GitHub readme and on the project homepage.

How did you come across es-hadoop? It's an honest question, since it looks like the reference documentation was not advertised enough and I'd like to address that.

