Can es-hadoop write bulk files to disk?


(Jonathan Spooner) #1

I have a well-performing Elasticsearch cluster that can ingest 200k new documents per second, but my Spark job takes hours to prepare the data. Is it possible to have es-hadoop save its bulk files to disk? Then I could write a script that simply POSTs each bulk file with curl.


(James Baiera) #2

I'm afraid this is not currently supported by the connector. Is there a reason you want to perform the bulk operations yourself? If the slow part of the Spark job is the data preparation, why not let Spark perform the bulk operations as the data becomes available?

That being said, the bulk format is actually pretty simple to create. You could run the data through a JSON serializer on the Spark side, prepend the bulk action header line to each document, and write the final output to a text file.
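
As a rough illustration of that suggestion, here is a minimal sketch in plain Python. It is not part of es-hadoop; the function names, the index name, and the `id_field` parameter are all made up for the example. It assumes each record is already a JSON-serializable dict and that every document gets an `index` action.

```python
import json


def to_bulk_lines(doc, index_name, id_field=None):
    """Return the action line and the source line for one document.

    index_name and id_field are hypothetical parameters for this sketch.
    """
    action = {"index": {"_index": index_name}}
    if id_field is not None and id_field in doc:
        # Reuse a field of the document as the Elasticsearch _id.
        action["index"]["_id"] = doc[id_field]
    # json.dumps escapes newlines inside strings, so each dump is one line.
    return json.dumps(action) + "\n" + json.dumps(doc)


def write_bulk_file(docs, path, index_name, id_field=None):
    """Write all documents to one newline-delimited bulk file."""
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(to_bulk_lines(doc, index_name, id_field))
            # Every line, including the last, must end with a newline.
            f.write("\n")
```

On the Spark side, something like `rdd.map(lambda d: to_bulk_lines(d, "my-index")).saveAsTextFile(...)` should produce part files in the same shape, which can then be posted with `curl -H "Content-Type: application/x-ndjson" -XPOST "localhost:9200/_bulk" --data-binary @part-00000` (host and file name are placeholders). Older Elasticsearch versions also expect a `_type` in the action line, so check the bulk API documentation for the version you run.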

