Can es-hadoop write bulk files to disk?

jspooner · August 9, 2016, 8:38pm

I have a well-performing Elasticsearch cluster that can consume 200k new documents per second, but my Spark job takes hours to prepare the data. Is it possible to have es-hadoop save its bulk files to disk? Then I could write a script that just POSTs each of the bulk files with curl.

james.baiera · August 12, 2016, 4:45pm

I'm afraid this is not currently supported with the connector. Is there any reason that you would want to perform the bulk operations yourself? If the slow part of the spark job is the data preparation, then why not let spark perform the bulk operations as they become available?

That being said, the bulk format is actually pretty simple to create. You could run the data through a JSON serializer on the Spark side, prepend the bulk action line to each document, and write the final output to a text file.
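
As a rough illustration of that suggestion, here is a minimal sketch in plain Python that turns an iterable of documents into bulk-format lines and writes them to a file. The function names (`to_bulk_lines`, `write_bulk_file`), the `id_field` parameter, and the index name in the usage note are hypothetical and not part of es-hadoop. It assumes each document is a JSON-serializable dict and that an `index` action is wanted; older Elasticsearch versions may also expect a `_type` in the action line.

```python
import json


def to_bulk_lines(docs, index, id_field=None):
    """Yield bulk-format lines: an action line, then the document source."""
    for doc in docs:
        # Action line ("bulk header") telling Elasticsearch what to do
        # with the document that follows it.
        action = {"index": {"_index": index}}
        if id_field is not None and id_field in doc:
            action["index"]["_id"] = doc[id_field]
        # json.dumps emits a single line by default, which the
        # newline-delimited bulk format requires.
        yield json.dumps(action)
        yield json.dumps(doc)


def write_bulk_file(docs, path, index, id_field=None):
    """Write documents to `path` as one bulk request body."""
    with open(path, "w", encoding="utf-8") as f:
        for line in to_bulk_lines(docs, index, id_field):
            # Every line, including the last, must end with a newline.
            f.write(line + "\n")
```

For example, `write_bulk_file(docs, "part-0000.bulk", "logs", id_field="id")` would produce a file that could then be sent to the `_bulk` endpoint, e.g. with curl's `--data-binary @part-0000.bulk` option. On the Spark side the same generator could presumably be applied per partition before saving the result as text, though that part is not shown here.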
