Bulk insert to Elasticsearch in Spark using Scala

Hello there,

Is there a way to save the following JSON in Spark using Scala:

{"index":{"_id":1662034,"_type":"fhl","_index":"loan"}}
{"name":"John","surname":"Doe","dob":"1985-01-03","title":"Mr", per_id:1662034}
{"index":{"_id":1662035,"_type":"fhl","_index":"loan"}}
{"name":"Foo","surname":"Fooo","dob":"1980-02-14","title":"Mrs", per_id:1662035}

If this is not possible, how can I do bulk inserts to Elasticsearch in Spark using Scala?

Regards

Hello Rulanitee,

Is there a way to save the following JSON in Spark using Scala

Apache Spark has DataFrame/Dataset APIs to save JSON; RDD-based APIs may exist as well.
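
For instance, a minimal sketch of the DataFrame route (the file paths here are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("save-json").getOrCreate()

// Read newline-delimited JSON into a DataFrame, then write it back out as JSON
val df = spark.read.json("/path/to/input.json")
df.write.json("/path/to/output-dir")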

how can I do bulk inserts to Elasticsearch in Spark using Scala?

I believe that when you read JSON (or any other data source) into Spark, the write into Elasticsearch behaves much like a bulk insert (to some extent), since every executor within the Spark cluster runs the saveToEs action on its own partition in parallel.
If you would like to use non-Spark APIs to perform a bulk insert, I have used akka-http for this. It may be overkill for you, though, as akka-streams has a bit of a learning curve.
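
To illustrate the saveToEs route above: a minimal sketch using the elasticsearch-hadoop (elasticsearch-spark) connector, assuming it is on the classpath and the cluster address and index/type names are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs

val conf = new SparkConf()
  .setAppName("es-bulk-sketch")
  .set("es.nodes", "localhost:9200") // assumed cluster address
val sc = new SparkContext(conf)

// Each Map becomes one document; every executor writes its partition in
// parallel, which is what gives the bulk-insert behaviour
val docs = Seq(
  Map("name" -> "John", "surname" -> "Doe"),
  Map("name" -> "Foo", "surname" -> "Fooo")
)
sc.makeRDD(docs).saveToEs("loan/fhl")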

Hope this helps.

Thanks,
Muthu

Hi,

Thanks for the tip. I will look into akka-http.

Regards

Hi Muthu

Apparently the answer was right in front of me, in the documentation, as shown below:

val json1 = """{"reason" : "business", "airport" : "SFO"}"""
val json2 = """{"participants" : 5, "airport" : "OTP"}"""

new SparkContext(conf).makeRDD(Seq(json1, json2))
.saveJsonToEs("spark/json-trips")

Just add the JSON strings to a sequence and push them to ES.

So if one has a class or object, use Gson or any other JSON framework to serialize the object to a JSON string and add it to a sequence or list.
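
For example, a minimal sketch with Gson (the Person case class and the cluster address are assumptions made to match the documents in the original question):

import com.google.gson.Gson
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveJsonToEs to RDDs

// Hypothetical class matching the documents in the original question
case class Person(name: String, surname: String, dob: String, title: String, per_id: Long)

val conf = new SparkConf()
  .setAppName("gson-bulk-sketch")
  .set("es.nodes", "localhost:9200") // assumed cluster address
val gson = new Gson()

val people = Seq(
  Person("John", "Doe", "1985-01-03", "Mr", 1662034L),
  Person("Foo", "Fooo", "1980-02-14", "Mrs", 1662035L)
)

// Serialize each object to a JSON string, then bulk-index the strings;
// es.mapping.id reuses per_id as the document _id
val jsonDocs = people.map(p => gson.toJson(p))
new SparkContext(conf).makeRDD(jsonDocs)
  .saveJsonToEs("loan/fhl", Map("es.mapping.id" -> "per_id"))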

Hope this helps someone ...

