Bulk insert to Elasticsearch in Spark using Scala

Hello there,

Is there a way to save the following JSON in Spark using Scala:

{"index":{"_id":1662034,"_type":"fhl","_index":"loan"}}
{"name":"John","surname":"Doe","dob":"1985-01-03","title":"Mr", per_id:1662034}
{"index":{"_id":1662035,"_type":"fhl","_index":"loan"}}
{"name":"Foo","surname":"Fooo","dob":"1980-02-14","title":"Mrs", per_id:1662035}

If this is not possible, how can I do bulk inserts to Elasticsearch in Spark using Scala?

Regards

Hello Rulanitee,

Is there a way to save the following JSON in Spark using Scala

Apache Spark has DataFrame/Dataset APIs to save JSON; RDD-based APIs may exist as well.
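
For instance, a minimal sketch of the DataFrame route (the file paths here are just placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("save-json").getOrCreate()

// Read newline-delimited JSON into a DataFrame, then write it back out as JSON
val df = spark.read.json("/path/to/input.json")
df.write.json("/path/to/output-dir")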

how can I do bulk inserts to Elasticsearch in Spark using Scala?

I believe that when you read JSON (or any other data source) into Spark, the write into Elasticsearch behaves much like a bulk insert (to some extent), since every executor within the Spark cluster runs the saveToEs action on its own partition in parallel.
If you would like to use non-Spark APIs to perform a bulk insert, I have used akka-http for this. It may be overkill for you, though, as akka-streams has a bit of a learning curve.
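
To illustrate the saveToEs route above: a minimal sketch using the elasticsearch-hadoop (elasticsearch-spark) connector, assuming it is on the classpath and the cluster address and index/type names are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs

val conf = new SparkConf()
  .setAppName("es-bulk-sketch")
  .set("es.nodes", "localhost:9200") // assumed cluster address
val sc = new SparkContext(conf)

// Each Map becomes one document; every executor writes its partition in
// parallel, which is what gives the bulk-insert behaviour
val docs = Seq(
  Map("name" -> "John", "surname" -> "Doe"),
  Map("name" -> "Foo", "surname" -> "Fooo")
)
sc.makeRDD(docs).saveToEs("loan/fhl")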

Hope this helps.

Thanks,
Muthu

Hi,

Thanks for the tip. I will look into akka-http.

Regards

Hi Muthu

Apparently the answer was right in front of me, in the documentation, as shown below:

val json1 = """{"reason" : "business", "airport" : "SFO"}"""
val json2 = """{"participants" : 5, "airport" : "OTP"}"""

new SparkContext(conf).makeRDD(Seq(json1, json2))
.saveJsonToEs("spark/json-trips")

Just add the JSON strings to a sequence and push them to ES.

So if one has a class or object, use Gson or any other JSON framework to serialize the object to a JSON string and add it to a sequence or list.
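
For example, a minimal sketch with Gson (the Person case class and the cluster address are assumptions made to match the documents in the original question):

import com.google.gson.Gson
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveJsonToEs to RDDs

// Hypothetical class matching the documents in the original question
case class Person(name: String, surname: String, dob: String, title: String, per_id: Long)

val conf = new SparkConf()
  .setAppName("gson-bulk-sketch")
  .set("es.nodes", "localhost:9200") // assumed cluster address
val gson = new Gson()

val people = Seq(
  Person("John", "Doe", "1985-01-03", "Mr", 1662034L),
  Person("Foo", "Fooo", "1980-02-14", "Mrs", 1662035L)
)

// Serialize each object to a JSON string, then bulk-index the strings;
// es.mapping.id reuses per_id as the document _id
val jsonDocs = people.map(p => gson.toJson(p))
new SparkContext(conf).makeRDD(jsonDocs)
  .saveJsonToEs("loan/fhl", Map("es.mapping.id" -> "per_id"))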

Hope this helps someone ...

