Hello there,
Is there a way to save the following JSON in Spark using Scala:
{"index":{"_id":1662034,"_type":"fhl","_index":"loan"}}
{"name":"John","surname":"Doe","dob":"1985-01-03","title":"Mr", per_id:1662034}
{"index":{"_id":1662035,"_type":"fhl","_index":"loan"}}
{"name":"Foo","surname":"Fooo","dob":"1980-02-14","title":"Mrs", per_id:1662035}
If this is not possible, how can I do bulk inserts to Elasticsearch in Spark using Scala?
Regards
Hello Rulanitee,
Is there a way to save the following JSON in Spark using Scala
Apache Spark has DataFrame/Dataset APIs to save JSON; RDD-based APIs exist as well.
how can I do bulk inserts to Elasticsearch in Spark using Scala?
I believe that when you read JSON (or any data source) into Spark, the write into Elasticsearch already behaves like a bulk insert, to some extent, since every executor in the Spark cluster runs the saveToEs action on its partition in parallel.
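As a minimal sketch of that (the node address, index name, and sample data below are assumptions for illustration, not from your setup), using the elasticsearch-spark connector:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // elasticsearch-spark connector; adds saveToEs to RDDs

val conf = new SparkConf()
  .setAppName("bulk-to-es")
  .set("es.nodes", "localhost:9200") // assumed cluster address
  .set("es.mapping.id", "per_id")    // use the per_id field as the document _id

// Sample documents mirroring the ones in the question
val people = Seq(
  Map("name" -> "John", "surname" -> "Doe", "dob" -> "1985-01-03", "title" -> "Mr", "per_id" -> 1662034),
  Map("name" -> "Foo", "surname" -> "Fooo", "dob" -> "1980-02-14", "title" -> "Mrs", "per_id" -> 1662035)
)

val sc = new SparkContext(conf)
// Each executor writes its partition to Elasticsearch, batched into bulk requests
sc.makeRDD(people).saveToEs("loan/fhl")
```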
If you would like to use non-Spark APIs to perform a bulk insert, I have used akka-http for this. It may be overkill for your case, though, as akka-streams has a bit of a learning curve.
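A rough sketch of that route (the endpoint and body below are assumptions): the Elasticsearch _bulk API accepts newline-delimited action/source pairs exactly like the ones in your example, and akka-http can POST them directly. Note that newer Elasticsearch versions require the application/x-ndjson content type instead.

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model._

implicit val system: ActorSystem = ActorSystem("es-bulk")

// Each document is an action line followed by its source line (NDJSON);
// the whole body must end with a trailing newline.
val bulkBody =
  """{"index":{"_id":1662034,"_type":"fhl","_index":"loan"}}
    |{"name":"John","surname":"Doe","dob":"1985-01-03","title":"Mr","per_id":1662034}
    |""".stripMargin

val request = HttpRequest(
  method = HttpMethods.POST,
  uri = "http://localhost:9200/_bulk", // assumed cluster address
  entity = HttpEntity(ContentTypes.`application/json`, bulkBody)
)

Http().singleRequest(request) // returns a Future[HttpResponse]
```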
Hope this helps.
Thanks,
Muthu
Hi,
Thanks for the tip. I will look into akka-http.
Regards
Hi Muthu
Apparently the answer was right in front of me. From the documentation:
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // elasticsearch-spark connector; provides saveJsonToEs

val conf = new SparkConf().setAppName("json-to-es")
val json1 = """{"reason" : "business", "airport" : "SFO"}"""
val json2 = """{"participants" : 5, "airport" : "OTP"}"""
new SparkContext(conf).makeRDD(Seq(json1, json2))
  .saveJsonToEs("spark/json-trips")
Just add the JSON strings to a collection and push it to ES.
So if you have a class or object, use Gson (or any other JSON framework) to serialize the object to a JSON string and add it to an array or list.
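A minimal sketch of that step (the case class, index name, and node address are assumptions for illustration), serializing with Gson and then writing via saveJsonToEs:

```scala
import com.google.gson.Gson
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // elasticsearch-spark connector; provides saveJsonToEs

case class Person(name: String, surname: String, dob: String, title: String, per_id: Long)

val gson = new Gson()
// Serialize each object to a JSON string, collected into a list
val jsonDocs = Seq(
  Person("John", "Doe", "1985-01-03", "Mr", 1662034L),
  Person("Foo", "Fooo", "1980-02-14", "Mrs", 1662035L)
).map(gson.toJson(_))

val conf = new SparkConf()
  .setAppName("json-to-es")
  .set("es.nodes", "localhost:9200") // assumed cluster address

new SparkContext(conf).makeRDD(jsonDocs).saveJsonToEs("loan/fhl")
```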
Hope this helps someone ...