Hi Team,
I have a CSV file whose data I need to store in an index ("carsdata"), partitioned by certain columns.
Input CSV (cars-data.csv):
carname,enginetype,cost,countryId,modelnumber
BMW1,Petrol,10000,1,12
BMW2,Petrol,20000,1,12
BMW3,Petrol,30000,1,12
BMW4,Petrol,40000,1,12
BMW5,Petrol,50000,18,13
BMW6,Petrol,60000,18,13
BMW7,Petrol,70000,18,13
BMW8,Petrol,80000,18,13
BMW9,Petrol,90002,18,13
Code:
import org.apache.spark.sql.SparkSession

object ElasticSearchWriteLocal {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("WriteToElasticSearch")
      .master("local")
      .getOrCreate()
    val dataFrame = sparkSession.read.option("header", "true").csv("cars-data.csv")
    dataFrame.write
      .format("org.elasticsearch.spark.sql")
      .option("es.port", "9200")
      .option("es.nodes", "localhost")
      .partitionBy("countryId", "modelnumber")
      .mode("overwrite")
      .save("carsdata/doc")
  }
}
This code executes without any problem, and I am able to see the data in the index called "carsdata".
Now my requirement is to overwrite only the data matching specific values of the partitionBy columns, as shown below.
Existing data in the index (carsdata) for countryId=18 & modelnumber=13:
carname,enginetype,cost,countryId,modelnumber
BMW5,Petrol,50000,18,13
BMW6,Petrol,60000,18,13
BMW7,Petrol,70000,18,13
BMW8,Petrol,80000,18,13
BMW9,Petrol,90002,18,13
Assume the new data for countryId=18 & modelnumber=13 is as given below:
carname,enginetype,cost,countryId,modelnumber
BMW5,Petrol,60023,18,13
BMW6,Diesel,68444,18,13
BMW7,Petrol,84755,18,13
BMW8,Diesel,80000,18,13
BMW9,Diesel,483448,18,13
Now I want to overwrite only the countryId=18 & modelnumber=13 data with the new data shown above, without overwriting the entire index.
Could you please help me achieve this without overwriting the entire index?
Also, does Elasticsearch really support partitioning by column while writing? If yes, could you please help me with this?
If no, then how is the code executing without any problem even though I specified .partitionBy("countryId", "modelnumber")?
Please help me with this.
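For context, one approach I am considering is to write only the new rows with the connector's es.mapping.id option pointing at a unique key column, so documents with the same id are updated in place instead of the whole index being rewritten. Below is a minimal sketch of that idea; it assumes carname uniquely identifies each row, that the new rows live in a file new-cars-data.csv, and that Elasticsearch is running at localhost:9200 (all of these are my assumptions, not part of the original job):

```scala
import org.apache.spark.sql.SparkSession

object ElasticSearchUpsertLocal {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .appName("UpsertToElasticSearch")
      .master("local")
      .getOrCreate()

    // Assumption: this file contains only the replacement rows
    // for countryId=18 & modelnumber=13.
    val newData = sparkSession.read
      .option("header", "true")
      .csv("new-cars-data.csv")

    newData.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "localhost")
      .option("es.port", "9200")
      // Use carname as the document _id so re-writing the same
      // carname replaces that document rather than adding a new one.
      .option("es.mapping.id", "carname")
      // "upsert" updates existing documents and inserts missing ones.
      .option("es.write.operation", "upsert")
      .mode("append") // append + upsert touches only matching ids
      .save("carsdata/doc")
  }
}
```

I am not sure this is the right way to emulate a partition-level overwrite in Elasticsearch, so any guidance on whether this is the intended pattern would be appreciated.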