I am reading data from an Elasticsearch index, processing it with Spark, and loading the result into a new Elasticsearch index.
While processing the data, I group it by date (day-wise) and by the driverPkid column.
When loading the data into Elasticsearch, if I use the option ("" -> "driverPkid"), only the latest record for each driverPkid gets loaded and the older ones are ignored. When I don't use the option, duplicate records are created for the same driverPkid. How can I solve this? Please help.

I don't know Spark, but it sounds like that option assigns the driverPkid value from your data to the Elasticsearch _id, is that right?
And does that value remain the same across different records?

If so, what you are seeing is valid: Elasticsearch updates the existing document whenever a record arrives with the same ID, so only the last record written for that ID survives.
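A minimal sketch of that overwrite behaviour, assuming the option really does map driverPkid to _id. The index is modelled as a plain Python dict keyed by _id, not a real Elasticsearch client, and the 2021-05-10 row and the helper name index_docs are made up for illustration.

```python
# Toy model of an index: a dict keyed by _id (not a real Elasticsearch client).
def index_docs(docs, id_field):
    index = {}
    for doc in docs:
        # Writing a document whose _id already exists replaces the earlier one.
        index[str(doc[id_field])] = doc
    return index


# Two daily aggregates for the same driver; the first row is hypothetical.
daily_rows = [
    {"driverPkid": 200528, "the_date": "2021-05-10", "totalFIN": 3},
    {"driverPkid": 200528, "the_date": "2021-05-11", "totalFIN": 1},
]

by_driver = index_docs(daily_rows, "driverPkid")
print(len(by_driver))                   # 1
print(by_driver["200528"]["the_date"])  # 2021-05-11
```

Both rows share the same _id, so the second write replaces the first and only the most recent day remains.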

Yes, it assigns the driverPkid value from my data to the Elasticsearch _id. The problem is that I am grouping the data day-wise, so only the latest (today's) record for a driverPkid gets loaded into Elasticsearch, and the older records (yesterday and the day before) for that driverPkid are ignored.

Does that value stay the same over each day?

No, the value differs. I am calculating the total number of finished orders by each driver per day.
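Since the data is one row per driver per day, one possible way out (not confirmed anywhere in this thread) is to build the document ID from both grouping columns instead of driverPkid alone. The sketch below again models the index as a plain dict; the helper names doc_id and load, the "<driverPkid>_<the_date>" format, and the 2021-05-10 row are all assumptions made for illustration. In Spark, the equivalent would presumably be a concatenated column passed to the same ID option.

```python
def doc_id(row):
    # Hypothetical composite ID: one distinct value per (driver, day) pair.
    return f"{row['driverPkid']}_{row['the_date']}"


def load(index, rows):
    # Toy stand-in for writing to an index keyed by _id.
    for row in rows:
        index[doc_id(row)] = row
    return index


# The first row is hypothetical; the second mirrors the sample in this thread.
rows = [
    {"driverPkid": 200528, "the_date": "2021-05-10", "totalFIN": 3},
    {"driverPkid": 200528, "the_date": "2021-05-11", "totalFIN": 1},
]

index = load({}, rows)
index = load(index, rows)  # running the same job again

print(sorted(index))  # ['200528_2021-05-10', '200528_2021-05-11']
```

Each day keeps its own document, and re-running the job replaces the existing documents rather than creating duplicates.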

Can you provide some examples of what the documents that you are sending to Elasticsearch look like?

The actual JSON would be handy.

"_index": "driver-order-details-test",
"_type": "_doc",
"_id": "daF_XnkBsBfFsV53dHC2",
"_version": 1,
"_score": null,
"_source": {
  "driverPkid": 200528,
  "driverType": "FREELANCER",
  "totalFIN": 1,
  "totalCAN": 1,
  "totalREJ": 0,
  "the_date": "2021-05-11"
},
"fields": {
  "the_date": [
"sort": [

OK, that looks like an autogenerated ID, and it's not putting driverPkid into the _id, as driverPkid has its own field.

In that case, I am not sure what is happening. I don't know Spark, so I may not be much help, sorry.

Thank you for your time and help, @warkolm.
