Okay, as you suggested, assume I have created a carid (where carid = <car_name>__modelnumber) to locate a single document uniquely.
I have read the Elasticsearch API docs, where we have two options to update data:
- Update API (updates a single record by id (carid))
- Update By Query API, but it is not feasible if we are updating each document with different data.
I observed that the Update API supports only a single document per request. But I want to update 1 million records through Spark, so my question is: how can I update at that scale, given that the Update API handles only one document at a time?
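For context, my understanding is that the Bulk API can batch many per-document updates in a single request, which is what I would hope to drive from Spark. Here is a minimal sketch in Python of building such update actions (the field names and the make_carid helper are just from my cars example; in practice the actions would be handed to elasticsearch.helpers.bulk):

```python
# Sketch: build Bulk-API "update" actions, one per document, keyed by carid.
# Assumes records shaped like {"carname": ..., "modelnumber": ..., "enginetype": ...}.

def make_carid(record):
    # carid = <car_name>__modelnumber, used as the document _id
    return f"{record['carname']}__{record['modelnumber']}"

def make_update_actions(records, index="carsdata"):
    """Yield one partial-update action per record.

    Each action updates a single document by _id, but the Bulk API
    accepts thousands of these in one request, so this scales far
    beyond calling the single-document Update API per record.
    """
    for record in records:
        yield {
            "_op_type": "update",        # partial update, not a full reindex
            "_index": index,
            "_id": make_carid(record),
            "doc": {"enginetype": record["enginetype"]},
            "doc_as_upsert": True,       # create the document if the id is missing
        }

records = [
    {"carname": "BMW4", "modelnumber": "M440i", "enginetype": "Petrol"},
    {"carname": "BMW4", "modelnumber": "430d", "enginetype": "Diesel"},
]
actions = list(make_update_actions(records))
# These actions would then be sent with elasticsearch.helpers.bulk(client, actions).
```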
I have explored one more way as well: delete part of the data (using Delete By Query) per our requirement and then append the new data. But Delete By Query is not performing as expected, and it does not delete all the matching documents when we have millions of records.
Here is the sample code I executed in Kibana:
POST /carsdata/_delete_by_query
{
  "query": {
    "bool": {
      "must": [
        { "match": { "carname": "BMW4" } },
        { "match": { "enginetype": "Petrol" } }
      ]
    }
  }
}
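I have also come across request parameters that are supposed to make large deletes more reliable: conflicts=proceed (skip version conflicts instead of aborting the whole operation), slices=auto (parallelize the delete), and wait_for_completion=false (run it as a background task). A variant of the request above using them:

```
POST /carsdata/_delete_by_query?conflicts=proceed&slices=auto&wait_for_completion=false
{
  "query": {
    "bool": {
      "must": [
        { "match": { "carname": "BMW4" } },
        { "match": { "enginetype": "Petrol" } }
      ]
    }
  }
}
```

I have not yet confirmed whether these parameters fix the partial-deletion issue at the millions-of-records scale.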
Could you please help me figure out how to delete data (possibly millions of documents) based on a condition without failures, and, if the delete works reliably, how to integrate it with the Spark Elasticsearch API?
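For the Spark side, what I am currently looking at is the elasticsearch-hadoop connector, which (as I understand it) issues bulk upserts when writing a DataFrame with settings along these lines; the option names are the connector's es.* keys, and the values are just my assumptions for this cars example:

```
# use my carid field as the document _id
es.mapping.id = carid
# update existing documents, create missing ones
es.write.operation = upsert
# documents per bulk request
es.batch.size.entries = 10000
# retry failed bulk batches
es.batch.write.retry.count = 3
```

If this is the right direction, I would appreciate confirmation of which settings matter for updating 1 million records.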