It's better to do an index operation instead of the update operation. Is there any specific reason you want to use the update API?
Coming back to your question, I think the only way is to run a script that removes the field. But that will be slower and more complicated IMO than providing the full document.
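For reference, a scripted update that removes a field might look roughly like the sketch below. It only builds the request body and sends nothing. The field name `obsoleteObject` and the helper name are made up for illustration, and the exact endpoint depends on your Elasticsearch version (`POST /<index>/_update/<id>` in recent versions), so check it against your cluster.

```python
import json


def build_remove_field_body(field_name):
    """Build an Update API body whose Painless script drops one top-level field."""
    return {
        "script": {
            "lang": "painless",
            # ctx._source is the document as a map; remove() deletes the key.
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field_name},
        }
    }


# "obsoleteObject" is a hypothetical field name used only for illustration.
body = build_remove_field_body("obsoleteObject")
print(json.dumps(body, indent=2))
```

Passing the field name through `params` rather than concatenating it into the script source keeps the script text constant, so Elasticsearch can reuse the compiled script.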
Firstly, thank you very much for your quick reply and suggestion.
Q. Is there any specific reason you want to use the update API?
A. Yes. In my post I gave a sample document, but in our application the document contains 4 objects. Of these 4, one is written to the existing documents by another source.
So when an update comes in for the first 3 objects, we need to update only those three objects.
In this scenario, when an update comes in with only 2 objects, we need to store only those 2 objects in the document (that is, we have to remove the 3rd object, so the update behaves like an override).
Any inputs on this scenario will be appreciated and I will be thankful to you.
I don't like using the Update API. I only use it when I have to modify a single field in a very big document, like some megabytes of JSON, as it's a way to save some network bandwidth. Other than that, I don't use it.
When you say inner objects, what do they look like? Why can't you send the full 4 inner objects again?
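To illustrate the bandwidth point, here is a small sketch comparing the two request bodies. The document and its field names (`status`, `payload`) are invented for the example; only the `{"doc": ...}` shape of a partial update body comes from the Update API.

```python
import json

# Hypothetical large document: one small field plus a big payload.
full_document = {
    "status": "active",
    "payload": "x" * 1_000_000,  # stands in for megabytes of JSON
}

# Partial update body: only the changed field travels over the network.
partial_update_body = {"doc": {"status": "archived"}}

full_size = len(json.dumps(full_document))
partial_size = len(json.dumps(partial_update_body))
print(f"full reindex payload: {full_size} bytes")
print(f"partial update payload: {partial_size} bytes")
```

Note that a partial `doc` update merges into the stored document, so it cannot remove fields that are absent from the request.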
Earlier we were using only the Index API, which overrides the complete document every time. We don't want to override every time, because the 4th object is loaded from another source.
Please find the sample document below.
Note: For security reasons, I am sending a sample document with a structure similar to our real document.
In the above JSON, the 4th object (corpLimit) is updated in our ES by another source.
Our process has to update the document if it is present, since we have to maintain the 4th object as is, and index it if it is not present in ES.
In the update scenario, when a document arrives with an update, we have to store only the fields it contains; fields that are not present in the incoming document should be removed from the ES document, without disturbing the 4th object.
Please let me know what the best solution would be to fulfill our requirement.
I don't know the "best" solution. It depends.
My experience in the past was that I never updated an inner object on its own in my application, but rather the root object, with maybe only modifications in the inner object.
If you think of a SQL model, behind the scenes the update was made only in the inner object, but from a user or functional point of view the change was made globally.
In that case, I sent the whole global object to Elasticsearch again.
In the update scenario, when a document arrives with an update, we have to store only the fields it contains; fields that are not present in the incoming document should be removed from the ES document, without disturbing the 4th object.
That makes the job you have to do on your end more complicated IMO.
What I'd probably do is read the document from Elasticsearch (GET by ID), merge it with whatever you want, and send it back as a whole.
That's basically what the update API does behind the scenes. But that way you have better control over complex scenarios where you have to rewrite a whole object.
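The merge step of that read, merge, and reindex approach could be sketched as below. This is only an illustration under assumptions: `merge_preserving` is a hypothetical helper, `objectA`, `objectB` and `objectC` are placeholder names (only `corpLimit` comes from the thread), and the GET by ID and index calls are left out.

```python
import copy


def merge_preserving(existing, incoming, preserved_keys=("corpLimit",)):
    """Return the document to index: incoming fields plus preserved fields.

    Fields missing from `incoming` are dropped, except those listed in
    `preserved_keys`, which are carried over from `existing` when present.
    `existing` may be None when the document was not found.
    """
    merged = copy.deepcopy(incoming)
    if existing is not None:
        for key in preserved_keys:
            if key in existing:
                # The stored value wins, since another source owns this field.
                merged[key] = copy.deepcopy(existing[key])
    return merged


# Placeholder documents; only "corpLimit" is a name taken from the thread.
existing = {
    "objectA": {"v": 1},
    "objectB": {"v": 2},
    "objectC": {"v": 3},
    "corpLimit": {"limit": 500},
}
incoming = {"objectA": {"v": 10}, "objectB": {"v": 20}}

merged = merge_preserving(existing, incoming)
print(merged)
```

Here `objectC` disappears because the incoming update does not contain it, while `corpLimit` is kept untouched. If the other source can write between your read and your write, the `if_seq_no` and `if_primary_term` parameters on the index request may help detect the conflict, depending on your Elasticsearch version.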
But if you find a better way, that's great. Feel free to share your solution here.