Document overriding using the Update API

Hi All,

I have the following scenario:

POST _bulk
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{"field1": "ram", "doj" : "2019-12-01"}
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
{"field1": "ram", "doj" : "2019-03"}

Above, I am indexing two documents with two fields each.

POST _bulk
{ "update" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{"doc":{"doj" : "2019-12-01"}}
{ "update" : { "_index" : "test", "_type" : "_doc", "_id" : "2" } }
{"doc":{"doj" : "2019-04"}}

Above, I am trying to update the indexed documents.

  1. Here I am trying to update only a single field.
  2. I am sure the other field will remain as it is.
  3. But in case I want to remove the other field and keep only the field being updated, what can I do?
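
To make point 2 concrete, here is a small local sketch of how a partial `doc` update is merged into the stored document: nested objects are merged recursively, while other values (arrays included) are replaced. The function name `partial_merge` is made up for illustration; it imitates the merge on plain dictionaries and does not call Elasticsearch.

```python
from copy import deepcopy


def partial_merge(source, partial):
    """Local imitation of a `doc` partial update (hypothetical helper).

    Nested objects are merged recursively; any other value, including
    arrays, is replaced by the value from the partial document.
    """
    merged = deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = partial_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Document 2 from the bulk request above, then the partial update for it.
existing = {"field1": "ram", "doj": "2019-03"}
updated = partial_merge(existing, {"doj": "2019-04"})
print(updated)  # field1 is still there, only doj changed
```

Because the merge only adds or overwrites keys, a partial document alone can never remove `field1`; that needs either a script or a full re-index of the document.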

Could someone please help me with this?

Thanks,
Ram Prasad G

It's better to do an index operation instead of the update operation. Is there any specific reason you want to use the update API?

Coming back to your question, I think that the only way is by running a script which removes the field. But that will be slower and more complicated IMO than providing the full document.
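
For reference, a scripted update of that kind could look roughly like the sketch below, which builds the two NDJSON lines for a bulk `update` action. This is only an illustration and has not been run against a cluster: the helper name `bulk_update_removing` and the parameter names `doc` and `remove` are made up here, and the `_type` key from the requests above is left out.

```python
import json


def bulk_update_removing(index, doc_id, partial_doc, fields_to_remove):
    """Build a bulk scripted update (hypothetical helper).

    The Painless script copies the new values into the stored document
    and then removes the listed top-level fields.
    """
    action = {"update": {"_index": index, "_id": doc_id}}
    body = {
        "script": {
            "lang": "painless",
            "source": (
                "ctx._source.putAll(params.doc); "
                "for (def f : params.remove) { ctx._source.remove(f); }"
            ),
            # `doc` and `remove` are parameter names chosen for this sketch.
            "params": {"doc": partial_doc, "remove": fields_to_remove},
        }
    }
    # The bulk API expects newline-delimited JSON with a trailing newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


payload = bulk_update_removing("test", "2", {"doj": "2019-04"}, ["field1"])
print(payload)
```

The caller still has to work out which fields to remove for every document, which is part of why sending the full document is usually simpler.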

Hi David,

Firstly, thank you very much for your quick reply and suggestion.
Q. Is there any specific reason you want to use the update API?
A. Yes. In my post I gave a sample document, but in our application the document contains 4 objects. Among these 4, one object gets updated on the existing documents by some other source.
So, the next time an update comes for the first 3 objects, we need to update only those three objects.
In this scenario, when an update comes with only 2 objects, we need to update the document with only those 2 objects (here I mean we have to remove the 3rd object, which behaves like overriding).

Any inputs on this scenario will be appreciated and I will be thankful to you.

Thanks,
Ram Prasad G


I don't like using the Update API. I only use it when I have to modify a single field in a very big document, like some megabytes of JSON, as it's a way to save some network bandwidth. Other than that, I don't use it.

When you say inner objects, what do they look like? Why can't you send the full 4 inner objects again?

Hi David,

Earlier we were using index only, as it overrides the complete document every time. But we don't want to override every time, because the 4th object is loaded from another source.
Please find the sample document below.

{
	"stdFinans": {
		"identity": {
			"busName": {
				"name": "TRANS",
				"languageCode": 40
			},
			"businessAddress": {
				"streetAddress": {
					"line1": "57 IMPA"
				},
				"town": "HESDINS"
			},
			"startDate": "2008-06-01",
			"dateOfRegistration": "2008-06-01",
			"numberOfEmployees": [{
				"value": 1,
				"date": "2018-10-05"
			}],
			"status": "Active"
		},
		"financs": {
			"numberOfStatements": 5,
			"allFinancials": [{
				"statementToDate": "2016-12-31",
				"statementFromDate": "2016-01-01"
			}, {
				"statementToDate": "2016-12-31",
				"statementFromDate": "2016-01-01"
			}, {
				"statementToDate": "2016-12-31",
				"statementFromDate": "2016-01-01"
			}]
		},
		"control": {
			"isStoped": false,
			"isDelimited": false
		},
		"corpLimit": {
			"corpName": "abc",
			"corpAddr": "def"
		}
	}
}

Note: For security reasons, I am sending a sample document with a structure similar to that of our document.
In the above JSON, the 4th object (corpLimit) is updated in our ES by another source.
Our process has to update the document if it is present, as we have to maintain the 4th object as is, and index it if it is not present in ES.
In the update scenario, if a document comes with an update, we have to store only those fields; fields which are not present in the incoming document should be removed from the ES document without disturbing the 4th object.

Please let me know what will be the best solution to fulfill our requirement.

Thanks,
Ram Prasad G

I don't know the "best" solution. It depends.
My experience in the past was that I was never updating an internal sub-object in my application, but the root object, with maybe only modifications in the inner object.

If you think of a SQL model, behind the scenes the update was made only in the inner object, but from a user or functional point of view the change was made globally.
In that case, I was sending the whole global object to Elasticsearch again.

In the update scenario, if a document comes with an update, we have to store only those fields; fields which are not present in the incoming document should be removed from the ES document without disturbing the 4th object.

That makes the job you have to do on your end more complicated IMO.
What I'd probably do is read the document from Elasticsearch (GET by ID), then merge it with whatever you want and send it back as a whole.
That's basically what the update API does behind the scenes. But that way you can better control complex scenarios where you have to rewrite a whole object.
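
As a rough illustration of the merge step in that read, merge, write flow, the sketch below rebuilds the full document from an incoming update while carrying over the object owned by the other source. The helper `rebuild_document` and the `PRESERVED` tuple are hypothetical names; `existing_source` stands for the `_source` returned by a GET by ID (or `None` when the document does not exist yet), and the fetching and indexing calls themselves are left out.

```python
from copy import deepcopy

# Objects under the root that belong to the other source (assumption
# based on the sample document in this thread).
PRESERVED = ("corpLimit",)


def rebuild_document(existing_source, incoming, root="stdFinans",
                     preserved=PRESERVED):
    """Return the full document to index (hypothetical helper).

    Content from `incoming` wins, fields missing from `incoming` are
    dropped, and preserved objects are copied over from the stored copy.
    """
    result = deepcopy(incoming)
    if existing_source is None:
        return result  # nothing stored yet: index the incoming document
    old_root = existing_source.get(root, {})
    new_root = result.setdefault(root, {})
    for key in preserved:
        if key in old_root:
            new_root[key] = deepcopy(old_root[key])
    return result


stored = {"stdFinans": {
    "identity": {"status": "Active"},
    "financs": {"numberOfStatements": 5},
    "control": {"isStoped": False},
    "corpLimit": {"corpName": "abc", "corpAddr": "def"},
}}
update = {"stdFinans": {
    "identity": {"status": "Active"},
    "control": {"isStoped": True},
}}
merged = rebuild_document(stored, update)
print(sorted(merged["stdFinans"]))  # financs is gone, corpLimit is kept
```

The result is then sent back with a plain index operation, so there is no need to compute which fields to delete.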

But if you find a better way, that's great. Feel free to share your solution here.


Hi David,

Thank you very much for your inputs.
Finally, we came to the conclusion below.

  1. Every time we get a document (new or updated), we check whether the 4th object is present or not.
  2. If the 4th object is present, we append it to the existing document.
  3. We perform an index operation every time (as it always overrides).
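
The three steps above could be sketched for a batch as below. This reads step 2 as "copy the stored corpLimit into the incoming document before indexing", which is an interpretation rather than something stated in the thread. The helper `bulk_index_lines` is a made-up name, `existing_by_id` stands for stored documents that were looked up beforehand, and the `_type` key from the earlier requests is left out.

```python
import json


def bulk_index_lines(index, incoming_by_id, existing_by_id):
    """Build the NDJSON body of a bulk request made only of index actions
    (hypothetical helper)."""
    lines = []
    for doc_id, incoming in incoming_by_id.items():
        # Round-trip through JSON as a cheap deep copy of plain JSON data.
        body = json.loads(json.dumps(incoming))
        existing = existing_by_id.get(doc_id)
        # Steps 1 and 2: carry over the stored 4th object when present.
        if existing and "corpLimit" in existing.get("stdFinans", {}):
            stored_limit = existing["stdFinans"]["corpLimit"]
            body.setdefault("stdFinans", {})["corpLimit"] = stored_limit
        # Step 3: always a full index operation, never an update.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


incoming_docs = {
    "1": {"stdFinans": {"control": {"isStoped": True}}},
    "2": {"stdFinans": {"control": {"isStoped": False}}},
}
stored_docs = {
    "1": {"stdFinans": {"control": {"isStoped": False},
                        "corpLimit": {"corpName": "abc", "corpAddr": "def"}}},
}
bulk_body = bulk_index_lines("test", incoming_docs, stored_docs)
print(bulk_body)
```

Document 1 keeps its stored `corpLimit`, while document 2 has no stored copy and is indexed exactly as received.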

Please let me know your thoughts.

Thanks,
Ram Prasad G
