Update documents

(P) #1

I have a generic question about updating existing documents. After almost a year of using Elasticsearch, I have come to believe that updating an Elasticsearch document is not the same as updating a record in an RDBMS. We are building a data-centric solution that indexes documents based on "events" and "transactions". The events have already occurred by the time they are indexed into ES, so they will never be "updated". On the other hand, we are also using ES to store defects. Defects change state (because of transactions): they go from "open" to "submitted" and then to "fixed" and "closed". If we index this defect data into ES, how do we handle state changes? Do we update the same defect record to show the latest state, or do we create a new defect document every time we pull this data from the source?
I would like to know the right approach here with respect to ES. In our old RDBMS-based system we have, of course, been updating the record in place.
This design decision also determines whether our ES queries will be simple or complex. For example, if we index a new document every time we pull from the source, I find it hard to use Kibana to report on aggregated metrics.

(Mark Walkom) #2

There are two patterns here.

The first is to store each change in the defect's state as an event/transaction and then display the latest one to the user. This way you keep a history of the changes that you can go back to and analyse.
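A minimal sketch of the first pattern, assuming hypothetical field names `defect_id`, `state` and `timestamp` in an append-only events index: every state change is a new document that is never updated, and the latest state per defect is recovered at query time with a `terms` aggregation plus a `top_hits` sub-aggregation sorted by time. The `latest_states` function only emulates that aggregation in plain Python so the logic can be checked without a cluster.

```python
from typing import Any

# Search body for the events index (field names are hypothetical).
# One bucket per defect, keeping only the newest event in each bucket.
LATEST_STATE_QUERY: dict[str, Any] = {
    "size": 0,
    "aggs": {
        "per_defect": {
            # "size" caps how many defects come back; 1000 is an arbitrary example.
            "terms": {"field": "defect_id", "size": 1000},
            "aggs": {
                "latest": {
                    "top_hits": {
                        "size": 1,
                        "sort": [{"timestamp": {"order": "desc"}}],
                    }
                }
            },
        }
    },
}


def latest_states(events: list[dict[str, Any]]) -> dict[str, str]:
    """Local emulation of the aggregation above: defect_id -> newest state.

    Timestamps are ISO-8601 strings in one fixed format, so comparing them
    as strings orders them chronologically.
    """
    newest: dict[str, dict[str, Any]] = {}
    for event in events:
        current = newest.get(event["defect_id"])
        if current is None or event["timestamp"] > current["timestamp"]:
            newest[event["defect_id"]] = event
    return {defect_id: e["state"] for defect_id, e in newest.items()}


events = [
    {"defect_id": "D-1", "state": "open", "timestamp": "2020-01-01T10:00:00Z"},
    {"defect_id": "D-1", "state": "submitted", "timestamp": "2020-01-02T09:00:00Z"},
    {"defect_id": "D-2", "state": "open", "timestamp": "2020-01-02T11:00:00Z"},
    {"defect_id": "D-1", "state": "fixed", "timestamp": "2020-01-03T08:30:00Z"},
]
print(latest_states(events))  # {'D-1': 'fixed', 'D-2': 'open'}
```

This query-time reconstruction of "current state" is also why reporting aggregated metrics in Kibana gets awkward with one document per event, as described in the question.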
The other is to keep just one document per defect and update it when things change. This means you always have the latest state, but no history (unless you expand that one defect document to include it).
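A minimal sketch of the second pattern, assuming a hypothetical `defects` index and the typeless `/<index>/_update/<id>` endpoint of Elasticsearch 7.x and later. `partial_update` keeps only the latest state via a `doc` merge; `update_with_history` uses a Painless script that also appends each transition to a `history` array, which is one way to "expand that one defect to include it". Either way Elasticsearch retrieves the stored `_source`, applies the change and reindexes the whole document, so a long history makes each update more expensive. `apply_locally` is a plain-Python stand-in for the script, for illustration only.

```python
from typing import Any

# Hypothetical index name; the typeless _update endpoint is the 7.x+ form.
INDEX = "defects"

# Painless script: set the new state and append an entry to a history list,
# creating the list the first time a defect is updated this way.
HISTORY_SCRIPT = (
    "ctx._source.state = params.state; "
    "if (ctx._source.history == null) { ctx._source.history = new ArrayList(); } "
    "ctx._source.history.add(['state': params.state, 'changed_at': params.changed_at]);"
)

Request = tuple[str, str, dict[str, Any]]


def partial_update(defect_id: str, state: str) -> Request:
    """Latest state only: merge {"state": ...} into the existing document."""
    return "POST", f"/{INDEX}/_update/{defect_id}", {"doc": {"state": state}}


def update_with_history(defect_id: str, state: str, changed_at: str) -> Request:
    """Latest state plus an embedded history of transitions."""
    body = {
        "script": {
            "lang": "painless",
            "source": HISTORY_SCRIPT,
            "params": {"state": state, "changed_at": changed_at},
        }
    }
    return "POST", f"/{INDEX}/_update/{defect_id}", body


def apply_locally(source: dict[str, Any], state: str, changed_at: str) -> dict[str, Any]:
    """Plain-Python emulation of HISTORY_SCRIPT, for checking the logic."""
    updated = dict(source)
    updated["state"] = state
    updated["history"] = list(source.get("history") or []) + [
        {"state": state, "changed_at": changed_at}
    ]
    return updated


doc = {"defect_id": "D-1", "state": "open"}
doc = apply_locally(doc, "submitted", "2020-01-02T09:00:00Z")
doc = apply_locally(doc, "fixed", "2020-01-03T08:30:00Z")
print(doc["state"], len(doc["history"]))  # fixed 2
```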

(P) #3

Thanks for the quick answer. If we choose the second approach, where we update the same document every time the state changes, do we need to be aware of any differences between an RDBMS update and an ES update? If there is any documentation on this that you would recommend, please share it. I am specifically interested in information on concurrency, consistency and latency in update operations.

(Mark Walkom) #4

https://www.elastic.co/guide/en/elasticsearch/guide/2.x/partial-updates.html is a little old, but the core concepts still apply.
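A short sketch of the concurrency and latency points, assuming Elasticsearch 6.7 or later, where the `version`-based checks described in the 2.x guide are replaced by the `if_seq_no`/`if_primary_term` request parameters (the values come from an earlier GET of the document). Unlike an RDBMS, each update is atomic for a single document only, with no multi-document transactions. A GET by id sees the change immediately, but search only sees it after the next refresh (by default about once per second); `refresh=wait_for` holds the response until then. `retry_on_conflict` is the alternative "last writer wins" approach and is not meant to be combined with `if_seq_no`. The index name and ids below are hypothetical; the functions only build request descriptions.

```python
from typing import Any
from urllib.parse import urlencode

Request = tuple[str, str, dict[str, Any]]


def conditional_update(
    index: str,
    doc_id: str,
    new_state: str,
    seq_no: int,
    primary_term: int,
    wait_for_refresh: bool = False,
) -> Request:
    """Optimistic concurrency: apply the change only if the document still has
    the _seq_no/_primary_term we read earlier; otherwise ES answers 409."""
    params: dict[str, Any] = {"if_seq_no": seq_no, "if_primary_term": primary_term}
    if wait_for_refresh:
        # Hold the response until the change is visible to searches.
        params["refresh"] = "wait_for"
    url = f"/{index}/_update/{doc_id}?{urlencode(params)}"
    return "POST", url, {"doc": {"state": new_state}}


def blind_update(index: str, doc_id: str, new_state: str, retries: int = 3) -> Request:
    """Last writer wins: on a version conflict, ES re-reads and retries itself."""
    url = f"/{index}/_update/{doc_id}?{urlencode({'retry_on_conflict': retries})}"
    return "POST", url, {"doc": {"state": new_state}}


# _seq_no and _primary_term would come from an earlier GET /defects/_doc/D-1.
method, url, body = conditional_update("defects", "D-1", "closed", seq_no=7, primary_term=1)
print(method, url)  # POST /defects/_update/D-1?if_seq_no=7&if_primary_term=1
```

On a 409 the caller would re-read the defect and decide whether the transition still makes sense, which is usually what you want for a state machine like open, submitted, fixed, closed.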

(P) #5

Thank you. Will read it.
