Millions of documents -> add vs. update [aspect of performance]

Dom_Sie · April 19, 2016, 8:03am

Hi,

When an ES index stores many millions of documents (ES version 1.4.4; 5 primary shards with 1 replica shard per index) and there are many document updates within short periods of time:
Would you update each document in place, or would you use a kind of "revision management" and insert a new document for every change? (Considering the performance aspect.)

thn · April 19, 2016, 12:53pm

Behind the scenes, an update means deleting the existing document and then adding a new one, so in theory a plain add should perform better, especially when you are dealing with millions of documents.
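To make the "delete then add" point concrete, here is a toy in-memory model. It is only a sketch that assumes an append-only store loosely mimicking Lucene segments; `ToyIndex`, `index` and `update` are hypothetical names, not Elasticsearch APIs. It shows that each update has to read the current copy, leaves a deleted copy behind (which Lucene only reclaims later during segment merges), and appends a full new copy, whereas an add under a new id just appends.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Hypothetical append-only store (an assumption, not ES internals):
    copies are never modified in place, only tombstoned and re-appended."""
    docs: list = field(default_factory=list)   # every copy ever written
    deleted: set = field(default_factory=set)  # positions of tombstoned copies
    live: dict = field(default_factory=dict)   # doc id -> position of live copy

    def index(self, doc_id, source):
        # Writing under an id that already exists tombstones the old copy.
        if doc_id in self.live:
            self.deleted.add(self.live[doc_id])
        self.docs.append(source)
        self.live[doc_id] = len(self.docs) - 1

    def update(self, doc_id, changes):
        # Partial update: read the current copy, merge, re-index the whole doc.
        current = self.docs[self.live[doc_id]]
        self.index(doc_id, {**current, **changes})

    def stats(self):
        return {"stored": len(self.docs), "deleted": len(self.deleted),
                "live": len(self.live)}


idx = ToyIndex()
idx.index("p1", {"name": "Ann", "address": "A St"})
idx.update("p1", {"address": "B St"})                      # delete + add
idx.index("p1-rev2", {"name": "Ann", "address": "C St"})   # plain add
print(idx.stats())  # {'stored': 3, 'deleted': 1, 'live': 2}
```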

Adding only, however, leaves you with "duplicated documents": two or more documents in the index with very similar contents (or only minor differences). What counts as a duplicate varies with the data domain and the business needs, so don't take it personally when someone says "your definition of a dup is wrong".
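If you go the add-only route, something has to decide which copy "wins". The sketch below does this client-side and assumes each document carries a hypothetical `revision` number; it is an illustration, not an Elasticsearch feature. The `key` function is where the domain-specific definition of a duplicate lives: change it and a different set of documents survives.

```python
from typing import Callable, Hashable, Iterable


def latest_per_key(docs: Iterable[dict], key: Callable[[dict], Hashable],
                   revision_field: str = "revision") -> list:
    """Keep only the highest revision for each key.

    What counts as a duplicate is decided entirely by `key`."""
    best = {}
    for doc in docs:
        k = key(doc)
        if k not in best or doc[revision_field] > best[k][revision_field]:
            best[k] = doc
    return list(best.values())


docs = [
    {"person_id": 1, "revision": 1, "address": "A St"},
    {"person_id": 1, "revision": 2, "address": "B St"},
    {"person_id": 2, "revision": 1, "address": "C St"},
]
# Dup definition 1: same person -> only the newest address per person survives.
print(len(latest_per_key(docs, key=lambda d: d["person_id"])))  # 2
# Dup definition 2: same person AND same address -> every distinct address survives.
print(len(latest_per_key(docs, key=lambda d: (d["person_id"], d["address"]))))  # 3
```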

For example, suppose a document describes a person and his/her address. Version 1 has one address; version 2 has a different one. Ask yourself what your business wants to do with this. If it only needs the most up-to-date address, then you need an update, not an add. If it wants to keep a history of that person's addresses, then you need an add.
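The two choices translate into different `_bulk` requests. This is a minimal sketch that only builds the newline-delimited request body (no client library needed); the index names `people` / `people_history`, the type `person` and the `revision` field are hypothetical. In the "latest only" case the `_id` is the person id, so a change is an `update` of the same document; in the "history" case every change gets a new `_id` and is a plain `index` (add).

```python
import json


def bulk_body(actions):
    """Serialise (action, source) pairs into newline-delimited JSON for the
    _bulk endpoint; the body must end with a trailing newline."""
    lines = []
    for action, source in actions:
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def latest_only(person_id, address):
    # One document per person: a partial update with "doc" overwrites the
    # address in place (delete + add internally). It fails if the doc is missing.
    meta = {"_index": "people", "_type": "person", "_id": str(person_id)}
    return ({"update": meta}, {"doc": {"address": address}})


def with_history(person_id, revision, address):
    # One document per revision: a new _id per change, so each change is a
    # plain add and older addresses remain searchable.
    meta = {"_index": "people_history", "_type": "person",
            "_id": f"{person_id}-{revision}"}
    return ({"index": meta},
            {"person_id": person_id, "revision": revision, "address": address})


print(bulk_body([latest_only(42, "B St")]), end="")
print(bulk_body([with_history(42, 2, "B St")]), end="")
```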

Dom_Sie · April 20, 2016, 12:52pm

Thanks for your answer. I only want to use update operations, partly because of the duplicate-data problem, but I need arguments for this (e.g., that performance is not significantly worse even under high system utilization or a high number of queries), preferably with statistics or benchmarks to prove it...
Regards

thn · April 20, 2016, 1:01pm

Now you know what happens during an UPDATE, and since you said you want UPDATEs, I suggest you gather metrics based on your own data and share the results here.
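One way to gather such metrics is to run the same harness twice, once sending updates and once sending revision adds, ideally while your usual query load is running. The sketch below only does the timing; `send` is a hypothetical callable you would supply (for example, one that posts a bulk body to your own cluster), and the batch size and repeat count are arbitrary assumptions to tune.

```python
import statistics
import time
from typing import Callable, List, Sequence


def time_batches(send: Callable[[Sequence[dict]], None],
                 docs: Sequence[dict], batch_size: int = 1000,
                 repeats: int = 3) -> dict:
    """Time `send` over fixed-size batches of `docs` and summarise throughput."""
    if not docs:
        raise ValueError("need at least one document to time")
    durations: List[float] = []
    for _ in range(repeats):
        for start in range(0, len(docs), batch_size):
            batch = docs[start:start + batch_size]
            t0 = time.perf_counter()
            send(batch)
            durations.append(time.perf_counter() - t0)
    total_time = sum(durations)
    return {
        "batches": len(durations),
        "median_batch_s": statistics.median(durations),
        "docs_per_s": len(docs) * repeats / total_time if total_time > 0 else float("inf"),
    }


def fake_send(batch):
    # Stand-in for a real bulk request; replace with your own client call.
    sum(len(d) for d in batch)


report = time_batches(fake_send, [{"id": i} for i in range(2500)],
                      batch_size=1000, repeats=2)
print(report["batches"])  # 6
```

Comparing `docs_per_s` and `median_batch_s` between the two runs gives numbers grounded in your own data rather than in general claims.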
