ES 1.3 - Calculating+updating a field in millions of documents

jdyck · December 2, 2015, 3:16am

Hey all,

I have an ES index containing several million records. I'm adding a field to the documents' mapping, and afterwards I'll have to go back and calculate the value for this new field for every existing document. The value can be different for every document, so I don't think the 'update by query' approach applies here, since I'll have to calculate the value elsewhere.

Is there an efficient way of doing this?

Is the best way to do this to just pull the IDs of all documents in batches of 1000 or so, then for each batch calculate the value and update the documents using the bulk API?

Thanks!
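For illustration, the batch idea in the question could be sketched roughly as below. This is only a sketch under assumptions: `batches`, `bulk_update_body`, and the `compute_value` callback are hypothetical names, and the index, type, and field names are placeholders. It only builds the newline-delimited body for the bulk API (using `update` actions with a partial `doc`); sending it to `/_bulk` is left to whatever HTTP client is in use.

```python
import json
from itertools import islice


def batches(iterable, size=1000):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_update_body(doc_ids, index, doc_type, field, compute_value):
    """Build a newline-delimited bulk body of partial-document updates.

    `compute_value` is an assumed callback that returns the new field's
    value for a given document ID.
    """
    lines = []
    for doc_id in doc_ids:
        # Action line: identifies the document to update.
        lines.append(json.dumps(
            {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Payload line: only the new field is sent and merged into the document.
        lines.append(json.dumps({"doc": {field: compute_value(doc_id)}}))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder IDs and a placeholder calculation, for demonstration only.
    for chunk in batches(["1", "2", "3"], size=2):
        print(bulk_update_body(chunk, "myindex", "mytype", "new_field", len))
```

Each batch produces one request body, so the number of round trips is the document count divided by the batch size.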

dadoonet · December 2, 2015, 6:10am

If you are updating 100% of the docs, it's better to reindex.

jdyck · December 2, 2015, 6:44am

Would you suggest something like this to do it with minimal downtime?

1. Create a new index and PUT the revised mapping to the new index
2. Use the scan and scroll method to read all of the existing documents and calculate the value for the new field
3. Send the new documents to the new index in batches using the bulk API
4. Set an alias for the new index to 'flip the switch' to use the new index

In this case, would the disk usage double if I have the data in both indexes?

dadoonet · December 2, 2015, 7:27am

Correct on all points.

dadoonet · December 2, 2015, 7:28am

You're just missing one step: DELETE the old index at the end.
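As a rough sketch of the reindex flow discussed in this thread, the two request bodies involved might be built as follows. The function names, the `compute_value` callback, and all index, alias, and field names are hypothetical. The sketch assumes each hit comes from a scan/scroll page and carries `_type`, `_id`, and `_source`, and that the application already reads through an alias rather than the index name directly.

```python
import json


def bulk_index_body(hits, new_index, field, compute_value):
    """Turn one page of scroll hits into a bulk body targeting the new index.

    `compute_value` is an assumed callback that receives the document
    source and returns the value for the new field.
    """
    lines = []
    for hit in hits:
        source = dict(hit["_source"])          # copy so the input is untouched
        source[field] = compute_value(source)  # add the calculated field
        # Action line: index into the new index, keeping type and ID.
        lines.append(json.dumps(
            {"index": {"_index": new_index, "_type": hit["_type"], "_id": hit["_id"]}}))
        # Payload line: the full document, including the new field.
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def alias_swap_actions(alias, old_index, new_index):
    """Body for POST /_aliases; remove and add are sent in a single request."""
    return {"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]}


if __name__ == "__main__":
    # Placeholder hit and calculation, for demonstration only.
    page = [{"_index": "docs_v1", "_type": "doc", "_id": "1",
             "_source": {"price": 10}}]
    print(bulk_index_body(page, "docs_v2", "price_x2", lambda s: s["price"] * 2))
    print(json.dumps(alias_swap_actions("docs", "docs_v1", "docs_v2")))
```

Sending both alias actions in one request means readers of the alias see either the old index or the new one, never neither. Once the alias points at the new index, the old index can be deleted to reclaim the doubled disk space.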
