ES 1.3 - Calculating+updating a field in millions of documents


(jdyck) #1

Hey all,

I have an ES index containing several million records. I'm adding a field to the document mapping, and afterwards I'll have to go back and calculate the value of this new field for every existing document. The value can be different for every document, so I don't think the 'update by query' idea applies here, since I'll have to calculate the value elsewhere.

Is there an efficient way of doing this?

Is the best way to do this to just pull the IDs of all documents in batches of 1000 or so, then for each batch calculate the value and update the documents using the bulk API?
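To make the question concrete, here is a rough sketch of the batching idea, not a tested solution. It only builds the newline-delimited body that the bulk API expects for partial updates; sending it to `_bulk` is left out. The index, type, and field names (`my_index`, `my_type`, `new_field`) and the helper names are made up for illustration.

```python
import json
from itertools import islice


def chunked(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_bulk_update_body(index, doc_type, field, values_by_id):
    """Build a newline-delimited _bulk body of partial-document updates.

    `values_by_id` is an iterable of (doc_id, value) pairs, where each value
    has already been calculated outside Elasticsearch.
    """
    lines = []
    for doc_id, value in values_by_id:
        # Action line: which document to update.
        lines.append(json.dumps(
            {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        ))
        # Payload line: only the new field is sent, as a partial document.
        lines.append(json.dumps({"doc": {field: value}}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical (id, calculated value) pairs; a batch size of 2 keeps the demo small.
pairs = [("1", 10), ("2", 20), ("3", 30)]
bodies = [
    build_bulk_update_body("my_index", "my_type", "new_field", batch)
    for batch in chunked(pairs, 2)
]
```

Each entry in `bodies` would be POSTed to the `_bulk` endpoint as one request; in practice the batch size would be closer to the 1000 mentioned above.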

Thanks!


(David Pilato) #2

If you are updating 100% of the docs, it's better to reindex.


(jdyck) #3

Would you suggest something like this to do it with minimal downtime?

  1. Create new index and PUT the revised mapping to the new index
  2. Use the scan and scroll method to read all of the existing documents and calculate the value for the new field
  3. Send the new documents to the new index in batches using the bulk API
  4. Set an alias for the new index to 'flip the switch' to use the new index
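
As a sketch of steps 2 to 4, under some assumptions: the functions below turn one page of scroll hits into a bulk `index` body for the new index, and build the request body for an atomic alias swap via `_aliases`. No HTTP calls are made. `compute_value`, the index names (`items_v1`, `items_v2`), the alias (`items`), and the field name (`title_length`) are placeholders, and the swap assumes the application already reads through an alias rather than the old index's real name.

```python
import json


def compute_value(source):
    """Hypothetical stand-in for the real per-document calculation."""
    return len(source.get("title", ""))


def hits_to_bulk_index_body(hits, new_index, field):
    """Convert one page of scroll hits into a _bulk body for the new index."""
    lines = []
    for hit in hits:
        source = dict(hit["_source"])           # copy so the hit is not mutated
        source[field] = compute_value(source)   # add the calculated field
        action = {"index": {
            "_index": new_index,
            "_type": hit["_type"],
            "_id": hit["_id"],                  # keep the original document ID
        }}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` in a single request."""
    return json.dumps({"actions": [
        {"remove": {"index": old_index, "alias": alias}},
        {"add": {"index": new_index, "alias": alias}},
    ]})


# One made-up scroll page with a single hit.
page = [{"_id": "1", "_type": "doc", "_source": {"title": "hello"}}]
body = hits_to_bulk_index_body(page, "items_v2", "title_length")
swap = alias_swap_body("items", "items_v1", "items_v2")
```

Because both alias actions go in one `_aliases` request, readers of the alias should not see a moment where it points at neither index.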

In this case, would the disk usage double if I have the data in both indexes?


(David Pilato) #4

Correct on all points.


(David Pilato) #5

Just missing the DELETE of the old index at the end :)

