Update by query on a large amount of data

I am using Elasticsearch 5.0.1 and NEST 5.0 from C#.

I need to implement an Update By Query script that will potentially update millions of documents.

For now I am using a script to update all the documents; this is done by calling the update by query API once.
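
Roughly, the call I am making looks like the sketch below (the index, field names, and the script itself are just placeholders here, not my real ones):

```csharp
using System;
using Nest;

// Placeholder document type; my real one has more fields.
public class MyDocument
{
    public string Status { get; set; }
}

class Program
{
    static void Main()
    {
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        // Single update-by-query call that rewrites every matching document
        // with an inline Painless script.
        var response = client.UpdateByQuery<MyDocument>(u => u
            .Index("my-index")
            .Query(q => q.Term(t => t.Field(f => f.Status).Value("pending")))
            .Script(s => s
                .Inline("ctx._source.status = params.newStatus")
                .Lang("painless")
                .Params(p => p.Add("newStatus", "processed"))));
    }
}
```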

My main question here is: what happens if thousands of documents fail to update, for example because of version conflicts? In that case I will get a response from Elasticsearch with thousands of ids in the failures collection, which can be hard to manage in code (and I also guess that this collection will eventually hit some limit).
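
For example, as far as I understand the NEST response type, handling the result of the call above would look something like this (I may have some of the property names wrong):

```csharp
// response is the IUpdateByQueryResponse from the previous snippet.
Console.WriteLine($"Total: {response.Total}, updated: {response.Updated}, " +
                  $"version conflicts: {response.VersionConflicts}");

// Potentially thousands of entries, one per document that could not be updated.
foreach (var failure in response.Failures)
{
    Console.WriteLine($"Failed id: {failure.Id} (index: {failure.Index}), " +
                      $"status: {failure.Status}, cause: {failure.Cause?.Reason}");
}
```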

So I am wondering if there is a better way of doing this in Elasticsearch.

I am glad you asked; I was planning on doing something similar and didn't realize this would be an issue.
I think one way to go about it would be to create a UUID for each update run and write it to all of the successfully updated docs. Then, if you get failed updates, you can exclude any docs that already carry the UUID on retries, and keep retrying until there are no more failures, roughly like the sketch below.
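
Something like the following rough sketch is what I have in mind; the index, field names, and query are made up and I have not actually run this:

```csharp
using System;
using Nest;

public class MyDocument
{
    public string Status { get; set; }
    // Assumed to be mapped as a keyword field so the term query matches exactly.
    public string UpdateBatchId { get; set; }
}

public static class RetryUpdate
{
    public static void Run(IElasticClient client)
    {
        // One id per logical update run; successfully updated docs carry it,
        // so retries only touch the documents that are still missing it.
        var batchId = Guid.NewGuid().ToString();
        long versionConflicts;

        do
        {
            var response = client.UpdateByQuery<MyDocument>(u => u
                .Index("my-index")
                .Query(q => q
                    .Bool(b => b
                        .Must(m => m.Term(t => t.Field(f => f.Status).Value("pending")))
                        // Skip documents already updated in this run.
                        .MustNot(mn => mn.Term(t => t.Field(f => f.UpdateBatchId).Value(batchId)))))
                .Script(s => s
                    .Inline("ctx._source.status = params.newStatus; ctx._source.updateBatchId = params.batchId")
                    .Lang("painless")
                    .Params(p => p
                        .Add("newStatus", "processed")
                        .Add("batchId", batchId)))
                // Keep going past version conflicts; they get retried on the next pass.
                .Conflicts(Conflicts.Proceed));

            versionConflicts = response.VersionConflicts;
        } while (versionConflicts > 0); // retry until nothing fails any more
    }
}
```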
