Hi, I need to run approximately 30,000 updateByQuery calls sequentially against my index, which has about 35 million fairly large documents. I'm currently setting
refresh: true, and the ETA for my script to complete all 30,000 queries is about 2 weeks from now, so I want to speed things up.
The Painless update script modifies a field on _source that does not need to be queried against, but each update step needs the latest copy of _source if it pulls in the same document as a previous query.
Basically my situation is like so:
- query index
- for each document returned, if _source.nonQueriedField doesn't already have tag x, add tag x to _source.nonQueriedField
- repeat for each query / tag combination
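
To make the steps above concrete, here is a rough sketch of how each request body might be built. It assumes nonQueriedField is a list of tags; the helper names (build_update_by_query_body, build_all_bodies) and the example query field "category" are hypothetical, and the Painless source is only one way the "add tag x if missing" check could be written, not my exact script. Nothing here talks to a cluster; it only builds the JSON bodies that would be sent to _update_by_query one after another.

```python
# Painless source for the per-document step (an assumed form of the script):
# add params.tag to nonQueriedField only if it is not already present,
# otherwise mark the document as a no-op so it is not rewritten.
ADD_TAG_SCRIPT = """
if (ctx._source.nonQueriedField == null) {
  ctx._source.nonQueriedField = [params.tag];
} else if (!ctx._source.nonQueriedField.contains(params.tag)) {
  ctx._source.nonQueriedField.add(params.tag);
} else {
  ctx.op = 'noop';
}
"""


def build_update_by_query_body(query, tag):
    """Build one request body for a single query / tag combination."""
    return {
        "query": query,
        "script": {
            "lang": "painless",
            "source": ADD_TAG_SCRIPT,
            "params": {"tag": tag},
        },
    }


def build_all_bodies(query_tag_pairs):
    """Build the bodies in order; they are meant to be sent sequentially."""
    return [build_update_by_query_body(q, t) for q, t in query_tag_pairs]


# Hypothetical example: two query / tag combinations.
pairs = [
    ({"term": {"category": "books"}}, "x"),
    ({"term": {"category": "music"}}, "y"),
]
bodies = build_all_bodies(pairs)
```

Passing the tag through params rather than concatenating it into the script source means the same script text is reused across all 30,000 requests, so it should only need to be compiled once.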
Since my queries don't need the latest copy of _source at query time, only at update time, do I still need to set
refresh: true? Or, put another way, does an updated document ever have multiple copies of _source floating around in the index between refreshes?