Performance of update_by_query operation

Let's say I have an index with 100 million docs and I want to update all of them using update_by_query.

I could create a single update by query (UBQ) task that affects all the docs in the index, or I could create 10 UBQ tasks one after another, each affecting 10 million docs.
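For illustration, here is a minimal sketch of the second option, splitting the work into 10 sequential UBQ requests filtered by a range query. The numeric `doc_id` field, the index split sizes, and the `migrated` script are assumptions for the sake of the example, not part of my actual mapping:

```python
# Sketch: split 100M docs into 10 sequential update_by_query batches
# using range filters on a hypothetical numeric "doc_id" field.
TOTAL_DOCS = 100_000_000
BATCH_SIZE = 10_000_000

def batch_bodies(total, batch):
    """Yield one update_by_query request body per batch."""
    for start in range(0, total, batch):
        yield {
            "query": {
                "range": {"doc_id": {"gte": start, "lt": start + batch}}
            },
            # the same update is applied in every batch
            "script": {"source": "ctx._source.migrated = true"},
        }

bodies = list(batch_bodies(TOTAL_DOCS, BATCH_SIZE))
# Each body would be POSTed to /my-index/_update_by_query in turn,
# waiting for each task to finish before starting the next.
```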

UBQ is internally a scan-and-scroll operation, so I am wondering whether the duration of the task has any impact on the performance of UBQ or the load on the cluster. The longer the UBQ operation runs, the longer the cluster has to keep a snapshot of the index open, and I assume this has some overhead. So I am wondering whether there would be a performance benefit in splitting the UBQ task by limiting the number of docs each task affects.


Rather than executing multiple UBQs, I'd use the slicing support that UBQ provides, which effectively creates multiple parallel slices equivalent to multiple UBQs... you just don't have to micromanage them.

Also see this note about choosing the number of slices.

Basically, your intuition is correct: choosing more slices will increase the speed at which docs are updated, reduce the time search contexts are held open, and generally improve performance. The auto parameter will choose one slice per shard (up to a limit), as that's the theoretical best setting.

Note that it will have a noticeably larger impact on the cluster in terms of indexing cost: more docs being updated per second will demand more IO, may affect your normal indexing, etc. Just something to keep in mind.

Hi @polyfractal

Thanks for that suggestion, but I want to be a bit on the safe side when it comes to the load on the cluster. So if I am not parallelizing the UBQ using slices, I wanted to know if there would be any performance difference between running one long UBQ vs 10 short UBQs sequentially.

Long-lived search contexts would mean a lot of stale segments lying around, which could consume extra system resources (memory and CPU overhead). So I am curious whether there is any significant performance difference between the two options I am exploring.
I think the argument applies even if I were using slices: it's 1 long UBQ (with/without slices) vs 10 small UBQs (with/without slices).

Ah sorry, didn't catch that you meant them to run sequentially.

I'm not sure, to be honest; we're into theoretical territory now :slight_smile: It seems reasonable that if you can tolerate the management of sequential UBQs, it'd probably be more "friendly" to the cluster. Similar to sliced scrolls, you'll have search contexts open for shorter periods of time.

I'm not sure how much it would impact resources though. A few lingering open search contexts are ok, as long as you don't have hundreds/thousands (people get into trouble by opening hundreds of scrolls without closing old ones).
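One way to keep an eye on this while the UBQs run is the node stats API (`GET /_nodes/stats/indices/search`), which reports `open_contexts` per node. A small sketch of summing that figure across nodes; the `sample` dict below is a hand-made illustrative fixture, not real cluster output:

```python
# Sketch: count open search contexts across nodes from a
# GET /_nodes/stats/indices/search response.

def total_open_contexts(stats: dict) -> int:
    """Sum indices.search.open_contexts over all nodes in a node-stats response."""
    return sum(
        node["indices"]["search"]["open_contexts"]
        for node in stats.get("nodes", {}).values()
    )

# Illustrative fixture only -- not real cluster output.
sample = {
    "nodes": {
        "node-1": {"indices": {"search": {"open_contexts": 3}}},
        "node-2": {"indices": {"search": {"open_contexts": 1}}},
    }
}
```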

The main overhead you'll see is increased index size if you're actively indexing, since the old segments won't merge out right away. There will be some minor memory overhead but fairly negligible.

So it'll probably consume fewer resources, but I can't quantify how much, or whether it's worth the effort. I don't think it could hurt though, so feel free to give it a shot if you want :slight_smile:

Thanks @polyfractal

After monitoring for some time, I have a feeling that long UBQs are slow, though I couldn't quantify it.

I recently tried to validate the above theory with test data (2 million docs) and saw a 10% improvement in overall time when I split it into batches of 100k each. Though these observations could just be wrong and coincidental.

I am going to try that on actual data anyway, as splitting has other beneficial side effects for us.

Thanks again for your opinion.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.