_mget vs _search for large amount of documents

Hi there! I'm looking for some guidance on the most suitable way to retrieve a large number of documents (>=1000) when you already know their IDs. I'd like it to be fast, yet efficient, and not put unnecessary strain on ES, since this lookup routine will run very often.

The mget documentation doesn't give any pointers as to how it's implemented or how it differs from using a _search with a terms query. Are there use cases where one is a better fit than the other? It's not clear.
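For reference, this is roughly what the two request shapes look like from the Python client. This is only a sketch: the index name, the IDs, and the 8.x-style keyword arguments are assumptions and may need adjusting for older client versions.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
doc_ids = ["doc-1", "doc-2", "doc-3"]  # in practice, >= 1000 IDs

# Option 1: _mget — asks for each document by ID directly.
mget_resp = es.mget(index="my-index", ids=doc_ids)
docs_from_mget = [d for d in mget_resp["docs"] if d.get("found")]

# Option 2: _search with a terms query on _id — goes through the
# normal search path, matching the IDs as query terms.
search_resp = es.search(
    index="my-index",
    query={"terms": {"_id": doc_ids}},
    size=len(doc_ids),
)
docs_from_search = [h["_source"] for h in search_resp["hits"]["hits"]]
```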

Purely on speed, both approaches seem fast and work fine with a large number of documents, but reading through this thread made me think that mget performs N parallel individual "get" operations, which seems unnecessary and inefficient.

For retrieving N (a large number of) documents, I'd imagine a batched approach would be a much better fit: instead of running N parallel gets, you run J batches of retrieval, with J being a much smaller number than N. With that in mind, I'm guessing a _search terms query takes this "batched" approach and is leaner, but what do I know! This is pure speculation, so I'd like to ask for some guidance on what each one actually does.
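To make the "J batches" idea concrete, here is a hypothetical client-side sketch: split the ID list into fixed-size chunks and issue one terms query per chunk, rather than one request per document. The batch size, index name, and helper function are all made up for illustration; this is not a claim about how mget or _search work internally.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def fetch_by_ids_batched(index, doc_ids, batch_size=500):
    """Retrieve documents in J = ceil(N / batch_size) search requests."""
    docs = []
    for start in range(0, len(doc_ids), batch_size):
        batch = doc_ids[start:start + batch_size]
        resp = es.search(
            index=index,
            query={"terms": {"_id": batch}},
            size=len(batch),
        )
        docs.extend(hit["_source"] for hit in resp["hits"]["hits"])
    return docs
```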

I would expect the mget approach to be as fast or faster, since terms queries with a large number of terms can be slow.

Great - thanks for the input. Do you know if the stress/load of the two is in the same ballpark?

I'm asking because speed is not always synonymous with efficiency. For example, one process can gain speed by opening 1000 concurrent threads, while another process is slower but uses a more conservative number of threads, say 5. In that scenario I'd consider the first process to be more "stressful" to a database. I'm not sure if that thinking applies here.

I have asked internally about this question because I honestly don't know.
My gut feeling is that if the number of items to retrieve is low (let's say 100), then mget might be better.
But if you compare a 10,000-document mget vs a 10,000-document search on a single shard, I'd expect search to be more efficient because of the Lucene optimizations behind the scenes...

But I'm still waiting for an answer to this very good question :wink:


Amazing, thank you! :pray: If this ends up being fruitful we can potentially add something to the docs

