_mget vs _search for large amount of documents

hattorihanzo · July 28, 2023, 9:33am

Hi there! I'm looking for some guidance on the best way to retrieve a large number of documents (>=1000) when you know their IDs. I'd like it to be fast, yet efficient, and not put unnecessary strain on ES, since this lookup routine will run very often.

The mget documentation doesn't give any pointers as to how it's implemented or how it differs from a _search with a terms query. Are there use cases where one fits better than the other? It's not clear.
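
For reference, the two request shapes being compared look roughly like the following. This is a minimal sketch in Python that only builds the JSON bodies and sends nothing; the helper names `mget_body` and `search_terms_body` are made up for illustration, and the bodies assume the index name is given in the request URL (e.g. `GET /<index>/_mget`).

```python
import json


def mget_body(ids):
    """Body for GET /<index>/_mget: a flat list of document IDs."""
    return {"ids": list(ids)}


def search_terms_body(ids):
    """Body for GET /<index>/_search using a terms query on the _id field."""
    ids = list(ids)
    return {
        # _search returns only 10 hits unless size is set explicitly.
        # size is also capped by index.max_result_window (10,000 by default).
        "size": len(ids),
        "query": {"terms": {"_id": ids}},
    }


doc_ids = ["a1", "b2", "c3"]  # hypothetical IDs
print(json.dumps(mget_body(doc_ids)))
print(json.dumps(search_terms_body(doc_ids)))
```

One practical difference visible even at this level: an mget response has one entry per requested ID (with `found` set to false for missing ones), while a search response only contains the hits that matched.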

Purely on speed - both approaches seem to be fast and working fine with a large amount of documents, but reading through this thread made me think that mget does N parallel individual "get" operations which seems unnecessary and inefficient.

For retrieving N (a large number of) documents I'd imagine a batched approach would be a much better fit (instead of running N parallel gets, you run J batches of retrieval, J being much smaller than N). With that in mind, I'm guessing a _search terms query takes this "batched" approach and is leaner, but what do I know! This is pure speculation, so I'd like to ask for some guidance on what each one actually does.
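
To make the "J batches instead of N gets" idea concrete on the client side, here is a rough sketch of splitting an ID list into fixed-size batches, each of which would become one request. The function name `chunk_ids` and the batch size are assumptions for illustration only, and this says nothing about how ES handles either API internally.

```python
from itertools import islice


def chunk_ids(ids, batch_size):
    """Yield successive lists of at most batch_size IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# N = 2500 IDs split into J = 3 batches of up to 1000 each.
all_ids = [str(n) for n in range(2500)]  # hypothetical IDs
batches = list(chunk_ids(all_ids, 1000))
print(len(batches), [len(b) for b in batches])
```

Each batch can then be sent as a single mget or a single terms query, which keeps the number of round trips at J regardless of which API is used.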

Christian_Dahlqvist · July 28, 2023, 10:13am

I would expect the mget approach to be as fast or faster, as terms queries with a large number of terms can be slow.

hattorihanzo · July 28, 2023, 12:12pm

Great - thanks for the input. Do you know if the stress/load of the two is in the same ballpark?

I'm asking because speed is not always synonymous with efficiency. For example, you can have one process that gains speed by opening 1000 concurrent threads and another that's slower but uses a more conservative number of threads, e.g. 5. In this scenario I'd consider the first process to be more "stressful" to a database. Not sure if that thinking applies here.

dadoonet · July 28, 2023, 12:25pm

I have asked internally about this question because I honestly don't know.
My gut feeling is that if the number of items to retrieve is low (let's say 100), then mget might be better.
But if you compare a 10000 mget vs a 10000 search on a single shard, I'd expect search to be more efficient because of the Lucene optimizations behind the scenes...

But I'm waiting for an answer about this very good question.

hattorihanzo · July 28, 2023, 12:40pm

Amazing, thank you! If this ends up being fruitful, we can potentially add something to the docs.

