Multiget (mget) API performance

jon_cabrera · November 22, 2021, 2:15pm

Hello!

I am working on a search engine and have started using the mget API to retrieve a set of documents by their IDs.

My question is about some performance issues I have observed with this type of request.

The average latency of requests that use mget is higher than that of requests using other Elasticsearch APIs. Typical times per operation look like this:

_mget API -> **30-40ms**
_search API -> 20-30ms
_count API -> 16-18ms
get API (for a specified document) -> 5-6ms

Is this behavior normal? Before I started using mget, I expected its performance to be at least similar to that of the search API. Sometimes the difference between the two endpoints is more than 10ms.

In many of these cases, we are retrieving roughly 1 to 20 documents.

On top of these averages, we also see some much higher times: a small percentage of requests (~1%) take between 200ms and 2000ms.
Is there a way to bring these request times down?
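To put numbers on a tail like this, it helps to look beyond the average. The sketch below assumes you already have a list of client-side latencies in milliseconds (for example, collected at the HTTP layer of your service); the function names and the 200 ms threshold are illustrative and not part of any Elasticsearch API. It uses the nearest-rank definition of a percentile.

```python
import math
from typing import Dict, Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples_ms: Sequence[float], slow_threshold_ms: float = 200.0) -> Dict[str, float]:
    """Summarize a batch of latencies (ms); the slow threshold is an illustrative choice."""
    slow = sum(1 for s in samples_ms if s >= slow_threshold_ms)
    return {
        "p50": percentile(samples_ms, 50),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
        "slow_fraction": slow / len(samples_ms),
    }
```

With 990 requests at 35 ms and 10 at 1200 ms (a 1% tail), summarize reports p99 = 35 ms and slow_fraction = 0.01, while percentile(samples, 99.5) returns 1200 ms. With a tail of about 1%, p99 alone can hide it, so it may be worth tracking a higher percentile or the maximum as well.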

To analyze mget response times, we had to monitor the HTTP layer of the microservice we built to expose a REST interface, which in turn calls the mget API through a Java client. Is there any tool for profiling this type of request directly? If I'm not mistaken, the profile API is only available for _search.
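Since the profile API does not cover mget, timing the call on the client side is a reasonable fallback. A minimal sketch, assuming `fetch` is a hypothetical function that performs one mget for a list of IDs (in this setup, the Java client call behind the REST interface). Recording the batch size next to each latency makes it possible to check whether the 1 to 20 document range actually affects timing.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class MgetTimer:
    """Records (number of ids, wall-clock latency in ms) for each call made through it."""

    def __init__(self) -> None:
        self.samples: List[Tuple[int, float]] = []

    def call(self, fetch: Callable[[Sequence[str]], T], ids: Sequence[str]) -> T:
        start = time.perf_counter()
        try:
            return fetch(ids)
        finally:
            # Recorded even when fetch raises, so timeouts and errors still show up.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples.append((len(ids), elapsed_ms))

    def mean_by_batch_size(self) -> Dict[int, float]:
        """Average latency (ms) grouped by how many ids were requested."""
        grouped: Dict[int, List[float]] = defaultdict(list)
        for size, ms in self.samples:
            grouped[size].append(ms)
        return {size: sum(v) / len(v) for size, v in sorted(grouped.items())}
```

In the service, timer.call(fetch, ids) would replace a direct fetch(ids); the collected samples can then be fed into whatever percentile reporting you already use.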

My index has 14-16M documents and I'm using version 7.10. We need to retrieve the _source of each document and we don't request stored fields. This is an example of the query I'm performing:

```
POST http://localhost:9200/_mget
Content-Type: application/json

{
  "docs": [
    {
      "_index": "<my_index>",
      "_type": null,
      "_id": "<some_id>",
      "routing": null,
      "stored_fields": null,
      "version": -3,
      "version_type": "internal",
      "_source": null
    }
  ]
}
```
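For comparison, the same lookup can be written more compactly over REST by putting the index in the path and passing only the IDs. The `null` fields and `"version": -3` in the body above look like defaults filled in by the client, so dropping them probably won't change latency much by itself; the compact form is mainly easier to read and replay by hand. A sketch that builds the path and body; `build_mget` is a hypothetical helper, and the optional `_source_includes` query parameter is shown only as one way to trim the returned _source if not every field is needed.

```python
import json
from typing import Iterable, Optional, Sequence, Tuple
from urllib.parse import quote, urlencode


def build_mget(
    index: str,
    ids: Iterable[str],
    source_includes: Optional[Sequence[str]] = None,
) -> Tuple[str, str]:
    """Return (path, JSON body) for POST /<index>/_mget using the short "ids" form."""
    id_list = [str(i) for i in ids]
    if not id_list:
        raise ValueError("mget needs at least one id")
    path = f"/{quote(index, safe='')}/_mget"
    if source_includes:
        # Comma-separated field list; urlencode escapes the commas as %2C.
        path += "?" + urlencode({"_source_includes": ",".join(source_includes)})
    return path, json.dumps({"ids": id_list})
```

For example, build_mget("my_index", ["1", "2"]) returns ("/my_index/_mget", '{"ids": ["1", "2"]}').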

Thanks!

warkolm · November 23, 2021, 12:22am

Yep, because you're basically doing N gets, each of which takes time to run, and then the results have to be collated. The overall response is only going to be as fast as the slowest one.
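A toy model of that explanation, under simplifying assumptions: the IDs are grouped by shard, shard-level requests run in parallel, items on the same shard are fetched one after another, and the coordinating node waits for every shard before replying. Shard names, per-item latencies and the overhead term are made-up inputs, not measurements. The second function shows why fan-out makes tail latency more likely: if each shard-level request independently has probability p of being slow, a request touching k shards hits at least one slow shard with probability 1 - (1 - p)^k.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mget_latency_ms(items: List[Tuple[str, float]], overhead_ms: float = 0.0) -> float:
    """items: (shard id, per-item get latency in ms).

    Shards are modeled as running in parallel, items within a shard sequentially,
    so the response time is the slowest shard's total plus a fixed overhead.
    """
    per_shard: Dict[str, float] = defaultdict(float)
    for shard, latency in items:
        per_shard[shard] += latency
    return max(per_shard.values(), default=0.0) + overhead_ms


def p_any_slow(p_slow: float, shards_touched: int) -> float:
    """Probability that at least one of k independent shard requests is slow."""
    return 1.0 - (1.0 - p_slow) ** shards_touched
```

For example, mget_latency_ms([("s0", 5.0), ("s1", 5.0), ("s1", 6.0), ("s2", 300.0)]) is 300.0: one slow shard sets the response time. And p_any_slow(0.01, 5) is about 0.049, so a 1% per-shard chance of a slow read becomes roughly a 5% chance per request when five shards are involved. The ~1% in the question is measured per request, so these figures are only illustrative.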

At a guess, those slow outliers could be cache misses. I'm not sure you can do much about them, to be honest.

Correct, only _search requests can be profiled. From the docs:

"Any _search request can be profiled by adding a top-level profile parameter."
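Since profiling only applies to _search, one workaround is to profile a _search that looks up the same IDs and treat it as a rough proxy; it will not show mget's own cost. A sketch assuming the 7.x profile response layout, profile.shards[].searches[].query[] with a time_in_nanos field on each query node (worth checking against the docs for your exact version). The helper names are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def profiled(body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a _search body with the top-level profile flag switched on."""
    return {**body, "profile": True}


def query_ms_per_shard(response: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Sum the top-level query node times of each shard in a profiled _search response.

    Child nodes are not added separately, since their time is included in the parent's.
    """
    per_shard = []
    for shard in response.get("profile", {}).get("shards", []):
        nanos = sum(
            node["time_in_nanos"]
            for search in shard.get("searches", [])
            for node in search.get("query", [])
        )
        per_shard.append((shard["id"], nanos / 1e6))
    return per_shard
```

This only covers the query phase as reported by the profiler, so it says little about fetching _source, which is the part mget spends most of its time on.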

jon_cabrera · November 23, 2021, 11:23am

Thanks for your answer. Is there an alternative to mget for fetching documents by ID that could perform better in this case? Is there any way to improve cache behavior?
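One alternative sometimes tried for ID lookups is a _search with an ids query. Whether it beats mget here would have to be benchmarked. A few differences are worth keeping in mind: _search is near real-time, so documents only become visible after a refresh, while mget is real-time by default; without an explicit size, _search returns at most 10 hits; and hits do not come back in the order the IDs were given. The sketch below builds such a body (track_total_hits set to false skips exact hit counting) and restores the requested order. Helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence


def ids_search_body(ids: Sequence[str], source_includes: Sequence[str] = ()) -> Dict[str, Any]:
    """Build a _search body that fetches the given ids (duplicates dropped, order kept)."""
    id_list = list(dict.fromkeys(str(i) for i in ids))
    body: Dict[str, Any] = {
        "query": {"ids": {"values": id_list}},
        "size": len(id_list),  # the default size of 10 would truncate larger batches
        "track_total_hits": False,  # an exact total is not needed when fetching by id
    }
    if source_includes:
        body["_source"] = list(source_includes)
    return body


def hits_in_request_order(response: Dict[str, Any], ids: Sequence[str]) -> List[Optional[Dict[str, Any]]]:
    """Reorder hits to match the requested ids; None marks an id that was not found."""
    by_id = {hit["_id"]: hit for hit in response["hits"]["hits"]}
    return [by_id.get(str(i)) for i in ids]
```

Running both variants against the same ID batches, with the same client-side timing, would be the fairest way to compare them.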

Christian_Dahlqvist · November 23, 2021, 11:53am

Are all the different types of queries returning the same number of raw documents?

jon_cabrera · November 23, 2021, 2:02pm

When searching with _search I normally use the from/size parameters to return 30 documents. With _mget I'm requesting roughly 1 to 20 documents.

system · December 21, 2021, 2:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Mget too slow for large amount of documents (Elasticsearch) · 9 replies · 1803 views · February 16, 2022
Bulk get (_mget) performance when using ES as key value store (Elasticsearch) · 3 replies · 1719 views · February 27, 2019
Java mget is much slower than http mget when in high concurrency environment (Elasticsearch) · 1 reply · 417 views · July 5, 2017
_mget vs _search for large amount of documents (Elasticsearch) · 5 replies · 1181 views · August 25, 2023
Long search time with mget (Elasticsearch) · 3 replies · 512 views · August 2, 2021