Multiget (mget) API performance

Hello!

I am working on a search engine and I have introduced the use of the mget API to obtain a series of documents from a series of ids.

My question is related to a series of issues that I have been able to observe in the performance of this type of search.

The average time of requests that use mget is higher than that of requests that use other Elasticsearch API operations. A sample of typical times per operation:

  • _mget API -> **30-40ms**
  • _search API -> 20-30ms
  • _count API -> 16-18ms
  • get API (for a specified document) -> 5-6ms

Is this behavior normal? Before introducing the use of mget, I thought that the performance would be at least similar to that of the search API. Sometimes the difference between both endpoints is greater than 10ms.

In many of these cases, information is being obtained from approximately 1 to 20 documents.

Beyond these averages, we also see some much higher times. A small percentage of requests (~1%) take between 200ms and 2000ms.
Is there a way to lower the times of these requests?

To analyze the mget response times, we had to monitor response times at the HTTP layer of our microservice, which exposes a REST interface and calls the mget API through a Java client. Is there a utility for profiling this type of request directly? If I'm not mistaken, the profile API is only available for _search.
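
One way to separate Elasticsearch time from microservice overhead is to time the call directly at the client and look at percentiles rather than the mean. A minimal sketch, assuming a hypothetical `call` function that performs one mget request; `time_calls` and `percentile` are illustrative names, not part of any Elasticsearch client:

```python
import math
import time
from typing import Callable, List


def time_calls(call: Callable[[], object], repeats: int) -> List[float]:
    """Run `call` `repeats` times and return each duration in milliseconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace the lambda with the real mget call.
    samples = time_calls(lambda: sum(range(1000)), repeats=200)
    print("p50 ms:", percentile(samples, 50))
    print("p99 ms:", percentile(samples, 99))
```

Comparing p50 against p99 makes a ~1% tail like the one described above visible, which an average hides.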

My index has 14-16M documents and I'm using version 7.10. We need to retrieve the source of each document and we don't request stored fields. This is an example of the query that I'm performing:

POST http://localhost:9200/_mget
Content-Type: application/json

{
  "docs": [
    {
      "_index": "<my_index>",
      "_type": null,
      "_id": "<some_id>",
      "routing": null,
      "stored_fields": null,
      "version": -3,
      "version_type": "internal",
      "_source": null
    }
  ]
}
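
The null and default-valued fields in the request above are optional. When every document comes from the same index, the mget API also accepts the index in the URL and a plain `ids` array in the body. A minimal sketch of building that shorter request; `build_mget_request` is a hypothetical helper and the index name is a placeholder. This only trims the payload and is not claimed to change latency.

```python
import json
from typing import Dict, List, Tuple


def build_mget_request(index: str, ids: List[str]) -> Tuple[str, Dict[str, List[str]]]:
    """Return (path, body) for POST /<index>/_mget using the `ids` shorthand."""
    if not ids:
        # Helper-level guard (an assumption of this sketch): refuse empty batches.
        raise ValueError("at least one id is required")
    return f"/{index}/_mget", {"ids": list(ids)}


if __name__ == "__main__":
    path, body = build_mget_request("my_index", ["id-1", "id-2"])
    print("POST", path)
    print(json.dumps(body, indent=2))
```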

Thanks!

Yep, because you're basically doing N gets, each of which takes time to run and then be collated. And the overall response is only going to be as fast as the slowest one.
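
To see why this matters more as the batch grows: if each individual get independently had probability p of being slow, a batch of N gets would contain at least one slow get with probability 1 - (1 - p)^N. A sketch under that independence assumption, which is a simplification; the value of p below is illustrative, not measured:

```python
def prob_any_slow(p_slow: float, batch_size: int) -> float:
    """Probability that at least one of `batch_size` independent gets is slow."""
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    if batch_size < 0:
        raise ValueError("batch_size must not be negative")
    return 1.0 - (1.0 - p_slow) ** batch_size


if __name__ == "__main__":
    # Illustrative per-get probability (an assumption, not taken from the thread).
    p = 0.001
    for n in (1, 5, 20):
        print(n, "docs ->", round(prob_any_slow(p, n), 4))
```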

At a guess, the slow outliers could be cache misses. I am not sure you can do much about those tbh.

Correct; the profile API only covers _search.

Any _search request can be profiled by adding a top-level profile parameter.
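
For illustration, a profiled _search body could be built as follows. This is a sketch: the `match_all` query is a placeholder and `build_profiled_search` is a hypothetical helper. It applies to _search only, not to _mget.

```python
import json
from typing import Any, Dict


def build_profiled_search(query: Dict[str, Any], size: int = 10) -> Dict[str, Any]:
    """Return a _search body with the top-level `profile` flag enabled."""
    return {"profile": True, "size": size, "query": query}


if __name__ == "__main__":
    # Placeholder query; send the printed body to POST /<index>/_search.
    body = build_profiled_search({"match_all": {}})
    print(json.dumps(body, indent=2))
```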

Thanks for your answer. Is there an alternative to mget to be able to obtain documents by id that can improve performance in this case? Any way to improve cache behavior?

Are all the different types of queries returning the same number of raw documents?

When querying with _search I normally use the from / size parameters to return 30 documents. With _mget I'm requesting approximately 1 to 20 documents.
