IDs Query Performance Problems

Hello,

I have documents with the same IDs (GUIDs) split across indices in my system. When a search runs across indices, I join the results in my client code. This is done using IDs queries: IDs | Elasticsearch Guide [8.13] | Elastic

The indices contain millions of documents at times, sometimes over 50 GB apiece, but there are 5 shards per index, multiple nodes, and 31 GB of RAM dedicated to each JVM.

The IDs queries sometimes contain hundreds of thousands of GUIDs.

Using the query profiler in Kibana, I found that build_scorer in TermInSetQuery accounted for 99-100% of the time in each search request. I don't need scoring, and I saw here: Sort search results | Elasticsearch Guide [8.13] | Elastic that scoring can be disabled by adding a sort. I then saw some people achieve this by sorting on _doc, but adding that changes neither the performance nor what the profiler reports: build_scorer is still the bottleneck.
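For reference, this is the shape of the request I tried; the index name and ID values here are placeholders for my actual ones:

GET my-index-*/_search
{
  "query": {
    "ids": {
      "values": ["guid-1", "guid-2"]
    }
  },
  "sort": ["_doc"]
}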

I saw that TermInSetQuery seems to be a Lucene class, so perhaps sorting at the request level in Elasticsearch doesn't affect what's going on inside Lucene?

I also tried Constant score query | Elasticsearch Guide [8.13] | Elastic, and filters in general, around the IDs query, with no luck. The profiler breakdown still shows the score being computed.
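This is roughly what I tried with constant_score (again with placeholder index name and IDs):

GET my-index-*/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "ids": {
          "values": ["guid-1", "guid-2"]
        }
      }
    }
  }
}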


Scoring shouldn't be too expensive to run, but you've already covered the options available at search time. For example:

PUT score_test/_doc/abc
{ 
  "field": "foo"
}
PUT score_test/_doc/def
{
  "field": "bar"
}
PUT score_test/_doc/ghi
{
  "field": "baz"
}

GET score_test/_search
{
  "query": {
    "ids": {
      "values": ["abc", "def" ]
    }
  }
}


GET score_test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "ids": {
            "values": [
              "abc",
              "def"
            ]
          }
        }
      ]
    }
  }
}

You could also consider using the get API for each individual doc ID. That retrieves the document directly rather than running a search at all.
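If you need several documents at once, the multi get API can batch those lookups into a single request rather than one GET per ID. Using the test index from above:

GET score_test/_mget
{
  "ids": ["abc", "def"]
}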

Since I could have 100k document IDs, I think executing 100k GET requests would be worse than the ~10 seconds I'm currently waiting. What I don't understand is why the profiler says it's computing the score when I'm telling it not to.

Are you requesting 100K IDs in the request? I suspect the score is a red herring and it's simply the number of IDs you're requesting at once that's causing your performance bottleneck.

I am indeed requesting 100k IDs in the IDs query. I would hope this wouldn't be a problem, as that's only about 3.6 MB of data. I use an IDs query for the documents in the other index so I can apply sorting and fetch source fields from the main index.
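Roughly, that second request looks like this; the index name, source fields, and sort field are placeholders for my actual mapping:

GET main-index/_search
{
  "_source": ["field_a", "field_b"],
  "query": {
    "bool": {
      "filter": [
        {
          "ids": {
            "values": ["guid-1", "guid-2"]
          }
        }
      ]
    }
  },
  "sort": [
    { "timestamp": "desc" }
  ]
}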

If it's a red herring, then that's very unfortunate, because I don't know how to fix the problem.

If it's a bug in how the Lucene method is being used, I was thinking about forking Elasticsearch, investigating, and opening a PR to fix it, or at least opening an issue on the Elasticsearch GitHub.