ORing a text field with a unique identifier keyword field leading to increased next_doc count and poor performance

TL;DR: I want to find documents that match a certain query (e.g. "my search") OR match the unique ID of a document.

When I search for the text field alone, the search is super fast. However, when I or the text field with the keyword field search becomes extremely slow. Looking at the profile data, it seems like the shard that contains the document ends up doing a lot more next_doc than the other shards. Why would that be the case?

Below are the mappings and queries I executed for reference.

Mapping:

{
  "mappings": {
    "properties": {
      "searchable_name_1":    { "type": "text" },  
      "searchable_name_2":  { "type": "text"  }, 
      "unique_identifier":   { "type": "keyword"  }     
    }
  }
}

The following query executes super fast:

{
  "profile": true,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "my search",
            "fields": [
              "searchable_name_1",
              "searchable_name_2"
            ]
          }
        }
      ]
    }
  }
}

However, when I or the unique identifier with the text query, it becomes super slow:

{
  "profile": true,
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "my search",
            "fields": [
              "searchable_name_1",
              "searchable_name_2"
            ]
          }
        },
        {
          "term": {
            "unique_identifier": {
              "value": "13590585",
              "boost": 10
            }
          }
        }
      ]
    }
  }
}


Screenshot 2023-03-22 at 11.12.02 AM

Hi
Sanity check - the mapped field is called “unique_identifier” and the query is “unique_id” in your example

Thanks for your reply. It was a copy-paste mistake while anonymizing the use case. Fixed in the original post.

1 Like

Thanks. How many docs have this unique ID of 13590585?

Just one. That's why I'm confused why ORing the two queries would result in such latency since the posting list for that ID has only one doc.

Hmm. Then I expect that the addition of a clause is causing a change to scoring that means an optimisation is not kicking in.

If you set “track_total_hits” to true in both queries then this might help compare speeds with the shortcut optimisation disabled.
(It won’t solve your problem but would help with the diagnostics here)

I enabled track_total_hits on both queries.

Fast one (just match query): 5323035
Slow one (match query OR keyword term): 5323036

I presume that’s the number of hits. How do the response times compare?

That's correct.

Shard response time for match only:

Shard response time for match OR keyword term:

By the way, thanks in advance for your help, Mark!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.