Multi_match with phrase_prefix is not working although a token has the prefix in it

ansun · July 24, 2024, 4:04pm

Hello,

I am seeing strange behavior with multi_match.
This is the simplified multi_match query I am running:

{
    "multi_match": {
        "query": "202201",
        "type": "phrase_prefix",
        "fields": [
          "file_name"
        ]
      }
}

If I query with 202201, I see 20220101_Legal Document_5678.pdf in the results. But if I query with 2022, the file no longer appears.

So I analyzed the file_name 20220101_Legal Document_5678.pdf, and here are the tokens:

{
  "tokens": [
    {
      "token": "20220101_legal",
      "start_offset": 0,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "document_5678",
      "start_offset": 15,
      "end_offset": 28,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "pdf",
      "start_offset": 29,
      "end_offset": 32,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}

As you can see, the first token starts with 2022, so it should match the query 2022, shouldn't it?
But the result of the query with 2022 doesn't include that file. Can someone explain why?

Thank you for your help in advance!
Best,
Ansun

Kathleen_DeRusso · July 24, 2024, 7:44pm

What version of Elasticsearch are you using?

In 8.14.1, I can't reproduce this, because the following snippet:

POST test/_doc
{
  "file_name": "20220101_Legal Document_5678.pdf"
}

POST test/_search
{
  "query": {
    "multi_match": {
      "query": "2022",
      "type": "phrase_prefix", 
      "fields": ["file_name"]
    }
  }
}

returns the expected result. Even 20 and 2 return the expected result.

It would be helpful to provide more information: the mappings, a more specific query, and the version to start with, plus whether the document is still returned, just further down the result set than you expect.

Musab_Dogan · July 24, 2024, 7:53pm

It's because of the max_expansions value. By default it's set to 50, so the prefix is expanded to at most 50 terms from the index. A longer prefix (e.g. "202201") matches fewer terms, so the term you expect falls within that limit and you see the result. You can increase the max_expansions value, but that can hurt performance.

See my screenshot: both queries return the document because I have only one doc in the test index.

See the notes from official documentation: Match phrase prefix query | Elasticsearch Guide [8.14] | Elastic

As a workaround, you can choose one of the following:

Increase the max_expansions value in your query.
Use the edge_ngram tokenizer - this also speeds up the query. (recommended)
Use a prefix query - there is no max_expansions limit for the prefix query, but it can be slower than phrase_prefix.
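
As a rough sketch, the three workarounds could look like the request bodies below. They are written as Python dicts only so they can be printed as JSON and pasted into a console. The names prefix_tokenizer and prefix_analyzer, the gram sizes, and the max_expansions value of 100 are illustrative assumptions, not settings taken from this thread.

```python
import json

# 1) Raise max_expansions on the phrase_prefix query (100 is an arbitrary example value).
phrase_prefix_query = {
    "query": {
        "multi_match": {
            "query": "2022",
            "type": "phrase_prefix",
            "fields": ["file_name"],
            "max_expansions": 100,
        }
    }
}

# 2) Index prefixes up front with an edge_ngram tokenizer.
#    "prefix_tokenizer", "prefix_analyzer" and the gram sizes are hypothetical choices.
edge_ngram_index = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "prefix_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "prefix_analyzer": {
                    "type": "custom",
                    "tokenizer": "prefix_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "file_name": {
                "type": "text",
                "analyzer": "prefix_analyzer",
                # Keep the search text whole so "2022" is not itself split into grams.
                "search_analyzer": "standard",
            }
        }
    },
}
# With prefixes indexed as terms, a plain match query is enough.
edge_ngram_query = {"query": {"match": {"file_name": "2022"}}}

# 3) Term-level prefix query: the value is not analyzed, so it should be
#    lowercase to line up with the lowercased tokens in the index.
prefix_query = {"query": {"prefix": {"file_name": {"value": "2022"}}}}

for name, body in [
    ("phrase_prefix search body", phrase_prefix_query),
    ("edge_ngram index body", edge_ngram_index),
    ("edge_ngram search body", edge_ngram_query),
    ("prefix search body", prefix_query),
]:
    print(f"# {name}")
    print(json.dumps(body, indent=2))
```

Note that the edge_ngram option changes the analyzer, so it requires creating a new index and reindexing the documents.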

ansun · July 25, 2024, 7:38am

Hi Kathleen,

Thank you for your reply!
I am using ES 7.13.2.

The mapping for file_name is:

{
  "file_name": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  }
}

ansun · July 25, 2024, 7:43am

Hi Musab,

Thank you for your answer! Increasing max_expansions to 100 resolved the problem. But do you have any idea why the query with 2022 doesn't return the file 20220101_Legal Document_5678.pdf? From what I understood, the filename already matches the prefix 2022, so the query shouldn't even need to be expanded.

Or does ES expand the query and compare it to the whole token (20220101_legal)?

ansun · July 25, 2024, 11:34am

Just answering my own question above. Match phrase prefix query | Elasticsearch Guide [8.14] | Elastic already explains how it works: ES expands the last term of the query into actual terms from the index's sorted term dictionary, up to max_expansions of them. So the file only matches if the token 20220101_legal is among the terms generated by that expansion.
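
To make that concrete, here is a toy model of the expansion step. It assumes, as the documentation describes, that the prefix is expanded to the first max_expansions matching terms in the sorted term dictionary. This is not Elasticsearch code: expand_prefix is a hypothetical helper, and every term except the three tokens from the analysis above is made up.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions=50):
    """Return the first `max_expansions` terms in sorted order that start with `prefix`."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expanded) >= max_expansions:
            break
        expanded.append(term)
    return expanded


target = "20220101_legal"

# Made-up term dictionary: 60 other terms that start with "2022" and sort
# before the target, plus the three tokens from the analyzed file name.
terms = sorted(
    [f"202200{i:04d}" for i in range(60)] + [target, "document_5678", "pdf"]
)

print(target in expand_prefix(terms, "2022"))       # False: cut off after 50 terms
print(target in expand_prefix(terms, "202201"))     # True: only one term has this prefix
print(target in expand_prefix(terms, "2022", 100))  # True: the limit is high enough
```

In this model the short prefix 2022 misses the file only because at least 50 other terms with the same prefix sort ahead of 20220101_legal, which would also explain why the query works on an index containing a single document.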
