How can I use match_phrase correctly?

Aukid · February 5, 2024, 11:21am

Hi, I've run into an issue. When I send a query like this:

POST /scene_dev/_search
{
  "query": {
    "match_phrase": {
      "labels": "test-qzw"
    }
  }
}

I get this response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 30.4391,
    "hits": [
      {
        "_index": "scene_dev",
        "_id": "5GG9eI0Bke7j5TSbBlKj",
        "_score": 30.4391,
        "_source": {
          "labels": "test qzw"
        }
      },
      {
        "_index": "scene_dev",
        "_id": "42G8eI0Bke7j5TSbR1LQ",
        "_score": 23.916782,
        "_source": {
          "labels": "qzw-test test qzw"
        }
      }
    ]
  }
}

But if I just change the query to this:

POST /scene_dev/_search
{
  "query": {
    "match_phrase": {
      "labels": "qzw-test"
    }
  }
}

I only get the second document from the response above. Why? I want to find documents that contain exactly "qzw-test". How should I build my query?

Kathleen_DeRusso · February 5, 2024, 2:44pm

I'm guessing you're using the default standard analyzer, which splits on the hyphen and treats your string as two separate tokens. You can check this with the analyze API:

GET /_analyze
{
  "analyzer": "standard",
  "text": "test-qzw"
}

returns:

{
  "tokens": [
    {
      "token": "test",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "qzw",
      "start_offset": 5,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
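The request above shows what the standard analyzer does in general. To confirm which analyzer is actually configured on your field, you can pass `field` instead of `analyzer`, and `_analyze` will use that field's mapping. This sketch assumes the field is `labels` in the `scene_dev` index, as in the question.

```json
GET /scene_dev/_analyze
{
  "field": "labels",
  "text": "qzw-test test qzw"
}
```

If `labels` really uses the standard analyzer, this should return four tokens, `qzw`, `test`, `test`, `qzw`, at positions 0 to 3. That explains both results: the phrase `test qzw` occurs at positions 2 and 3 and `qzw test` at positions 0 and 1 of the second document, while the first document (`test qzw`) only contains the first order.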

If you want text search to interpret hyphens differently, you'll need a different analyzer. Note that because these documents were already indexed with the standard analyzer, switching to a whitespace analyzer (or something similar) only at search time won't work the way you want; the documents would need to be reindexed with the new analyzer.
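To illustrate why changing only the search-time analyzer doesn't help, here is a sketch that overrides the analyzer on the query itself (`match_phrase` accepts an `analyzer` parameter). The whitespace analyzer keeps `test-qzw` as a single term, but the existing index only stored `test` and `qzw` as separate terms, so this query would be expected to return no hits.

```json
POST /scene_dev/_search
{
  "query": {
    "match_phrase": {
      "labels": {
        "query": "test-qzw",
        "analyzer": "whitespace"
      }
    }
  }
}
```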

Here's more information on how to specify an analyzer.
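Before changing any mapping, a candidate analyzer can be previewed without touching an index by passing a tokenizer and filters directly to `_analyze`. This sketch assumes a whitespace tokenizer plus a lowercase filter suits your labels; it is one possible choice, not the only one.

```json
GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": ["lowercase"],
  "text": "Qzw-Test test qzw"
}
```

This should produce `qzw-test`, `test`, and `qzw`: the hyphenated label survives as one token, and the lowercase filter keeps matching case-insensitive (the built-in `whitespace` analyzer on its own does not lowercase).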

Aukid · February 6, 2024, 2:05am

Thanks for the reply. So the tokenizer is still applied to my search text in match_phrase. What I need is for a search for "test" to hit "test-qzw", so I still need the tokenizer to split my strings on hyphens, but I also need a way to hit only strings containing "test-qzw" and not "test qzw" (thanks to my god damn app scenario). Is there any way to achieve this with Elasticsearch?

Kathleen_DeRusso · February 6, 2024, 1:23pm

I think you could do this with a custom tokenizer and analyzer, with some fiddling. If you don't want to go down that route, the only other option would be to use keyword fields, which doesn't sound like what you want.
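One way to get both behaviours, sketched under assumptions: keep `labels` on the standard analyzer so a search for `test` still matches `test-qzw`, and add a sub-field (a multi-field) analyzed with a whitespace tokenizer so hyphenated labels stay whole. The index name `scene_dev_v2`, the analyzer name `hyphen_preserving`, and the sub-field name `exact` are hypothetical. Existing documents must be reindexed, because analysis happens at index time.

```json
PUT /scene_dev_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "hyphen_preserving": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "labels": {
        "type": "text",
        "fields": {
          "exact": {
            "type": "text",
            "analyzer": "hyphen_preserving"
          }
        }
      }
    }
  }
}

POST /_reindex
{
  "source": { "index": "scene_dev" },
  "dest": { "index": "scene_dev_v2" }
}

POST /scene_dev_v2/_search
{
  "query": {
    "match_phrase": {
      "labels.exact": "test-qzw"
    }
  }
}
```

With this mapping, queries on `labels` behave as before, while the query on `labels.exact` only matches documents whose whitespace-separated tokens include `test-qzw`, so `test qzw` is excluded. One caveat of a plain whitespace tokenizer: punctuation attached to a label (for example `test-qzw,`) stays part of the token, so depending on your data you may need an extra filter or a `pattern` tokenizer instead.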

system · March 5, 2024, 1:23pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
