How can I use match_phrase correctly?

Hi, I've run into an issue. When I post a query like this:

POST /scene_dev/_search
{
  "query": {
    "match_phrase": {
      "labels": "test-qzw"
    }
  }
}

I get this response:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 30.4391,
    "hits": [
      {
        "_index": "scene_dev",
        "_id": "5GG9eI0Bke7j5TSbBlKj",
        "_score": 30.4391,
        "_source": {
          "labels": "test qzw"
        }
      },
      {
        "_index": "scene_dev",
        "_id": "42G8eI0Bke7j5TSbR1LQ",
        "_score": 23.916782,
        "_source": {
          "labels": "qzw-test test qzw"
        }
      }
    ]
  }
}

But if I change the query to this:

POST /scene_dev/_search
{
  "query": {
    "match_phrase": {
      "labels": "qzw-test"
    }
  }
}

I only get the second document above. Why? I want to find the documents that contain exactly "qzw-test". How should I build my query?

I'm guessing you're using the default standard analyzer, which splits on the hyphen and treats your string as two separate tokens. You can check this with the _analyze API:

GET /_analyze
{
  "analyzer": "standard",
  "text": "test-qzw"
}

returns:

{
  "tokens": [
    {
      "token": "test",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "qzw",
      "start_offset": 5,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
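That also explains why "qzw-test" only returns the second document: match_phrase runs your query text through the same analyzer, then requires the resulting tokens to appear at consecutive positions, in the same order, in the field. So "test-qzw" becomes the phrase "test qzw" (present in both documents), while "qzw-test" becomes "qzw test" (present only in "qzw-test test qzw"). Here's a rough Python sketch of that logic. It is not how Elasticsearch implements it (it uses position data in the inverted index), and `standard_like` is only an assumed approximation of the standard analyzer that happens to be good enough for these inputs.

```python
import re


def standard_like(text):
    # Rough stand-in for the standard analyzer on simple ASCII input:
    # lowercase, then split on anything that isn't a letter or digit.
    # The real standard tokenizer follows Unicode word-break rules.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def phrase_match(query, field_value):
    """True if the analyzed query tokens appear consecutively, in order."""
    q = standard_like(query)
    doc = standard_like(field_value)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


docs = {"doc1": "test qzw", "doc2": "qzw-test test qzw"}

for query in ("test-qzw", "qzw-test"):
    hits = [name for name, labels in docs.items() if phrase_match(query, labels)]
    print(query, standard_like(query), hits)
# test-qzw ['test', 'qzw'] ['doc1', 'doc2']
# qzw-test ['qzw', 'test'] ['doc2']
```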

If you want hyphens to be interpreted differently in a text search, you'll need to use a different analyzer. Note that because these documents have already been indexed with the standard analyzer, switching to a whitespace analyzer (or something similar) only at search time won't work the way you want.
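To illustrate why changing only the search-time analyzer doesn't help, here is a small Python sketch, again using an assumed approximation of the standard analyzer for the indexed side and a whitespace-plus-lowercase split for the query side. The index only ever stored "qzw" and "test", so a whitespace-analyzed query produces the term "qzw-test", which doesn't exist in the index at all.

```python
import re


def standard_like(text):
    # Approximation of the standard analyzer used at index time.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def whitespace_lower(text):
    # Approximation of a whitespace tokenizer plus lowercase filter,
    # applied only at search time.
    return text.lower().split()


# Terms that ended up in the index for the second document.
indexed_terms = set(standard_like("qzw-test test qzw"))
# Terms produced from the query by the search-time analyzer.
query_terms = whitespace_lower("qzw-test")

print(sorted(indexed_terms))                          # ['qzw', 'test']
print(query_terms)                                    # ['qzw-test']
print(all(t in indexed_terms for t in query_terms))   # False
```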

Here's more information on how to specify an analyzer.

Thanks for the reply. So the tokenizer is still applied to my search text in match_phrase. But what I need is for a search for "test" to hit "test-qzw", so I still need the tokenizer to split my string on the hyphen. At the same time, I also need a way to hit only strings containing "test-qzw" and not "test qzw" (thanks to my god damn app scenario). Is there any way to achieve this with Elasticsearch?

I think you could do this with a custom tokenizer and analyzer, with some fiddling. If you don't want to go down that route, the only other option would be to use keyword fields, which it doesn't sound like you want.
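One way the custom analyzer route might look (a sketch, not a tested recipe): keep "labels" on the standard analyzer so a search for "test" still finds "test-qzw", and add a multi-field sub-field analyzed by a custom analyzer that uses the built-in whitespace tokenizer plus the lowercase filter, so "test-qzw" stays a single token. The sub-field name `ws` and analyzer name `whitespace_lower` are hypothetical. The request bodies are built as Python dicts, and a local stand-in for the analyzer shows which documents a match_phrase on `labels.ws` would hit. You'd most likely need to create a new index with these settings and reindex into it. Also note that the whitespace tokenizer keeps punctuation, so "test-qzw," would be indexed as a different token.

```python
import json

# Hypothetical body for creating a new index (e.g. PUT /scene_dev_v2).
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "whitespace_lower": {  # hypothetical analyzer name
                    "type": "custom",
                    "tokenizer": "whitespace",  # split on whitespace only
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "labels": {
                "type": "text",  # standard analyzer: "test" still hits "test-qzw"
                "fields": {
                    # hypothetical sub-field that keeps hyphenated words intact
                    "ws": {"type": "text", "analyzer": "whitespace_lower"}
                },
            }
        }
    },
}

# Exact hyphenated phrase goes against the sub-field...
exact_query = {"query": {"match_phrase": {"labels.ws": "qzw-test"}}}
# ...while loose searches keep using the main field.
loose_query = {"query": {"match": {"labels": "test"}}}


def whitespace_lower(text):
    # Local stand-in for the "whitespace_lower" analyzer, for illustration only.
    return text.lower().split()


def phrase_match(query, value):
    q, doc = whitespace_lower(query), whitespace_lower(value)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


docs = {"doc1": "test qzw", "doc2": "qzw-test test qzw"}

print(json.dumps(index_body, indent=2))
print([d for d, v in docs.items() if phrase_match("qzw-test", v)])  # ['doc2']
print([d for d, v in docs.items() if phrase_match("test-qzw", v)])  # []
```

With this setup, "test-qzw" on `labels.ws` no longer matches "test qzw", because that document only has the separate tokens "test" and "qzw" in the sub-field.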
