You are using the edge_ngram token filter. Let's see how your analyzer treats your query string "ho". Assuming your index is called my_index:
GET my_index/_analyze
{
  "text": "ho",
  "analyzer": "autocomplete"
}
The response shows you that the output of your analyzer would be two tokens at position 0:
{
  "tokens": [
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "ho",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    }
  ]
}
What does Elasticsearch do with a query that has two tokens at the same position? It treats the query as an "OR", even if you use type "phrase". You can see this in the output of the validate API, which shows you the Lucene query that your query is rewritten into:
GET my_index/_validate/query?rewrite=true
{
  "query": {
    "match": {
      "name": {
        "query": "ho",
        "type": "phrase"
      }
    }
  }
}
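For reference, the response to that validate request should contain an explanation along these lines (the exact response shape varies by Elasticsearch version; fields unrelated to the explanation are omitted here):

{
  "valid": true,
  "explanations": [
    {
      "index": "my_index",
      "valid": true,
      "explanation": "name:\"(h ho)\""
    }
  ]
}

The (h ho) part is Lucene's notation for a phrase query with two alternative terms at the same position, i.e. an OR between h and ho.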
Because both your query and your document have an h at position 0, the document is going to be a hit.
Now, how to solve this? Instead of the edge_ngram token filter, you could use the edge_ngram tokenizer. This tokenizer increments the position of every token it outputs.
So, if you create your index like this instead:
PUT my_index
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "tokenizer": {
        "autocomplete_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "autocomplete_tokenizer",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "autocomplete"
        }
      }
    }
  }
}
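You can confirm the difference by running the same _analyze request against the new index. Since the edge_ngram tokenizer increments positions, the two tokens now come out at consecutive positions (response abbreviated to the relevant fields):

GET my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "ho"
}

{
  "tokens": [
    { "token": "h",  "position": 0, ... },
    { "token": "ho", "position": 1, ... }
  ]
}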
You will see that this query is no longer a hit:
GET my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "ho",
        "type": "phrase"
      }
    }
  }
}
But for example this one is:
GET my_index/_search
{
  "query": {
    "match": {
      "name": {
        "query": "he",
        "type": "phrase"
      }
    }
  }
}
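This is exactly the phrase behaviour you want. Assuming your document's name value is something like "hello" (I'm guessing at the actual value here), the tokenizer indexes it as h, he, hel, hell, hello at positions 0 through 4, which you can check with:

GET my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "hello"
}

The query "he" is analyzed into the phrase "h he" (positions 0 and 1), which lines up with the document's first two tokens, so it matches. The query "ho" becomes the phrase "h ho", and since the document has no "ho" token at position 1, it does not match.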