Problems with match_phrase

Hi
I have a query that looks like this:

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "fullAddress": "To-Bjerg 2"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 25
}

I want it to match both To-Bjerg 2 and To-Bjerg 22, but I can't get that to work.
What am I doing wrong?

My index mapping looks like this:

{
    "addresses": {
        "mappings": {
            "properties": {
                "addressId": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "city": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "country": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "door": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "etrs89CoordinatEast": {
                    "type": "double"
                },
                "etrs89CoordinatNorth": {
                    "type": "double"
                },
                "floor": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "fullAddress": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "houseNumber": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "postalCode": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "postalCodeAndCity": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "status": {
                    "type": "integer"
                },
                "street": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "streetAddress": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "streetAndPostal": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "updated": {
                    "type": "date"
                }
            }
        }
    }
}

I only get the exact match back.

A match_phrase query matches text that contains the same sequence of analyzed tokens, not the same characters. The behavior depends on the analyzer you use.

If "To-Bjerg 2" is analyzed to ["to", "bjerg", "2"] and
"To-Bjerg 22" is analyzed to ["to", "bjerg", "22"],
the query phrase "To-Bjerg 2", which is analyzed to ["to", "bjerg", "2"], matches only the former.

You can check the behavior of the analyzer via the Analyze API.
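
For example, here is a minimal sketch of such a check. It assumes the index is named addresses (as in the mapping above) and that the body is sent to GET /addresses/_analyze. Passing "field" asks Elasticsearch to use whatever analyzer that field is mapped with; since the mapping sets none, that is the default standard analyzer.

```json
{
  "field": "fullAddress",
  "text": "To-Bjerg 22"
}
```

With the standard analyzer this should come back as the tokens to, bjerg and 22, so the single token 2 from the query phrase never lines up with 22.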

Hmm... it sounds like I want the match_phrase_prefix query (Match phrase prefix query, Elasticsearch Guide [8.2]), right?

Almost. It returns To-Bjerg 2 and To-Bjerg 20, but not To-Bjerg 21.
Any idea why?

I am very unsure which analyzer to use.
It is a bit of a jungle to me.
Any pointers?
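
One possible explanation, offered as a guess rather than a diagnosis: match_phrase_prefix expands the last term ("2") into at most max_expansions terms from the index, 50 by default, and it takes them in term order, not by relevance. In a large address index there can easily be more than 50 terms starting with 2, so 20 makes the cut while 21 does not. A sketch that raises the limit (the value 1000 is arbitrary and only for illustration):

```json
{
  "query": {
    "match_phrase_prefix": {
      "fullAddress": {
        "query": "To-Bjerg 2",
        "max_expansions": 1000
      }
    }
  },
  "from": 0,
  "size": 25
}
```

Raising max_expansions makes the query more expensive, which is one reason an n-gram based analyzer is often suggested for this kind of prefix search instead.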

Thanks for the help.
I managed to use your pointers to create a custom ngram analyzer and use that with a match query.

"analysis": {
    "analyzer": {
        "my_analyzer": {
            "type": "custom",
            "tokenizer": "my_tokenizer"
        }
    },
    "tokenizer": {
        "my_tokenizer": {
            "type": "ngram",
            "min_gram": "2",
            "max_gram": "50"
        }
    }
}

{
    "query": {
        "match": {
            "fullAddress": {
                "query": "To-Bjerg 2"
            }
        }
    }
}
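
A note for anyone copying the ngram settings above: on recent Elasticsearch versions the ngram tokenizer rejects a gap between min_gram and max_gram that is larger than index.max_ngram_diff (default 1). A minimal sketch of a create-index body that allows the 2 to 50 range follows; the analyzer and tokenizer names are the ones from the post, while placing everything under "settings" in this exact way is my assumption about the surrounding request.

```json
{
  "settings": {
    "index": {
      "max_ngram_diff": 48,
      "analysis": {
        "analyzer": {
          "my_analyzer": {
            "type": "custom",
            "tokenizer": "my_tokenizer"
          }
        },
        "tokenizer": {
          "my_tokenizer": {
            "type": "ngram",
            "min_gram": 2,
            "max_gram": 50
          }
        }
      }
    }
  }
}
```

Such a wide n-gram range produces a lot of tokens per address, so the index can grow considerably.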

I ended up with these settings for my index:

"analysis": {
                    "analyzer": {
                        "my_edge_ngram_analyzer": {
                            "filter": [
                                "lowercase"
                            ],
                            "type": "custom",
                            "tokenizer": "edge_ngram_tokenizer"
                        },
                        "my_ngram_analyzer": {
                            "type": "custom",
                            "tokenizer": "ngram_tokenizer"
                        }
                    },
                    "tokenizer": {
                        "edge_ngram_tokenizer": {
                            "token_chars": [
                                "letter",
                                "digit"
                            ],
                            "custom_token_chars": "-",
                            "min_gram": "1",
                            "type": "edge_ngram",
                            "max_gram": "50"
                        },
                        "ngram_tokenizer": {
                            "type": "ngram",
                            "min_gram": "1",
                            "max_gram": "50"
                        }
                    }
                }

And I used my_edge_ngram_analyzer on the address field.
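
For readers wondering what that looks like in the mapping, here is a minimal sketch. It assumes "the address field" means fullAddress (the field the queries above search), which the post does not state explicitly; the keyword sub-field is copied from the original mapping.

```json
{
  "mappings": {
    "properties": {
      "fullAddress": {
        "type": "text",
        "analyzer": "my_edge_ngram_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
```

The analyzer of an existing field cannot be changed in place, so this normally means creating a new index and reindexing. Two things may be worth checking with the Analyze API. First, as far as I know, custom_token_chars only takes effect when token_chars also lists custom, so the hyphen may still be splitting tokens. Second, without a separate search_analyzer the query text is edge-n-grammed too, which tends to return more hits than expected.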
