I'm trying to implement name search using match, match_phrase and match_phrase_prefix. My data is messy and spread across several name fields (first_name, last_name, etc.).
For the sake of simplicity, I've been trying to narrow down the problem, and it seems that one-letter names (i.e. initials) are causing me headaches. For example, consider these three names, indexed into the name.value field:
"Jilles van Gurp", "Ali G", "G."
I've simplified everything to the point where I'm using the defaults for everything (analyzer, etc.) on Elasticsearch 1.7.3, with the following query:
    GET /tst/_search
    {
      "query": {
        "match": {
          "name.value": {
            "query": "jilles g"
          }
        }
      }
    }
A query like "jilles" works as expected. However, as soon as I add the letter g ("jilles g"), it all goes sideways and "G." ends up on top. It seems to treat this as a full token match on "G." and ranks that as the most important result, even though "Jilles van Gurp" also has a full token match on "jilles". From my point of view, "G." is actually the weakest result because it does not match most of the query. A phrase prefix match does not produce any results here.

What would be a good query and analyzer strategy that makes this work as expected, does not introduce too many false positives, and actually prefers "Jilles van Gurp" over "G." for the query "jilles g"?
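To make my suspicion about the ranking concrete, here is a rough Python sketch (not Elasticsearch code). It assumes the default analyzer roughly lowercases and splits on whitespace and punctuation, and that the default TF-IDF similarity scales each match by a field-length norm of 1/sqrt(number of terms). The helpers `analyze`, `length_norm` and `explain` are hypothetical names for illustration only, and the sketch ignores IDF, the coordination factor, norm precision loss, and per-shard statistics, so it is not a full score calculation.

```python
import math
import re


def analyze(text):
    # Rough stand-in for the default analyzer (assumption):
    # lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def length_norm(tokens):
    # Field-length normalization of classic TF-IDF scoring (assumption):
    # shorter fields get a larger multiplier.
    return 1.0 / math.sqrt(len(tokens))


QUERY_TERMS = analyze("jilles g")


def explain(name):
    # Return the indexed tokens, the query terms they match,
    # and the length norm applied to that field.
    tokens = analyze(name)
    matched = [t for t in QUERY_TERMS if t in tokens]
    return tokens, matched, round(length_norm(tokens), 3)


if __name__ == "__main__":
    for name in ["Jilles van Gurp", "Ali G", "G."]:
        print(name, explain(name))
```

Under these assumptions, each of the three names matches exactly one of the two query terms, but "G." is a one-term field and gets the largest length norm, which would be consistent with it floating to the top.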
For reference, I'm indexing contact data and this data is messy. When asked their name, people sometimes just fill in a single letter. So, I have to deal with it one way or another.