Single letter term matches cause noise in search results

jillesvangurp · March 29, 2016, 2:40pm

I'm trying to implement name search using match, match_phrase, and match_phrase_prefix queries. My data is messy and spread across several name fields (name, first_name, last_name, etc.).

To keep things simple, I've been narrowing down the problem, and it seems that single-letter names (i.e. initials) are what is causing me headaches. For example, consider these three names, indexed into the name.value field:
"Jilles van Gurp", "Ali G", "G."

I've simplified everything to the point where I'm using the defaults for everything (analyzer, etc.) on Elasticsearch 1.7.3, with the following query:

GET /tst/_search
{
  "query": {
    "match": {
      "name.value": {
        "query": "jilles g"
      }
    }
  }
}

A query for "jilles" works as expected. However, as soon as I add the letter g ("jilles g"), it all goes sideways and "G." ends up on top. It seems to treat this as a full token match on "G.", which makes it the most important result even though "Jilles van Gurp" also has a full token match on "jilles". From my point of view it is actually the weakest result, because it does not match most of the query. A phrase prefix match does not produce any results here.

What would be a good query and analyzer strategy that does not introduce too many false positives and actually prefers "Jilles van Gurp" over "G." for the query "jilles g"?

For reference, I'm indexing contact data and this data is messy. When asked their name, people sometimes just fill in a single letter. So, I have to deal with it one way or another.

xavierfacq · March 29, 2016, 3:10pm

You can try adding:

"operator" : "and"

Or you can also try a multi_match query with:

"minimum_should_match": "80%"
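
As a rough sketch of what those two suggestions look like as full request bodies, the Python below builds them as plain dictionaries. The helper names and the field list passed to multi_match are assumptions for illustration (the thread only mentions name.value, first_name and last_name); nothing here talks to a cluster. The last helper reflects my reading of the documented rule that a positive percentage is rounded down, which would mean "80%" of a two-term query such as "jilles g" still requires only one matching term.

```python
import json


def match_all_terms(field, text):
    """Body for a match query where every analyzed term must be present."""
    return {"query": {"match": {field: {"query": text, "operator": "and"}}}}


def multi_match_min(fields, text, minimum="80%"):
    """Body for a multi_match query with a minimum_should_match setting."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "minimum_should_match": minimum,
            }
        }
    }


def required_terms(term_count, percent):
    """Terms required by a positive percentage, assuming it is rounded down."""
    return (term_count * percent) // 100


if __name__ == "__main__":
    # Field names are illustrative; adjust to the actual mapping.
    print(json.dumps(match_all_terms("name.value", "jilles g"), indent=2))
    print(json.dumps(
        multi_match_min(["name.value", "first_name", "last_name"], "jilles g"),
        indent=2,
    ))
    print(required_terms(2, 80))
```

If that rounding assumption holds, "operator": "and" is the stricter of the two options for short queries, since it always requires both "jilles" and "g".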

jillesvangurp · March 29, 2016, 3:37pm

Thanks, but this does not work with prefix queries. In the end I figured out that I need to set "max_expansions": 10000 (default 10). The problem was that I have so many tokens starting with g that the query never expanded to gurp because of the low default max_expansions. So it sorted itself out as soon as I bumped this to 10k. Yes, this makes it slower, but at least it is more correct.
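
To illustrate the effect described above, here is a toy model in Python, not Elasticsearch's actual implementation. It assumes the last term of a phrase prefix query is expanded by walking the sorted term dictionary from the prefix and stopping after max_expansions terms. The term list is made up; expand_prefix and phrase_prefix_body are hypothetical helper names.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix, max_expansions):
    """Return up to max_expansions terms starting with prefix, in sorted order."""
    start = bisect_left(sorted_terms, prefix)
    expanded = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix) or len(expanded) >= max_expansions:
            break
        expanded.append(term)
    return expanded


def phrase_prefix_body(field, text, max_expansions):
    """Request body for a match_phrase_prefix query with max_expansions set."""
    return {
        "query": {
            "match_phrase_prefix": {
                field: {"query": text, "max_expansions": max_expansions}
            }
        }
    }


# Made-up term dictionary: many tokens starting with "g" sort before "gurp".
terms = sorted(["g", "gurp", "jilles", "van"] + [f"ga{i:02d}" for i in range(20)])

if __name__ == "__main__":
    print("gurp" in expand_prefix(terms, "g", 10))     # limit hit before "gurp"
    print("gurp" in expand_prefix(terms, "g", 10000))  # limit high enough
    print(phrase_prefix_body("name.value", "jilles g", 10000))
```

In this model a small limit is used up by the terms that sort before "gurp", so "jilles g" can never match "Jilles van Gurp" as a phrase prefix. Raising the limit fixes that at the cost of expanding more terms per query.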
