Partially matched shorter field takes precedence over completely matched larger field

aniketkk · November 21, 2017, 7:25am

Hi, I am querying two fields in my index: title and content.
The content field is large and holds the entire text of an article.
Both fields use the same ngram analyzer, with min_gram set to 3 and max_gram set to 20.
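For reference, here is a guess at what such a setup could look like, written as the body of a create-index request (PUT my-index). The names my-index, my_ngram_tokenizer and my_ngram_analyzer are hypothetical. The syntax assumes Elasticsearch 7 or later, which uses typeless mappings and requires index.max_ngram_diff to be at least max_gram - min_gram = 17. Versions from 2017 nested mappings under a type name.

```json
{
  "settings": {
    "index": {
      "max_ngram_diff": 17
    },
    "analysis": {
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "my_ngram_analyzer": {
          "type": "custom",
          "tokenizer": "my_ngram_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": { "type": "text", "analyzer": "my_ngram_analyzer" },
      "content": { "type": "text", "analyzer": "my_ngram_analyzer" }
    }
  }
}
```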

The problem is that when a search term fully matches the content field of one document and only partially matches the title field of others, the partial title matches rank higher.

For example, I searched for the term "hacking".
One document contains "hacking" in its content field, and several documents have "tracking" in their title (which is not "hacking" by any means).
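Assuming the ngram analyzer is also applied at search time (the default when no search_analyzer is set), the query "hacking" is itself split into 15 grams, one for every substring of length 3 to 7. Ten of them (ack, cki, kin, ing, acki, ckin, king, ackin, cking, acking) also occur in "tracking". Only the five that contain the h (hac, hack, hacki, hackin, hacking) do not. The _analyze API shows these tokens directly. Below is a request body for POST my-index/_analyze, where my-index and my_ngram_analyzer are hypothetical names:

```json
{
  "analyzer": "my_ngram_analyzer",
  "text": "hacking"
}
```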

But when I query for "hacking", all the "tracking" results come first, and the "hacking" document appears somewhere on the third or fourth page of results. That is not what I expect: the "hacking" document should be at the top. When I check the scores, a document with "tracking" in its title scores 0.6, while the one with "hacking" in its content scores only 0.08, even though I queried for "hacking".
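To see where scores like 0.6 and 0.08 come from, the same query can be run with "explain": true. This adds an _explanation to every hit, showing which terms (here, grams) matched in which field and how term frequency, IDF and field length contributed:

```json
{
  "explain": true,
  "query": {
    "multi_match": {
      "query": "hacking",
      "fields": ["title", "content"]
    }
  }
}
```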

My query looks like this:

{
  "query": {
    "multi_match" : {
      "query": "hacking", 
      "fields": ["title", "content"] 
    }
  }
}

I understand that a shorter field will always score higher, but that is not the behavior I want here.
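For context on the length effect: since version 5.0, Elasticsearch scores text fields with BM25 by default. In its textbook form, the contribution of a query term t to the score of a field D is shown below (Lucene's implementation differs from this in constant factors):

```latex
\operatorname{score}(t, D) = \operatorname{IDF}(t) \cdot
  \frac{f(t, D)\,(k_1 + 1)}
       {f(t, D) + k_1 \left(1 - b + b \, \frac{|D|}{\operatorname{avgdl}}\right)}
```

Here f(t, D) is how often the term occurs in the field, |D| is the field's length in tokens, and avgdl is that field's average length across the index. The defaults are k1 = 1.2 and b = 0.75. A field that is short relative to its average length keeps the denominator small, so each matching gram counts for more. IDF and avgdl are computed per field, so title and content scores are not directly comparable. The default best_fields type of multi_match then keeps the higher of the two field scores for each document.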

How can I fix this?

dadoonet · November 21, 2017, 7:51am

You can boost the title field, for example “title^5”.
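In the original query, the boost is a suffix on the field name: "title^5" multiplies the title field's score by 5. To favor content instead, move the suffix so it reads "content^5".

```json
{
  "query": {
    "multi_match": {
      "query": "hacking",
      "fields": ["title^5", "content"]
    }
  }
}
```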

aniketkk · November 21, 2017, 10:08am

@dadoonet You mean the content field, right? Content is the larger field.

Anyway, even after boosting the content field, the results are better than before but still not satisfactory.
Can't we make exact matches rank first, regardless of which field they appear in, with partial matches following after them?
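Here is a sketch of "exact matches first, partial matches after". It assumes each field has a subfield analyzed without ngrams, called title.exact and content.exact here. The subfield names and the boost of 10 are hypothetical. Both clauses sit in a bool should, so a document that matches only the ngram fields is still returned, while a whole-word match on "hacking" gets the extra boosted score:

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "hacking",
            "fields": ["title.exact", "content.exact"],
            "boost": 10
          }
        },
        {
          "multi_match": {
            "query": "hacking",
            "fields": ["title", "content"]
          }
        }
      ]
    }
  }
}
```

The boost raises exact matches strongly but does not impose a strict ordering, because bool adds the clause scores together. The right value has to be tuned against real data.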

Mikhail_Khludnev · November 21, 2017, 10:25am

You need a field without ngrams (see copy_to or multi-fields) and should include it in the search with a high boost. That should resolve the problem.
However, I suspect that using ngrams is a bad idea in the first place.
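Adding such a multi-field to the existing fields could look like the following request body for PUT my-index/_mapping (typeless syntax, Elasticsearch 7 and later). The index name, the analyzer name my_ngram_analyzer, and the subfield name exact are hypothetical. The existing type and analyzer are repeated as they are, since Elasticsearch rejects mapping updates that change them. Documents indexed before the change only get the new subfield after a reindex or an _update_by_query.

```json
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "my_ngram_analyzer",
      "fields": {
        "exact": { "type": "text", "analyzer": "standard" }
      }
    },
    "content": {
      "type": "text",
      "analyzer": "my_ngram_analyzer",
      "fields": {
        "exact": { "type": "text", "analyzer": "standard" }
      }
    }
  }
}
```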

aniketkk · November 21, 2017, 11:50am

Why do you think using ngrams is a bad idea? How else can we support partial matching?

Mikhail_Khludnev · November 21, 2017, 12:00pm

Why do you think using ngrams is a bad idea?

They are darn expensive. And they spoil results with exactly the overwhelming recall you are fighting with.

Otherwise how can we do partial searching?

You probably need to provide suggester, autocomplete, or spellchecking functionality instead. Always returning everything (as ngrams do) is usually not desirable.
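One form such functionality can take is the completion suggester. The sketch below is a search body (POST my-index/_search) that asks for completions of the prefix "hack". It assumes a field named title_suggest, mapped with "type": "completion". That field name and the suggestion name title-suggest are hypothetical:

```json
{
  "suggest": {
    "title-suggest": {
      "prefix": "hack",
      "completion": {
        "field": "title_suggest"
      }
    }
  }
}
```

The completion suggester matches on prefixes. It therefore does not produce mid-word matches such as "tracking" for "hacking", which inner ngrams do.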

Anyway, that's beside the point. Have you solved your relevancy problem?

aniketkk · November 22, 2017, 1:15pm

Thanks @Mikhail_Khludnev. I am reconsidering ngrams and their configuration.

I solved the problem using an answer to a question I asked on Stack Overflow.

system · December 20, 2017, 1:15pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
