How to increase relevancy for duplicate documents?

jmiller · June 27, 2018, 8:18pm

TF-IDF is causing unwanted behavior for my query. I have a set of documents in the following format:

{
    "name": "brown fox",
    "description": "a sentence-long description"
}

Many documents have the same name with different descriptions. If I search for something like brown fox, I want to receive all documents with the name brown fox, because it is an exact match (or close to one).

Instead, the top hit is:

{
    "name": "yellow dog",
    "description": "the dog is not brown"
}

This document is the only one with brown in its description, so the TF-IDF score for that match is high. Meanwhile, both brown and fox match the name of the other documents, but their TF-IDF score is low because of the duplicates.
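To illustrate the effect, here is a rough sketch of the inverse document frequency part of the score. It assumes a BM25-style IDF formula, computed separately per field, and the index size and document frequencies are made-up numbers for illustration, not values from my real index.

```python
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """BM25-style IDF: the fewer documents contain a term, the larger its weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical index: 1000 documents, 200 of them named "brown fox",
# and a single document whose description contains "brown".
DOC_COUNT = 1000
idf_name_brown = bm25_idf(doc_freq=200, doc_count=DOC_COUNT)
idf_description_brown = bm25_idf(doc_freq=1, doc_count=DOC_COUNT)

print(f"idf(name:brown)        = {idf_name_brown:.2f}")
print(f"idf(description:brown) = {idf_description_brown:.2f}")
```

With numbers like these, the single description match gets a much larger IDF than the name matches, which is consistent with the yellow dog document ranking first.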

Any tips on how to increase the score of the brown fox documents?

Mapping: both fields are of type text and use the standard analyzer.

Query:

  dis_max:
    tie_breaker: 0.7
    queries:
      - match:
          name: "{{search_string}}"
      - match:
          description: "{{search_string}}"
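For anyone who wants to reproduce this, the same query can be sketched as a request body built in Python. The `build_query` helper is a hypothetical name used only for illustration, and the outer `query` wrapper is assumed, since the snippet above shows only the `dis_max` part. As I understand it, dis_max scores a document by its best-matching clause plus tie_breaker times the scores of the other matching clauses.

```python
import json


def build_query(search_string: str, tie_breaker: float = 0.7) -> dict:
    """Build a dis_max request body that matches the search string against both fields."""
    return {
        "query": {
            "dis_max": {
                "tie_breaker": tie_breaker,
                "queries": [
                    {"match": {"name": search_string}},
                    {"match": {"description": search_string}},
                ],
            }
        }
    }


body = build_query("brown fox")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the index.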

I don't want to disable TF-IDF on name because it helps in other search cases. Is it possible to group name and description together for the TF-IDF calculation, so that a brown in description is not weighted higher than one in name? Or is it possible to stop the duplicate document names from increasing docFreq?

Thank you for any help! Let me know if I left anything out.

system · July 25, 2018, 8:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
