Scoring for words and fetch docs that get minimum score

Dekel · June 23, 2015, 10:37pm

I'm looking for a way to give each word its own score and then find docs that reach a minimum total score, based on the words they contain.

The idea is that I might have a word with a high score (wordA, score=5), and I want all docs that have wordA, but I also have words with a lower score (wordB, wordC, wordD, score=2 each). If I want docs that reach a minimum score of 4, I would like to get the docs that have wordA or any combination of 2 of (wordB, wordC, wordD).
The same goes for a minimum score of 6: the result should be docs containing wordA and one of (wordB, wordC, wordD), or docs containing all 3 of (wordB, wordC, wordD).

I tried boosting words combined with minimum_should_match, but that doesn't really do the trick. Any ideas on how I can do this?

polyfractal · June 24, 2015, 6:38pm

If you need explicit scoring (e.g., you don't want TF-IDF-derived scores), you can use function_score to set your own custom scoring based on lists of terms and their weightings. For example:


POST test/test
{
  "title": "wordA wordB wordC"
}

POST test/test
{
  "title": "wordD"
}

POST test/test
{
  "title": "wordB wordC"
}

POST test/test
{
  "title": "wordC"
}

GET /test/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": {
            "terms": {
              "title": ["worda"]
            }
          },
          "weight": 5
        },
        {
          "filter": {
            "terms": {
              "title": ["wordd"]
            }
          },
          "weight": 3
        },
        {
          "filter": {
            "terms": {
              "title": [ "wordb"]
            }
          },
          "weight": 1
        },
        {
          "filter": {
            "terms": {
              "title": [ "wordc"]
            }
          },
          "weight": 2
        }
      ],
      "score_mode": "sum"
    }
  },
  "min_score": 3
}

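Each function pairs a terms filter (the list of tokens for a given weight) with that weight. The terms are written in lowercase because a terms filter is not analyzed, while the indexed title tokens were lowercased by the default analyzer. With score_mode set to sum, the function score adds up the weights of every function whose filter matches. The query then sets min_score to 3, which excludes documents that haven't accumulated enough "weight": of the four documents above, only "wordC" (total 2) is dropped, while "wordA wordB wordC" (8), "wordD" (3), and "wordB wordC" (3) are returned.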
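To check how this plays out with the weights from the original question (wordA=5; wordB, wordC, wordD=2 each), here is a small offline sketch in Python that mimics the sum-of-weights plus min_score logic. It does not talk to Elasticsearch. The sample titles are made up for illustration, and the lowercase, whitespace-split tokenization is an assumption standing in for the analyzer.

```python
# One weight per term, i.e. one function per word in the query.
WEIGHTS = {"worda": 5, "wordb": 2, "wordc": 2, "wordd": 2}


def summed_weight(title):
    # Assumption: titles are analyzed into lowercase, whitespace-separated tokens.
    doc_tokens = set(title.lower().split())
    # Each word contributes its weight once if it is present in the document.
    return sum(weight for term, weight in WEIGHTS.items() if term in doc_tokens)


def hits(titles, min_score):
    # Mirror min_score: keep documents whose total reaches the threshold.
    return [title for title in titles if summed_weight(title) >= min_score]


# Hypothetical sample titles, not taken from the thread.
TITLES = ["wordA", "wordB wordC", "wordB", "wordA wordD", "wordB wordC wordD"]

print(hits(TITLES, 4))
print(hits(TITLES, 6))
```

Note that a function adds its weight once when its filter matches, no matter how many terms from its list are present. For wordB, wordC and wordD to accumulate 2 points each, they each need their own function, as in the query above, rather than one shared terms list.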

You could also use the Terms Lookup functionality to store those term lists in an index, instead of specifying them in the query itself.
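As a rough sketch of what that could look like, the Python below only builds the search body, with each terms filter pointing at a lookup document instead of an inline list. The index name `word_lists`, the document ids, and the `terms` path are all hypothetical, and older releases may also require a `type` key in the lookup, so check the docs for your version before relying on it.

```python
import json

# Hypothetical layout: one lookup document per weight in an index named
# "word_lists", each holding its term list under a "terms" field.
LOOKUPS = {"weight_5": 5, "weight_2": 2}


def lookup_function(doc_id, weight, field="title"):
    # A terms filter that fetches its term list from another document.
    lookup = {"index": "word_lists", "id": doc_id, "path": "terms"}
    return {"filter": {"terms": {field: lookup}}, "weight": weight}


def build_search_body(lookups, min_score):
    functions = [lookup_function(doc_id, w) for doc_id, w in lookups.items()]
    return {
        "query": {"function_score": {"functions": functions, "score_mode": "sum"}},
        "min_score": min_score,
    }


print(json.dumps(build_search_body(LOOKUPS, 4), indent=2))
```

Keep in mind that a looked-up list still contributes its weight only once per document, so words that should add up independently need one lookup document (and one function) each.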
