Scoring tagged documents with custom scores

Charles_Lariviere · May 5, 2020, 9:21pm

Hey folks,

We're interested in customizing Elasticsearch relevance scoring by using our own relevance values for each tag that was added to a document. We're still early in our exploration process but would appreciate some guidance on how best to achieve this with Elasticsearch (and whether it is possible at all).

Given this document:

{
  "id": "10252",
  "popularity_score": "1.28",
  "tags": [
    {
      "tag": "beach",
      "relevance": "0.7"
    },
    {
      "tag": "illustration",
      "relevance": "0.3"
    },
    {
      "tag": "california",
      "relevance": "0.9"
    }
  ]
}

and given the following query:

"query" : "beach illustration"

We would like the score for this document to be:

"score" = mean(relevance) * popularity_score
        = mean([0.7, 0.3]) * 1.28
        = 0.64

From our research, this sounds possible through nested queries and script score. However, the following case complicates things slightly:

For the same document, and the following query:

"query": "house illustration"

We would like the score to be:

"score" = mean([0, 0.3]) * 1.28
        = 0.192

(since the document does not have the tag "house")
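To pin down the target formula, here is a minimal Python sketch of the scoring we're after. It assumes the tags are held as a tag -> relevance mapping (a hypothetical in-memory stand-in for the document, not an Elasticsearch structure), so a tag the document lacks simply counts as 0 and only the tags actually present need to be stored.

```python
def desired_score(doc_tags, popularity_score, query_tags):
    """Mean of per-tag relevance over the query tags, times popularity.

    Tags absent from the document count as 0, so only the tags a
    document actually has need to be stored (a sparse representation).
    """
    if not query_tags:
        return 0.0
    relevances = [doc_tags.get(tag, 0.0) for tag in query_tags]
    return sum(relevances) / len(relevances) * popularity_score


# The example document, with tags stored as a tag -> relevance dict.
doc_tags = {"beach": 0.7, "illustration": 0.3, "california": 0.9}
popularity = 1.28

print(desired_score(doc_tags, popularity, ["beach", "illustration"]))  # ≈ 0.64
print(desired_score(doc_tags, popularity, ["house", "illustration"]))  # ≈ 0.192
```

Note that the mean divides by the number of query tags, not by the number of tags the document matched; that is what makes the missing "house" tag pull the score down.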

We wouldn't want to store all possible tags (>100k) on each document. The sparse_vector data type sounds like it would have been useful here, but it appears to have been deprecated in 7.6.

Any guidance here (or even just if this is indeed possible to do) would be greatly appreciated. Thanks!

mayya · May 7, 2020, 9:03pm

There is a new data type called rank_features that may help in your use case. Queries on it are very efficient. However, custom scoring functions on features are not supported: only three predefined functions (saturation, log and sigmoid) are available for rank_features.

If you model your tags as a rank_features field (one feature per tag), you can run the following query:

{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "should": [
            {
              "rank_feature": {
                "field": "tags.beach"
              }
            },
            {
              "rank_feature": {
                "field": "tags.illustration"
              }
            }
          ]
        }
      },
      "script": {
        "source": "_score * doc['popularity_score'].value"
      }
    }
  }
}
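The sketch below shows one way the original document could be reshaped for this approach, plus a query builder for an arbitrary list of tags. It only builds the JSON bodies and does not talk to a cluster. The helper names `to_rank_features_doc` and `build_query` are hypothetical, as is the choice to split the query string on whitespace. rank_features fields accept only positive values, so non-positive relevances are dropped rather than indexed.

```python
import json

# Index mapping: each tag becomes a feature inside one rank_features field.
MAPPING = {
    "mappings": {
        "properties": {
            "tags": {"type": "rank_features"},
            "popularity_score": {"type": "float"},
        }
    }
}


def to_rank_features_doc(doc):
    """Convert the original document shape into a rank_features-friendly one.

    String values are cast to float, and non-positive relevances are
    dropped because rank_features only accepts positive values.
    """
    tags = {}
    for entry in doc["tags"]:
        relevance = float(entry["relevance"])
        if relevance > 0:
            tags[entry["tag"]] = relevance
    return {
        "id": doc["id"],
        "popularity_score": float(doc["popularity_score"]),
        "tags": tags,
    }


def build_query(query_tags):
    """Build the script_score query for an arbitrary list of query tags."""
    return {
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "should": [
                            {"rank_feature": {"field": f"tags.{tag}"}}
                            for tag in query_tags
                        ]
                    }
                },
                "script": {"source": "_score * doc['popularity_score'].value"},
            }
        }
    }


original = {
    "id": "10252",
    "popularity_score": "1.28",
    "tags": [
        {"tag": "beach", "relevance": "0.7"},
        {"tag": "illustration", "relevance": "0.3"},
        {"tag": "california", "relevance": "0.9"},
    ],
}

print(json.dumps(to_rank_features_doc(original)))
print(json.dumps(build_query("beach illustration".split()), indent=2))
```

With this shape each document stores only the tags it actually has, so the >100k possible tags never need to appear on a single document.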

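Note that _score here is calculated with the saturation function by default, and the scores of the matching should clauses are summed, so it will not equal the mean of the raw relevance values.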

We are also currently discussing the possibility of adding a linear function for rank_features.
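To make the difference between saturation and a linear function concrete, here is a rough Python model of the bool/should query above. It assumes the saturation formula S / (S + pivot) and uses a hypothetical pivot of 0.5; when no pivot is given, Elasticsearch derives a default from the feature values in the index. It also ignores the reduced precision with which rank features are stored, so real scores would differ slightly. Tags missing from the document produce no matching clause and so contribute 0, which already gives the "house" behaviour from the question.

```python
def saturation(value, pivot):
    # rank_feature "saturation" function: S / (S + pivot)
    return value / (value + pivot)


def bool_should_score(doc_tags, query_tags, fn):
    """Sum of per-clause scores; tags missing from the doc do not match (0)."""
    return sum(fn(doc_tags[tag]) for tag in query_tags if tag in doc_tags)


doc_tags = {"beach": 0.7, "illustration": 0.3, "california": 0.9}
popularity = 1.28
query_tags = ["house", "illustration"]

# Saturation with a hypothetical pivot of 0.5.
sat = bool_should_score(doc_tags, query_tags, lambda s: saturation(s, 0.5))
print(sat * popularity)  # ≈ 0.48, not the requested mean-based score

# A linear function returns S unchanged; dividing the summed score by the
# number of query tags turns it into the mean from the original question.
lin = bool_should_score(doc_tags, query_tags, lambda s: s)
print(lin / len(query_tags) * popularity)  # ≈ 0.192
```

If a linear function were available, the division could presumably happen in the script itself, for example by passing the query tag count as a script parameter (a hypothetical `params.n`) and dividing `_score` by it before multiplying by popularity_score.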

system · June 4, 2020, 9:16pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
