Scoring tagged documents with custom scores

Hey folks,

We're interested in customizing Elasticsearch relevance scoring by using our own relevance value for each tag added to a document. We're still early in our exploration but would appreciate some guidance on how best to achieve this with Elasticsearch (and whether it is possible at all).

Given this document:

{
  "id": "10252",
  "popularity_score": "1.28",
  "tags": [
    {
      "tag": "beach",
      "relevance": "0.7"
    },
    {
      "tag": "illustration",
      "relevance": "0.3"
    },
    {
      "tag": "california",
      "relevance": "0.9"
    }
  ]
}

and given the following query:

"query" : "beach illustration"

We would like the score for this document to be:

"score" = mean(relevance) * popularity_score
        = mean([0.7, 0.3]) * 1.28
        = 0.64

From our research, this sounds possible through nested queries and script score. However, the following case complicates things slightly:

For the same document, and the following query:

"query": "house illustration"

We would like the score to be:

"score" = mean([0, 0.3]) * 1.28
        = 0.192

(since the document does not have the tag "house")
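To pin down the two cases above, here is a minimal Python sketch of the scoring rule as described: the mean of the relevance values over the query tags, counting 0 for any tag the document lacks, multiplied by popularity_score. The function name desired_score is hypothetical, and this only restates the arithmetic; it is not something Elasticsearch runs.

```python
from statistics import mean

def desired_score(doc, query_tags):
    """Mean relevance over the query tags (0 for missing tags) times popularity."""
    # Values are stored as strings in the example document, so convert them.
    relevance_by_tag = {t["tag"]: float(t["relevance"]) for t in doc["tags"]}
    relevances = [relevance_by_tag.get(tag, 0.0) for tag in query_tags]
    return mean(relevances) * float(doc["popularity_score"])

doc = {
    "id": "10252",
    "popularity_score": "1.28",
    "tags": [
        {"tag": "beach", "relevance": "0.7"},
        {"tag": "illustration", "relevance": "0.3"},
        {"tag": "california", "relevance": "0.9"},
    ],
}

print(desired_score(doc, ["beach", "illustration"]))  # approximately 0.64
print(desired_score(doc, ["house", "illustration"]))  # approximately 0.192
```

Note that the denominator is the number of query terms, not the number of matching tags, which is what makes the "house" case differ from a plain average over matches.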

We wouldn't want to store all possible tags (>100k) on each document, and it sounds like the sparse vector data type would have been useful here -- however it appears to have been deprecated in 7.6.

Any guidance here (or even just if this is indeed possible to do) would be greatly appreciated. Thanks!

There is a new datatype called rank_features that may help in your use case. Queries on it are very efficient. However, fully custom scoring on the features is not possible: only 3 predefined functions are available on rank_features (saturation, log, and sigmoid).
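As a rough sketch of what modeling the tags this way could look like, the Python below builds a mapping body and reshapes the example document so that each tag becomes a numeric feature under a "tags" object. The field names are taken from the question; the helper to_rank_features_doc is hypothetical, and the mapping assumes popularity_score is stored as a float so it can be read in a script. Keep in mind that rank_features values must be strictly positive, so a tag with relevance 0 should simply be left out.

```python
import json

# Assumed index mapping: "tags" holds one numeric feature per tag name.
MAPPING = {
    "mappings": {
        "properties": {
            "popularity_score": {"type": "float"},
            "tags": {"type": "rank_features"},
        }
    }
}

def to_rank_features_doc(doc):
    """Reshape the list of {tag, relevance} objects into a {tag: relevance} object."""
    return {
        "id": doc["id"],
        "popularity_score": float(doc["popularity_score"]),
        "tags": {t["tag"]: float(t["relevance"]) for t in doc["tags"]},
    }

original = {
    "id": "10252",
    "popularity_score": "1.28",
    "tags": [
        {"tag": "beach", "relevance": "0.7"},
        {"tag": "illustration", "relevance": "0.3"},
        {"tag": "california", "relevance": "0.9"},
    ],
}

# Print the request bodies that would be sent when creating the index
# and indexing the document.
print(json.dumps(MAPPING, indent=2))
print(json.dumps(to_rank_features_doc(original), indent=2))
```

Only the tags actually present on a document are stored, so the >100k possible tags do not need to appear on every document.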

If you model your tags as rank_features, you can run the following query:

{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "should": [
            {
              "rank_feature": {
                "field": "tags.beach"
              }
            },
            {
              "rank_feature": {
                "field": "tags.illustration"
              }
            }
          ]
        }
      },
      "script": {
        "source": "_score * doc['popularity_score'].value"
      }
    }
  }
}
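Since the tags come from the user's search terms, the query above would normally be generated. A small sketch of that, assuming the tags live in a rank_features field called "tags" and the popularity in "popularity_score" (build_query is a hypothetical helper, not part of any client library):

```python
def build_query(query_tags, tags_field="tags", popularity_field="popularity_score"):
    """Build a script_score query with one rank_feature clause per query tag."""
    should = [
        {"rank_feature": {"field": f"{tags_field}.{tag}"}}
        for tag in query_tags
    ]
    return {
        "query": {
            "script_score": {
                "query": {"bool": {"should": should}},
                "script": {
                    "source": f"_score * doc['{popularity_field}'].value"
                },
            }
        }
    }

body = build_query(["house", "illustration"])
print(body)
```

With only should clauses in the bool query, a document has to carry at least one of the requested tags to match. A missing tag such as "house" contributes nothing to _score, which is close in spirit to the "relevance 0" behaviour asked for, although the clause scores are summed rather than averaged.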

Note that _score here is calculated with the saturation function by default.
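To make the difference from the requested formula concrete: the saturation function is documented as S / (S + pivot), and a bool query adds up the scores of its matching should clauses. The sketch below mimics that in plain Python. The pivot of 0.5 is an arbitrary assumption for illustration (by default Elasticsearch derives it from the indexed feature values), and the function names are hypothetical.

```python
def saturation(value, pivot):
    """Documented saturation formula: S / (S + pivot)."""
    return value / (value + pivot)

def approx_rank_feature_score(tag_values, query_tags, popularity, pivot):
    """Sum of saturated tag values for matching tags, times popularity."""
    inner = sum(
        saturation(tag_values[tag], pivot)
        for tag in query_tags
        if tag in tag_values  # tags missing from the document add nothing
    )
    return inner * popularity

tags = {"beach": 0.7, "illustration": 0.3, "california": 0.9}

# Assumed pivot of 0.5; the real default depends on the index contents.
score = approx_rank_feature_score(tags, ["beach", "illustration"], 1.28, pivot=0.5)
print(score)  # roughly 1.23, compared with 0.64 from mean(relevance) * popularity
```

So the ranking should follow the tag relevances, but the absolute numbers will not equal mean(relevance) * popularity_score.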

We are also currently discussing the possibility of adding a linear function for rank_features.
