Hey folks,
We're interested in customizing ElasticSearch relevance scoring by using our own relevance values for each tag that was added to a document. We're still early in our exploration process but would appreciate some guidance on how to best achieve this with ElasticSearch (and whether it is possible).
Given this document:
{
"id": "10252",
"popularity_score": "1.28",
"tags": [
{
"tag": "beach",
"relevance": "0.7"
},
{
"tag": "illustration",
"relevance": "0.3"
},
{
"tag": "california",
"relevance": "0.9"
},
]
}
and given the following query:
"query" : "beach illustration"
We would like the score for this document to be:
"score" = mean(relevance) * popularity_score
= mean([0.7, 0.3]) * 1.28
= 0.64
From our research, this sounds possible through nested queries and script score. However, the following case complicates things slightly:
For the same document, and the following query:
"query": "house illustration"
We would like the score to be:
"score" = mean([0, 0.3]) * 1.28
(since the document does not have the tag "house")
We wouldn't want to store all possible tags (>100k) on each document, and it sounds like the sparse vector data type would have been useful here -- however it appears to have been deprecated in 7.6.
Any guidance here (or even just if this is indeed possible to do) would be greatly appreciated. Thanks!