Scoring In Search Result


(Hanish Bansal) #1

We are building search functionality on our platform using Elasticsearch. Our current use case does not need to handle spelling mistakes, so we are not using fuzzy search. We are using a match query to return the most relevant results.

With a match query, the score is calculated from relevance (term frequency and so on). We also want to include a recency factor in the results.

What are we looking for?

  • Relevant results, but with the most recent ones on top. In other words, results that are both recent and highly relevant should rank first.

To implement this, we are trying to use "Function Score" with the Gauss function, but we are not getting the desired result. After the first few documents, the score of all the other docs becomes zero, so the results are not sorted properly.

"gauss": {
  "pub_date": {
    "origin": "now",
    "scale": "2d",
    "decay": 0.3
  }
}

What values of scale and decay would give the desired result (recent, highly relevant results on top)?
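
For what it's worth, zeros like these are expected with the settings above. As the Elasticsearch documentation describes it, the gauss multiplier is exp(-d^2 / (2 * sigma^2)) with sigma^2 = -scale^2 / (2 * ln(decay)), which works out to decay^((d / scale)^2). Below is a minimal Python sketch of that formula. The function name `gauss_decay` and the sample ages are made up for illustration, and this is not Elasticsearch code:

```python
import math

def gauss_decay(age_days, scale_days=2.0, decay=0.3, offset_days=0.0):
    """Gauss decay multiplier, following the formula in the Elasticsearch docs."""
    # sigma^2 is chosen so that the multiplier equals `decay` at distance `scale`.
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    # Distances inside the offset are treated as zero (no decay).
    distance = max(0.0, abs(age_days) - offset_days)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))

# Illustrative document ages in days, using the scale/decay from the post.
for age in (0, 2, 4, 10, 30):
    print(f"{age:>2} days old: multiplier {gauss_decay(age):.3g}")
```

With scale 2d and decay 0.3 the multiplier is 0.3 at two days and 0.3^4 = 0.0081 at four days, and it keeps collapsing from there, so when the multiplier is multiplied into the query score, older documents end up with scores that are effectively zero. A wider scale (say a hypothetical 30 days with decay 0.5) falls off far more slowly; whether that suits the use case depends on the data.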


(Ivan Brusic) #2

Can you post your query, or at least your function score? Perhaps your
boost_mode or score_mode is set incorrectly.

I found that the gaussian decay has a huge performance impact. I simplified
things by boosting the values based on tiers and not based on a gaussian
scale. For instance, in the following example, documents published in the
past two weeks have their scores doubled. Anything in the last week matches
both filters and gets doubled again (essentially anything in the last week is 4x).

Not only is it much faster, but I find it easier to think about boosting
within a range. If document A is slightly more relevant than document B,
but is a bit older, the gaussian decay might flip the order. With tiers, if
they were both in the same general time frame (last day), order is
preserved.

"function_score": {
  "boost_mode": "multiply",
  "functions": [
    {
      "filter": {
        "range": {
          "pub_date": {
            "gte": "now-1w/d",
            "lte": "now/d"
          }
        }
      },
      "weight": 2
    },
    {
      "filter": {
        "range": {
          "pub_date": {
            "gte": "now-2w/d",
            "lte": "now/d"
          }
        }
      },
      "weight": 2
    }
  ],
  "query": {
    ...
  },
  "score_mode": "multiply"
}
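
To make the tier arithmetic concrete, here is a rough Python sketch of the multiplier those two filters produce when combined with score_mode "multiply". The helper name `tier_multiplier` is hypothetical, ages are plain day counts, and the `/d` date rounding in the real query is ignored:

```python
def tier_multiplier(age_days):
    """Recency multiplier implied by the two weight-2 filters, multiplied together."""
    multiplier = 1.0  # a document matching no filter keeps its original score
    # Each (max_age_days, weight) pair mirrors one entry in the "functions" array.
    for max_age_days, weight in ((7, 2.0), (14, 2.0)):
        if 0 <= age_days <= max_age_days:
            multiplier *= weight
    return multiplier

# Two documents in the same tier: A is slightly more relevant but a bit older.
score_a = 1.2 * tier_multiplier(1)    # relevance 1.2, one day old
score_b = 1.0 * tier_multiplier(0.5)  # relevance 1.0, half a day old
print(score_a > score_b)  # relevance order is preserved within the tier
```

Both documents get the same 4x boost, so the more relevant one stays ahead. A continuous decay would give the newer document a slightly larger multiplier and could, in principle, flip the order.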

If you still want to use the gaussian approach, post your query.

Cheers,

Ivan

