How to prioritize more recent content?


(Anca Mosoiu) #1

Hello there,

I'm using Elasticsearch to search through a WordPress database. I'd like to create a query that ranks results by score, where the score is boosted both when a document has a more recent date and when the search text appears in the title or description.

Here's a query that searches through several fields, and returns results ordered by score:

GET server.local/_search
{
  "size": 24,
  "from": 0,
  "query": {
    "multi_match": {
      "query": "bagel",
      "fields": ["post_title^3", "post_excerpt^2", "post_content", "post_author.display_name", "terms.category.name", "terms.resource-category.name", "terms.post_tag.name"],
      "type": "cross_fields",
      "operator": "or"
    }
  },
  "_source": ["post_id", "post_title", "post_author.display_name", "post_date.date"]
}

I tried sorting by date by adding this clause:

"sort":[{"post_date.date":"desc"}]

But this removed the scores from the results altogether.

I noticed that I can sort by score and then by date, but the scores come back with far too much precision for this to be useful. For example, one document scores 15.125123 and the next scores 15.031, even though the second document is more recent.
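For reference, here are the two sort variants described above, sketched as request bodies built in Python (standard library only). The `track_scores` flag is not in the original post. The Elasticsearch sort documentation describes it as the way to keep computing scores when sorting on a field, which would account for the missing scores.

```python
import json

# The multi_match query from the post, as a Python dict.
MATCH = {
    "multi_match": {
        "query": "bagel",
        "fields": ["post_title^3", "post_excerpt^2", "post_content",
                   "post_author.display_name", "terms.category.name",
                   "terms.resource-category.name", "terms.post_tag.name"],
        "type": "cross_fields",
        "operator": "or",
    }
}

# Variant 1: sort by date only. Scores are skipped when sorting on a field,
# so track_scores asks Elasticsearch to compute and return them anyway.
by_date = {
    "query": MATCH,
    "sort": [{"post_date.date": "desc"}],
    "track_scores": True,
}

# Variant 2: sort by score first; the date only breaks exact score ties.
by_score_then_date = {
    "query": MATCH,
    "sort": ["_score", {"post_date.date": "desc"}],
}

print(json.dumps(by_score_then_date["sort"]))
```

The second variant consults the date only when two scores are exactly equal, which is why it rarely changes the order with scores like 15.125123 and 15.031.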

Is there a way to either (a) round down the score so that I can then sort all the 15's by date or (b) add a clause that subtracts from the score when a document is older than a certain date?

Thank you in advance for any pointers.


(Ivan Brusic) #2

I believe truncating the scores will lead you down some inconsistent paths.
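One way to see the inconsistency is a small client-side sketch of option (a): floor each score, then sort by date within each whole-number bucket. The hits, scores, and dates below are made up for illustration, and `rank_by_truncated_score` is a hypothetical helper.

```python
import math
from datetime import date

# Hypothetical hits: (id, score, post_date).
hits = [
    ("A", 16.01, date(2015, 3, 1)),
    ("B", 15.99, date(2020, 6, 1)),
    ("C", 15.02, date(2019, 1, 1)),
]

def rank_by_truncated_score(hits):
    """Option (a): floor each score, then order by bucket desc, date desc."""
    ordered = sorted(hits, key=lambda h: (math.floor(h[1]), h[2]), reverse=True)
    return [h[0] for h in ordered]

# The same hits, with B's score nudged up by 0.01.
nudged = [
    ("A", 16.01, date(2015, 3, 1)),
    ("B", 16.00, date(2020, 6, 1)),
    ("C", 15.02, date(2019, 1, 1)),
]

print(rank_by_truncated_score(hits))
print(rank_by_truncated_score(nudged))
```

B and A differ by only 0.02 yet land in different buckets, while B and C differ by almost a full point and are treated as equals. A 0.01 change in B's score is enough to move it above the much older A.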

You can wrap your query in a function score query and do some boosting based on timestamps: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
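As a rough illustration of timestamp boosting with a decay function, the sketch below reproduces the Gaussian decay formula from the `function_score` documentation, where sigma squared is -scale^2 / (2 ln(decay)), so that a document at `offset + scale` from the origin gets exactly `decay`. The `origin`, `scale`, and `offset` values and the helper name `gauss_decay` are hypothetical choices, not something from this thread.

```python
import math

def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay factor following the function_score docs formula.

    distance, scale and offset must share a unit (days here). Documents within
    `offset` of the origin get 1.0; one at offset + scale gets `decay`.
    """
    # sigma^2 is chosen so the curve passes through `decay` at offset + scale.
    sigma_sq = -scale ** 2 / (2 * math.log(decay))
    excess = max(0.0, abs(distance) - offset)
    return math.exp(-excess ** 2 / (2 * sigma_sq))

# The matching function clause; origin/scale/offset values are hypothetical.
GAUSS_FUNCTION = {
    "gauss": {
        "post_date.date": {"origin": "now", "scale": "30d", "offset": "7d", "decay": 0.5}
    }
}

for age_days in (0, 7, 37, 67):
    print(age_days, round(gauss_decay(age_days, scale=30, offset=7), 3))
```

With `boost_mode` set to `multiply`, each document's text score would be scaled by this factor, so the boost fades smoothly with age instead of stepping down.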

The blog has an example that works in your context: https://www.elastic.co/blog/found-function-scoring

I found that the decay functions have a big performance impact. I prefer to use pre-determined ranges for boost levels, which also makes the mental model easier. For example: if a document is from the past week, boost by 4; for the two weeks before that, boost by 2. The ranges depend on your use case.
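A minimal sketch of that tiered approach, assuming the `post_date.date` field from the question and the example tiers above (past week x4, the two weeks before that x2). The helper names `recency_functions` and `recency_weight` are hypothetical. The first builds non-overlapping `range` filters with `weight`s for the `functions` array of a `function_score` query, using Elasticsearch date math (`w` is weeks); the second mirrors the same tiers locally to check which boost a document of a given age would get.

```python
# Each tier is (newer bound, older bound, weight), in Elasticsearch date math.
TIERS = [("now", "now-1w", 4), ("now-1w", "now-3w", 2)]

def recency_functions(field, tiers):
    """Build function_score 'functions' entries, one range filter per tier."""
    functions = []
    for i, (newer, older, weight) in enumerate(tiers):
        # The first tier includes "now" itself; later tiers exclude their newer
        # bound so adjacent ranges do not overlap.
        upper = {"lte": newer} if i == 0 else {"lt": newer}
        functions.append({"filter": {"range": {field: {"gte": older, **upper}}},
                          "weight": weight})
    return functions

def recency_weight(age_days):
    """Local mirror of the same tiers."""
    if age_days <= 7:
        return 4
    if age_days <= 21:
        return 2
    return 1  # matches no filter, so the function factor stays at 1

functions = recency_functions("post_date.date", TIERS)
print(functions)
```

Because the ranges do not overlap, at most one weight applies to any document, which keeps the effect of each tier easy to predict.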


(Anca Mosoiu) #3

Thank you Ivan! I didn't know what to search for, so that helped a lot. I wound up adding scoring functions that multiply the score by 5 if the post is from the past year, and by 2 if it is between one and four years old:

GET server.local/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "bagel",
          "fields": ["post_title^3", "post_excerpt^2", "post_content", "post_author.display_name", "terms.category.name", "terms.resource-category.name", "terms.post_tag.name"],
          "type": "cross_fields",
          "operator": "or"
        }
      },
      "functions": [
        {
          "filter": { "range": { "post_date.date": { "gte": "now-1y", "lte": "now" } } },
          "weight": 5
        },
        {
          "filter": { "range": { "post_date.date": { "gte": "now-4y", "lt": "now-1y" } } },
          "weight": 2
        }
      ],
      "boost_mode": "multiply"
    }
  }
}

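To see what this query does to the ranking, here is a small sketch of the arithmetic: the text score is multiplied (`boost_mode: multiply`) by the weight of whichever range matches, and a document older than four years matches neither filter and keeps its text score. The two scores come from the question; the ages and the helper name `final_score` are made up for illustration.

```python
def final_score(query_score, age_years):
    """Text score times the weight of the matching date range."""
    if age_years <= 1:        # "gte": "now-1y", "lte": "now"
        weight = 5
    elif age_years <= 4:      # "gte": "now-4y", "lt": "now-1y"
        weight = 2
    else:                     # no filter matches; factor stays 1
        weight = 1
    return query_score * weight  # boost_mode: multiply

# Hypothetical hits: (id, text score, age in years).
hits = [("older", 15.125123, 5.0), ("recent", 15.031, 0.5)]
ranked = sorted(hits, key=lambda h: final_score(h[1], h[2]), reverse=True)
print([h[0] for h in ranked])
```

With these weights, a post from the past year outranks a post older than four years unless the older post's text score is more than five times higher.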