Apply Boosting Based on a Field Value


(Maryam Abdullah) #1

Hey,

While searching, is it possible to boost the score by a number calculated from the field value, e.g. the field value divided by a particular number? Meaning that the boost value is dependent on a field value of the document.


(Byron Voorbach) #2

Take a look at the function score query, in particular field value factor. It allows you to do exactly what you want :slight_smile:
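
For reference, here is a minimal Python sketch of the arithmetic that field value factor performs, as I understand the documentation: the field value is multiplied by factor and the result is passed through modifier. "Field value divided by a number N" is then simply factor = 1/N. The function name and the sample values are made up for illustration, and only a few of the modifiers are mirrored.

```python
import math

# A few of the documented modifiers; each is applied to factor * field_value.
MODIFIERS = {
    "none": lambda x: x,
    "log1p": lambda x: math.log10(x + 1.0),
    "sqrt": math.sqrt,
    "reciprocal": lambda x: 1.0 / x,
}


def field_value_factor(value, factor=1.0, modifier="none", missing=None):
    """Illustrative model of the function value, not Elasticsearch code."""
    if value is None:
        if missing is None:
            raise ValueError("document has no value and no 'missing' was given")
        # 'missing' stands in for the field value; factor and modifier still apply.
        value = missing
    return MODIFIERS[modifier](factor * value)


# "Field value divided by 1000" is expressed as factor = 1 / 1000.
boost = field_value_factor(3000, factor=1 / 1000)
print(round(boost, 6))
```

A document without the field would need the missing parameter, otherwise the function has nothing to work with.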


(Maryam Abdullah) #3

Thank you!
I wonder if it can be used inside a bool in a SHOULD clause? Or do I need to wrap the bool with a function_score?


(Byron Voorbach) #4

You can use it in a bool, but it depends on your use-case.

Essentially the input for a function score query is a score that you would like to manipulate.

If you want to boost the complete query, place the function score on top. If you want to apply it to a smaller part of the query, you can put it inside a bool.
In the latter case, its score is combined with the scores of the other clauses: they are either summed (bool query) or the maximum is taken (dis_max).
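
A rough sketch of the two placements, written as Python dicts that mirror the request body. The field names (title, views), the search text and the factor are assumptions picked for illustration only.

```python
import json

# Hypothetical text query and function; field names are illustrative.
text_query = {"match": {"title": "ps4"}}
views_boost = {
    "field_value_factor": {"field": "views", "factor": 0.001, "missing": 0}
}

# Placement 1: function_score on top, manipulating the score of the whole bool.
on_top = {
    "query": {
        "function_score": {
            "query": {"bool": {"must": [text_query]}},
            "functions": [views_boost],
        }
    }
}

# Placement 2: function_score as a should clause; the bool query sums its
# score with the score of the must clause.
in_should = {
    "query": {
        "bool": {
            "must": [text_query],
            "should": [
                {
                    "function_score": {
                        "query": {"match_all": {}},
                        "functions": [views_boost],
                    }
                }
            ],
        }
    }
}

print(json.dumps(in_should, indent=2))
```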

(if this is too vague, let me know. I can work out an example :slight_smile: )


(Maryam Abdullah) #5

I'm actually using a bool with a MUST clause including a multi-match to match documents based on some fields, and I want to boost the scores of these matching documents if they happen to have a high number of views, so I believe this must be placed in a SHOULD clause.
In that case, if I placed it inside the SHOULD clause, will I get the logic I want?

Thanks:)


(Byron Voorbach) #6

You can combine a filter with the field value factor, which only applies the function to documents which match the filter. For example:

GET products/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "multi_match": {
                "query": "ps4",
                "fields": [
                  "title",
                  "description"
                ]
              }
            }
          ]
        }
      },
      "functions": [
        {
          "filter": {
            "range": {
              "views": {
                "lte": 1000
              }
            }
          },
          "field_value_factor": {
            "field": "rating",
            "factor": 5
          }
        },
        {
          "filter": {
            "range": {
              "views": {
                "gt": 1000
              }
            }
          },
          "field_value_factor": {
            "field": "rating",
            "factor": 50
          }
        }
      ]
    }
  }
}

Notice the difference between the 2 functions :slight_smile:

If you place the function score in your should clause, the outcome of the function score will be added to the score of the MUST (+ other should clauses). If you want the score to be multiplied instead, the function score should be on top.
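
To make the difference concrete, a tiny numeric sketch in Python. The scores are invented, and the "on top" case assumes the default boost_mode of multiply.

```python
def score_function_on_top(query_score, function_value):
    # function_score wrapping the whole query, default boost_mode "multiply"
    return query_score * function_value


def score_function_in_should(must_score, function_clause_score, other_should=()):
    # a bool query sums the scores of its matching must and should clauses
    return must_score + function_clause_score + sum(other_should)


print(score_function_on_top(2.0, 5.0))     # 10.0
print(score_function_in_should(2.0, 5.0))  # 7.0
```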


(Maryam Abdullah) #7

I see.
I guess I have to try both and see which one is better for my data.
But which one usually retrieves better results?


(Byron Voorbach) #8

That's very hard to say without seeing any docs/queries. If you can share one of your queries, it would be easier to help you pick.

In general it really depends on your use case :smiley:


(Maryam Abdullah) #9

The docs are in Arabic, and so are the queries.
The query could be a well-known person (political, sports, etc.), a program, an episode, a country, etc.


(Maryam Abdullah) #10

Is it possible to multiply the score of the multi-match by a date range boost value?
I know that if I place the date range boost inside a SHOULD in a BOOL, the boost value is added to the score.
But what about multiplying the score? Do I need to use re-scoring and place the date range boost inside a SHOULD in a BOOL?


(Byron Voorbach) #11

Sorry, I've been quite busy the last few days.
It's a bit easier to help if you could post the code you have so far.

If I understand you correctly, you want to multiply the score of your clauses in your MUST clause by a date value that you have inside your document.
You can use gauss or linear decay functions for this.
If this is the case, you want to place your bool query inside the function score query, as shown before with field value factor.
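
As a sketch of what that could look like, here is the request body built as a Python dict, with a gauss decay on a date field wrapped around a bool query. The helper name, the default field name "date", the scale of 30d and the decay of 0.5 are all assumptions to be tuned, not values from your setup.

```python
import json


def recency_boosted_query(text, fields, date_field="date", scale="30d", decay=0.5):
    """Build a function_score body that multiplies the text score by a
    gauss decay centred on the current time (illustrative helper)."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "must": [
                            {"multi_match": {"query": text, "fields": fields}}
                        ]
                    }
                },
                "functions": [
                    {
                        "gauss": {
                            date_field: {
                                "origin": "now",
                                "scale": scale,
                                "decay": decay,
                            }
                        }
                    }
                ],
                # multiply: the decay value scales the score of the bool query
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(recency_boosted_query("TERM", ["title", "description"]), indent=2))
```

With these settings a document dated exactly one scale away from now would get its score multiplied by the decay value.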


(Maryam Abdullah) #12

Here's my code:

 var response = elasticClient.Search<Document>(s => s
  .From(offset)
  .Size(size)
  .Index(index)
  .Sort(so => sortByDate
      ? so.Descending(a => a.Date).Field(f => f.Field("_score").Order(SortOrder.Descending))
      : so.Field(f => f.Field("_score").Order(SortOrder.Descending)))
  .Query(q => q
      .FunctionScore(fs => fs.Query(qy => qy.Bool (b => b
          .MustNot(mn => mn
              .Terms(t => t.Field(f => f.Client).Terms(other)))
           .Must(m => m
              .MultiMatch(mm => mm
                  .Fields(fd => fd
                      .Field(f => f.Title)
                      .Field(f => f.Description)
                      .Field(f => f.Tags)
                      .Field(f => f.Terms)
                      .Field(f => f.Type)
                      .Field(f => f.Meta)
                      .Field(f => f.Relations)
                  )
                  .Type(TextQueryType.CrossFields)
                  .TieBreaker(1)
                 .Operator(Operator.Or)
                 .Query(query)
                 .MinimumShouldMatch("80%"))
  )
  )
  )
  .BoostMode(FunctionBoostMode.Sum)
  .Functions(fn => fn.FieldValueFactor(fvf => fvf.Field(ff => ff.Importance).Factor(0.00033)))
  .MaxBoost(10)
    .MinScore(1.0)
  )
  )
  .Rescore(rs => rs.Rescore(r => r.RescoreQuery(rq => rq.ScoreMode(ScoreMode.Total).Query(q => q.Bool(b => b.Should(sh => sh
                  .DateRange(
                      rr =>
                          rr.Field(fd => fd.Date)
                              .GreaterThanOrEquals(
                                  DateMath.Now.Subtract(new DateMathTime(2628000000))).LessThanOrEquals(DateMath.Now)
                              .Boost(14)),
                              sh => sh
                  .DateRange(
                      rr =>
                          rr.Field(fd => fd.Date)
                              .GreaterThanOrEquals(
                                  DateMath.Now.Subtract(new DateMathTime(5256000000))).LessThan(
                                  DateMath.Now.Subtract(new DateMathTime(2628000000)))
                              .Boost(13))    
            )
            )
            ).QueryWeight(1)
            )
            )
            )

  .Highlight(h => h
      .PreTags("<b>")
      .PostTags("</b>")
      .Fields(f => f
              .Field(e => e.Title),
          f => f
              .Field(e => e.Description)
      )
  )

);

I probably need to change the date range boost as it's not practical.
Based on what you said, I need to add the Gauss function next to the Field Value Factor, and I can still decide through the score mode whether the two values (resulting from the field value factor and Gauss functions) are multiplied or added. Is that correct?
The boost mode then decides whether the two scores (resulting from the bool and the functions) are added or multiplied.
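
That reading can be written down as a small Python model: score_mode combines the function values with each other, max_boost caps the combined function score, and boost_mode combines the result with the query score. This is a simplified sketch based on the documented parameters, not Elasticsearch code; the function names and the numbers are invented.

```python
import math


def combine_functions(function_scores, score_mode="multiply"):
    """Combine the values of all matching functions into one function score."""
    if score_mode == "multiply":
        return math.prod(function_scores)
    if score_mode == "sum":
        return sum(function_scores)
    if score_mode == "avg":
        # Simplification: a weighted average is used when weights are set.
        return sum(function_scores) / len(function_scores)
    if score_mode == "first":
        return function_scores[0]
    if score_mode == "max":
        return max(function_scores)
    if score_mode == "min":
        return min(function_scores)
    raise ValueError(f"unknown score_mode: {score_mode}")


def final_score(query_score, function_scores, score_mode="multiply",
                boost_mode="multiply", max_boost=float("inf")):
    """Combine the (capped) function score with the query score."""
    fscore = min(combine_functions(function_scores, score_mode), max_boost)
    if boost_mode == "multiply":
        return query_score * fscore
    if boost_mode == "replace":
        return fscore
    if boost_mode == "sum":
        return query_score + fscore
    if boost_mode == "avg":
        return (query_score + fscore) / 2.0
    if boost_mode == "max":
        return max(query_score, fscore)
    if boost_mode == "min":
        return min(query_score, fscore)
    raise ValueError(f"unknown boost_mode: {boost_mode}")


# Query score 2.0, two function values 0.5 and 14.0, max_boost 10:
print(final_score(2.0, [0.5, 14.0], "multiply", "sum", max_boost=10))  # 9.0
print(final_score(2.0, [0.5, 14.0], "sum", "sum", max_boost=10))       # 12.0
```

In the second call the summed function score (14.5) is capped at 10 before it is added to the query score.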


(Byron Voorbach) #13

I took the liberty of rewriting the query to something that is easier for me to work with :stuck_out_tongue:

GET /_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must_not": [
            {
              "terms": {
                "client": [
                  "FILTER_TERMS"
                ]
              }
            }
          ],
          "must": [
            {
              "multi_match": {
                "query": "TERM",
                "type": "cross_fields",
                "tie_breaker": 1,
                "operator": "or",
                "minimum_should_match": "80%",
                "fields": [
                  "title",
                  "description",
                  "tags",
                  "terms",
                  "type",
                  "meta",
                  "relations"
                ]
              }
            }
          ]
        }
      },
      "boost_mode": "sum",
      "max_boost": 10,
      "min_score": 1,
      "functions": [
        {
          "field_value_factor": {
            "field": "importance",
            "factor": 0.00033,
            "missing": 1
          }
        }
      ]
    }
  },
  "rescore": {
    "query": {
      "score_mode": "total",
      "rescore_query": {
        "bool": {
          "should": [
            {
              "range": {
                "date": {
                  "boost": 14,
                  "gte": "now-2628000s",
                  "lte": "now"
                }
              }
            },
            {
              "range": {
                "date": {
                  "boost": 13,
                  "gte": "now-5256000s",
                  "lt": "now-2628000s"
                }
              }
            }
          ]
        }
      }
    },
    "window_size": 10
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "pre_tags": "<b>",
    "post_tags": "</b>",
    "fields": {
      "title": {},
      "description": {}
    }
  }
}

Now if I understand you correctly, you would like to apply the boosts you currently have in rescore to the score of the multi_match. Not sure if you chose rescore on purpose, but it only changes the score for the top 10 results on each shard (the window_size setting).
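
To illustrate why that matters, a simplified single-shard model of rescoring with score_mode total, in Python. The function name and the sample hits are made up; the point is only that hits outside the window keep their original score and position.

```python
def rescore_total(hits, rescore_score, window_size=10,
                  query_weight=1.0, rescore_query_weight=1.0):
    """hits: list of (doc_id, score), sorted by score descending.
    rescore_score: callable doc_id -> rescore query score (0.0 if no match)."""
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, query_weight * score + rescore_query_weight * rescore_score(doc_id))
        for doc_id, score in window
    ]
    # Only the window is re-sorted; everything after it is left untouched.
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored + rest


hits = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
recent = {"b", "c"}  # documents that would match the date range
result = rescore_total(hits, lambda d: 14.0 if d in recent else 0.0, window_size=2)
print(result)  # [('b', 16.0), ('a', 3.0), ('c', 1.0)]
```

Document "c" is recent as well, but it sits outside the window, so it never receives the boost.
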
I think you're looking for something more like this (correct me if I'm wrong):

GET /_search
{
  "from": 0,
  "size": 20,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must_not": [
            {
              "terms": {
                "client": [
                  "FILTER_TERMS"
                ]
              }
            }
          ],
          "must": [
            {
              "multi_match": {
                "query": "TERM",
                "type": "cross_fields",
                "tie_breaker": 1,
                "operator": "or",
                "minimum_should_match": "80%",
                "fields": [
                  "title",
                  "description",
                  "tags",
                  "terms",
                  "type",
                  "meta",
                  "relations"
                ]
              }
            }
          ]
        }
      },
      "boost_mode": "sum",
      "max_boost": 10,
      "min_score": 1,
      "functions": [
        {
          "field_value_factor": {
            "field": "importance",
            "factor": 0.00033,
            "missing": 1
          }
        },
        {
          "filter": {
            "range": {
              "date": {
                "gte": "now-2628000s",
                "lte": "now"
              }
            }
          },
          "weight": 14
        },
        {
          "filter": {
            "range": {
              "date": {
                "gte": "now-5256000s",
                "lt": "now-2628000s"
              }
            }
          },
          "weight": 13
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "highlight": {
    "pre_tags": "<b>",
    "post_tags": "</b>",
    "fields": {
      "title": {},
      "description": {}
    }
  }
}

I put your range queries into the function_score query as filters, combined with a weight, so that the score will be higher for documents that match those filters. You will probably have to play around with the weight values, score_mode, boost_mode and max_boost to find the right way to blend the scores. Keep in mind that max_boost currently caps the combined function score at 10.
Now the following boosts are in effect:

Always:

  • Boost based on importance field value

Conditional:

  • When range filter 1 matches, apply a weight of 14
  • When range filter 2 matches, apply a weight of 13

Hope this helps!


(Maryam Abdullah) #14

No, I had no idea that rescore only affects the first 10 results.
I will go with what you suggested and add it inside the functions. However, I couldn't add a filter inside the functions, as it only takes one argument.
Also, is it practical to add something like 12 date ranges? I'm mainly concerned with documents from the last year.


(Byron Voorbach) #15

I noticed now that you're actually summing the scores of the functions with the main query (boost_mode). If this is desired, you could also place the range filters in your SHOULD. There are different ways of implementing the same functionality. Try out some things and get a feeling for what works best.

If you have 12 different date ranges and you want the score to go down linearly for older documents, then it may be better to use a linear decay function instead of 12 separate clauses. It's weird that adding a filter to a function isn't possible in your client; I know from the Java API that it should be.
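
For a feeling of how the decay functions behave, here is a Python sketch of the linear and gauss formulas as given in the function score documentation, evaluated over one year in monthly steps. The scale of 180 days and the decay of 0.5 are assumptions chosen for illustration.

```python
import math


def linear_decay(age_days, scale_days, decay=0.5, offset_days=0.0):
    """Linear decay: 1.0 at the origin, 'decay' at one scale, 0.0 beyond s."""
    s = scale_days / (1.0 - decay)
    distance = max(0.0, abs(age_days) - offset_days)
    return max((s - distance) / s, 0.0)


def gauss_decay(age_days, scale_days, decay=0.5, offset_days=0.0):
    """Gauss decay: 1.0 at the origin, 'decay' at one scale, never quite 0."""
    sigma_sq = -(scale_days ** 2) / (2.0 * math.log(decay))
    distance = max(0.0, abs(age_days) - offset_days)
    return math.exp(-(distance ** 2) / (2.0 * sigma_sq))


# Age of the document in days, then the linear and gauss multipliers.
for age in range(0, 361, 30):
    print(age, round(linear_decay(age, 180), 3), round(gauss_decay(age, 180), 3))
```

Both curves pass through the decay value at a distance of one scale from the origin, so one function replaces the whole staircase of range clauses.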


(Maryam Abdullah) #16

I set the boost_mode to SUM just as an example, as I still need to test both and decide which one to use based on the results.
Aha, so it's possible to use Gauss to boost the most recent documents, which does the same job as 12 separate clauses.

Thank you so much:)


(Byron Voorbach) #17

Ahh ok! Just double checking!

Yeah you can definitely do that + it's so much cleaner to look at :slight_smile:

Anytime :smiley:


(Maryam Abdullah) #18

Hey,

I hope you can help me with the following:
I used Gauss, however, I still have documents with DateTime.MaxValue (e.g. index pages, tags pages, term pages) that are important as well, so it's important to consider them. I added the following lines of code

 .GaussDate(b => b.Field(p => p.Date).Origin(DateMath.Now).Decay(0.5).Scale("140d").Filter(f => f.DateRange(dr => dr.LessThanOrEquals(DateMath.Now))))
 .GaussDate(b => b.Field(p => p.Date).Origin(DateTime.MaxValue).Decay(0.8).Scale("1d").Filter(f => f.DateRange(dr => dr.GreaterThanOrEquals(DateTime.MaxValue))))

But it's not working out. Are we allowed to use 2 Gauss functions with filters?


(Byron Voorbach) #19

Hey!

You should be able to use multiple functions, each with their own filter.

What do you mean by not working out? Is the score lower than expected? Did you play around with score_mode and boost_mode?


(Maryam Abdullah) #20

I mean it returns no result.
Before adding the 2nd line of code, everything seemed normal.