Prioritising matches in specific field


(Basiclaser) #1

Hi all.
I'm trying to clarify what the current standard is for prioritising document matches that come from a certain field.

Questions (I don't expect answers to them all, I just wanted to get my different lines of enquiry in one place):

  • Would you use a field boost, a function score, or a bool > must/should query to prioritise results from a certain field?
  • I see that _boost was deprecated, and now 'boost' is available in 2.x. Are they two different strategies with the same name?
  • Would 'field_name^2' suffice to favor results that match in a particular field (the latest 'boost' docs show this)?
  • How can I return results in order of score?
  • Can I return results in order, with matches in a certain field first?
  • Are 'function score' and 'score function' the same thing?
    I saw mention of both.
  • Is a bool/must/should query also suitable for prioritising matches from a certain field?

Thanks for your help in clearing this up.

(Nik Everett) #2

Personally, I'd use a field boost because it feels more search-y. Say you boost title by 10 relative to text. It is still possible for a document with a wonderful text match to outrank one with a poor title match. This comes up more often if you run match_phrase and match at the same time, for example with a boost on the match_phrase so that phrase matches float to the top. Depending on that boost, a phrase match in text can beat a regular match in title.
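A minimal sketch of the setup described above, written as a Python dict that serializes to the JSON request body. The field names title and text, the function name build_query, and the boost values 10 and 2 are illustrative assumptions, not values from the thread.

```python
import json

# Hypothetical fields "title" and "text"; the boost values are illustrative only.
def build_query(user_text: str, title_boost: int = 10, phrase_boost: int = 2) -> dict:
    """Combine a field-boosted match with a separately boosted phrase match."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Plain term matching: title matches are weighted 10x text matches.
                    {
                        "multi_match": {
                            "query": user_text,
                            "fields": [f"title^{title_boost}", "text"],
                        }
                    },
                    # Phrase matches in text get their own boost, tuned independently.
                    {
                        "match_phrase": {
                            "text": {"query": user_text, "boost": phrase_boost}
                        }
                    },
                ]
            }
        }
    }

print(json.dumps(build_query("quick brown fox"), indent=2))
```

Raising phrase_boost is what lets a strong phrase match in text overtake a weaker title match, since the two should clauses simply add up.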

On _boost vs. boost: yes. This mirrors a change in Lucene, where boost was removed from queries and a boost query was added to replace it.

On field_name^2: that syntax works in multi_match and query_string queries, and probably in more that I don't remember off the top of my head. Under the covers, those queries build bool queries and boost queries.
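For illustration, here is the ^ field syntax in both of the queries mentioned, as Python dicts mirroring the JSON body. The field names foo and bar are placeholders taken from the thread's own examples.

```python
import json

# Placeholder fields "foo" and "bar"; "bar^2" doubles the weight of matches in bar.
fields = ["foo", "bar^2"]

multi_match_body = {
    "query": {"multi_match": {"query": "some text", "fields": fields}}
}

query_string_body = {
    "query": {"query_string": {"query": "some text", "fields": fields}}
}

for body in (multi_match_body, query_string_body):
    print(json.dumps(body))
```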

The preference won't be hard and fast, though. If a term is twice as good a match against foo, it's going to beat a half-as-good match against bar^2. The "goodness" of a match is a bit complicated to explain.
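The arithmetic behind that can be sketched with a deliberately simplified model that treats each clause's score as match quality times field boost. Real Lucene scoring has more moving parts; the function name boosted and the quality numbers are assumptions for illustration.

```python
# Simplified model (an assumption): clause score = match quality * field boost.
def boosted(match_quality: float, boost: float = 1.0) -> float:
    return match_quality * boost

baseline = 1.0
foo_score = boosted(2.0 * baseline)           # twice as good a match, unboosted foo
bar_score = boosted(0.5 * baseline, boost=2)  # half as good a match, bar^2

print(foo_score, bar_score)  # 2.0 1.0
assert foo_score > bar_score
```

Under this model, a bar^2 match only wins when it is more than half as good as the competing foo match, which is why the boost is a preference rather than a hard rule.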

On ordering by score: it's the default. See the docs for sort: if you don't send a sort, results are sorted by score, descending.
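To make the default visible, the two request bodies below should order hits the same way: one omits sort, the other spells out sorting on the special _score key in descending order. The match on foo is a placeholder query.

```python
import json

# Omitting "sort" already orders hits by score, descending.
implicit = {"query": {"match": {"foo": "some text"}}}

# The same ordering made explicit; "_score" is the relevance sort key.
explicit = {
    "query": {"match": {"foo": "some text"}},
    "sort": [{"_score": {"order": "desc"}}],
}

print(json.dumps(explicit, indent=2))
```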

On putting matches from a certain field first: probably, but it's usually better to rely on score.

I suspect you'd do something like this:

{
  "query": {
    "function_score": {
      "functions": [
        {
          "filter": { "match": { "foo": "some text" } },
          "weight": 1
        },
        {
          "filter": { "match": { "bar": "some text" } },
          "weight": 2
        }
      ],
      "score_mode": "sum"
    }
  }
}

But that would put all results that contained the bar match above the foo matches, with every result inside each group tied. It's probably not what you want, but you can play with it. I think it's madness compared to just using field boosts, but if you want it, it is there for you. And it probably is the right thing to do in some rare situations.
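A small simulation of why the function_score sketch produces tied tiers, assuming there is no inner query (so the base score is 1) and the default boost_mode of multiply, so each hit's final score is just the sum of the weights whose filters matched. The documents and their matches are hypothetical.

```python
# Hypothetical documents and which of the two filters each one matches.
docs = {
    "doc1": {"foo"},
    "doc2": {"bar"},
    "doc3": {"foo", "bar"},
    "doc4": {"foo"},
}
weights = {"foo": 1, "bar": 2}  # the "weight" values from the sketch

def summed_weight(matched: set) -> int:
    """score_mode "sum": add the weights of every function whose filter matched."""
    return sum(weights[f] for f in matched)

ranked = sorted(docs, key=lambda d: summed_weight(docs[d]), reverse=True)
for d in ranked:
    print(d, summed_weight(docs[d]))
```

doc1 and doc4 end up with identical scores no matter how well their text actually matched, because the filters only answer "matched or not".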

function_score is a query. "Score functions" are the functions you feed to it; they go in the functions field of the function_score DSL.
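To show where the score functions sit, here is a function_score body with its own query plus a functions list holding two different kinds of score function. The numeric field popularity is hypothetical, and the particular functions are just examples of what can go in that list.

```python
import json

body = {
    "query": {
        "function_score": {
            # The query decides which documents match and gives the base score.
            "query": {"match": {"foo": "some text"}},
            # Score functions: each list entry is one function fed to function_score.
            "functions": [
                {"filter": {"match": {"bar": "some text"}}, "weight": 2},
                # "popularity" is a hypothetical numeric field.
                {"field_value_factor": {"field": "popularity", "modifier": "log1p"}},
            ],
            "score_mode": "sum",       # how the function results are combined
            "boost_mode": "multiply",  # how that result combines with the query score
        }
    }
}

print(json.dumps(body, indent=2))
```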

On bool/must/should: sure. You can put a boost on one of the must or should clauses to make it stronger than another one. If you are using multi_match, you can also use the ^ syntax on the field names.
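A sketch of both options together, assuming placeholder fields foo and bar: a must clause that uses the ^ syntax inside multi_match, plus an optional should clause carrying its own boost so that bar matches earn extra score.

```python
import json

# Placeholder fields "foo" and "bar".
body = {
    "query": {
        "bool": {
            # Every hit must match "some text" in foo or bar (bar weighted 2x).
            "must": [
                {"multi_match": {"query": "some text", "fields": ["foo", "bar^2"]}}
            ],
            # Optional extra credit: a bar match adds score, with this clause boosted 3x.
            "should": [
                {"match": {"bar": {"query": "some text", "boost": 3}}}
            ],
        }
    }
}

print(json.dumps(body, indent=2))
```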


(Ivan Brusic) #3

Great explanation by Nik. I would like to add that using a function
score makes the score explanation more difficult to debug, and that
explanation is useful when tuning relevancy.
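The score explanation referred to here can be requested by setting explain to true in the search body, which returns a per-hit breakdown of how each score was computed. The multi_match on foo and bar is a placeholder query.

```python
import json

# "explain": true asks for a per-hit breakdown of how each score was computed.
body = {
    "explain": True,
    "query": {
        "multi_match": {"query": "some text", "fields": ["foo", "bar^2"]}
    },
}

print(json.dumps(body))
```

With plain field boosts, the breakdown shows the boost as a factor on each field's clause, which tends to be easier to follow than the extra layers a function_score adds.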

Ivan


(Basiclaser) #4

Thanks so much, as always, nik9000, for this in-depth response! This will come in handy.

