Limiting the relevancy score of all the searched documents

(Divyanshu Marwah) #1

I want to limit the scores of all the searched documents to between 0 and 1. Please suggest a way to do this.

(Mark Walkom) #2

Why would you want to do this?
And do you want to filter out everything that has a score <0 or >1?

(Divyanshu Marwah) #3

No, actually I want all the scores to come out between 0 and 1. I don't want to filter the results by score; rather, I want to bring every score that would otherwise be <0 or >1 into that range. Normally there is no limit on the scores that are calculated, so to get uniform results I want all the documents matching the query, but with every score kept within 0 to 1.

(Mark Walkom) #4

You cannot do this.

(Divyanshu Marwah) #5

Ohh... can it be done using a custom scoring function?

(Mark Walkom) #6

No idea.

But, again, why do you want to do this? Scoring is relative to other documents; if you try to force it into a specific range, you are influencing the score and changing your results.

Does the score calculation admit a maximum?

(Divyanshu Marwah) #7

As per the requirements of the project, recommendations are to be made based on the score, and all the matched documents' scores are to be displayed, so they should fall in some proper range.

(rpsandiford) #8

I think you would end up with false relevance values.

Suppose, against the same index, you run 2 searches. The relevance scores from the first search range from 0.0001 through 1.5000, and you scale them to lie between 0 and 1. For the second search, suppose the relevance scores range from 0.0500 through 3.5000, and you scale those from 0 through 1 as well. Now you have two result sets, and you might think that the highest-scored result from each search, now having a relevance score of 1, has the same value, but it doesn't. You have artificially scaled your two individual result set relevance ranges into the same 0..1 range. You've lost the original relative scorings that would let you actually compare the top result of each of the searches, because the scaling you apply to get rankings between 0 and 1 isn't constant across all searches.

Bob Sandiford | Principal Engineer | SirsiDynix
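
The two searches described above can be worked through numerically. This is only an illustrative sketch: the `min_max_scale` helper and the middle score in each list are made up for the example (only the endpoints 0.0001/1.5 and 0.05/3.5 come from the post), and min-max scaling is assumed as the way the scores get squeezed into 0..1.

```python
def min_max_scale(scores):
    """Rescale scores linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: avoid dividing by zero.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw scores; only the endpoints are taken from the post.
search_a = [1.5, 0.75, 0.0001]
search_b = [3.5, 1.0, 0.05]

scaled_a = min_max_scale(search_a)
scaled_b = min_max_scale(search_b)

# Both top hits now read the same, although their raw scores were 1.5 and 3.5.
print(scaled_a[0], scaled_b[0])
```

Each result set gets its own scale factor, so the scaled values say nothing about how the two top hits compare with each other.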

(Bruce Ritchie) #9

The project requirements are bogus. I've seen this in the past where project managers push for consistent scoring between 0 and 1. You just need to tell them that isn't how it works and forcing it to work (scaling) will result in poor search result ordering.

(Doug Turnbull) #10

You could divide every score by the max score, outside the search engine. But scores won't be comparable search to search.
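
A minimal sketch of that divide-by-max idea, done client-side after the search returns. It assumes the response has already been parsed into a dict with the usual `hits.max_score` and per-hit `_score` fields; the `normalize_by_max` helper and the sample response are hypothetical.

```python
def normalize_by_max(response):
    """Return (doc id, score / max_score) pairs for one search response."""
    hits = response["hits"]["hits"]
    max_score = response["hits"].get("max_score")
    if not max_score:
        # No hits, or scores were not computed (e.g. sorting on a field).
        return [(hit["_id"], None) for hit in hits]
    return [(hit["_id"], hit["_score"] / max_score) for hit in hits]


# Hand-written sample shaped like a search response (hypothetical data).
sample = {
    "hits": {
        "max_score": 3.5,
        "hits": [
            {"_id": "a", "_score": 3.5},
            {"_id": "b", "_score": 1.75},
            {"_id": "c", "_score": 0.35},
        ],
    }
}

for doc_id, score in normalize_by_max(sample):
    print(doc_id, score)
```

The top hit always comes out as 1.0, so the normalized values only rank documents within a single query.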

But generally these are bogus requirements. You can tell them the author of "Relevant Search" said that :-p. Here are a couple of primers on relevance scoring:

  • Search engine scoring is based on TF*IDF, which is documented thoroughly in these Java docs
  • Pretty soon, I believe starting in Elasticsearch 5.0, BM25 will be the default.
  • Relevance scores between fields are not comparable

Hope that helps

(Chris Earle) #11

Yep, starting in 5.0.

(system) closed #12