Limiting the relevancy score of all the searched documents

divyanshu_marwah · April 22, 2016, 6:53am

I want to limit the scores of all the searched documents to between 0 and 1. Please suggest a way to do this.

warkolm · April 24, 2016, 12:23am

Why would you want to do this?
And do you want to filter out everything that has a score <0 or >1?

divyanshu_marwah · April 25, 2016, 6:02am

No, actually I want all the scores to fall between 0 and 1. I don't want to filter the results by score; rather, I want to bound the scores so that none is <0 or >1. Normally there is no limit on the scores that are calculated, so to get uniform results I want all the documents matching the query, but with every score limited to the range 0 to 1.

warkolm · April 25, 2016, 6:11am

You cannot do this.

divyanshu_marwah · April 25, 2016, 6:15am

Oh... can it be done using a custom scoring function?

warkolm · April 25, 2016, 7:04am

No idea.

But, again, why do you want to do this? Scoring is relative to other documents; if you try to force it into a specific range, then you are influencing the score and changing your results.

divyanshu_marwah · April 25, 2016, 7:16am

As per the requirements of the project, recommendations are to be made based on the score, and the scores of all matched documents are to be displayed, so they should fall in some proper range.

rpsandiford · April 25, 2016, 12:33pm

I think you would end up with false relevance values.

Suppose, against the same index, you run 2 searches. The relevance scores from the first search range from 0.0001 through 1.5000, and you scale them so that they fall between 0 and 1. For the second search, suppose the relevance scores range from 0.0500 through 3.5000, and you scale those to between 0 and 1 as well. Now you have two result sets, and you might think that the highest-scored results from the two searches, each now having a relevance score of 1, have the same value, but they don't. You have artificially scaled your two individual result set relevance ranges into the same 0..1 range. You've lost the original relative scorings that would let you actually compare the top result of each of the searches, because the scaling you apply to get rankings between 0 and 1 isn't constant across all searches.
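To make that concrete, here is a rough Python sketch of the two searches. Only the lowest and highest scores come from the example above; the middle scores and the `min_max_scale` helper are made up purely for illustration.

```python
def min_max_scale(scores):
    """Rescale one result set into 0..1 using that set's own min and max."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Every score is identical, so there is no spread to rescale.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical raw scores; only the endpoints match the ranges in the post.
search_a = [1.5000, 0.7500, 0.0001]
search_b = [3.5000, 1.7750, 0.0500]

scaled_a = min_max_scale(search_a)
scaled_b = min_max_scale(search_b)

# Both top hits now show 1.0, even though the raw scores were 1.5 and 3.5.
print(scaled_a[0], scaled_b[0])
```

The scale factor is different for each search, so a scaled 1.0 in one result set says nothing about a scaled 1.0 in the other.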

Bob Sandiford | Principal Engineer | SirsiDynix

Bruce_Ritchie · April 25, 2016, 1:57pm

The project requirements are bogus. I've seen this in the past where project managers push for consistent scoring between 0 and 1. You just need to tell them that isn't how it works and forcing it to work (scaling) will result in poor search result ordering.

softwaredoug · April 25, 2016, 3:13pm

You could divide every score by the max score, outside the search engine. But scores won't be comparable search to search.
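A minimal sketch of that client-side division, assuming the search response has already been parsed into a Python dict with the usual `hits.max_score` and `hits.hits[]._score` fields. The `normalize_by_max` helper and the sample response are hypothetical, not output from a real cluster.

```python
def normalize_by_max(response):
    """Divide each hit's _score by the response's max_score, client side."""
    hits = response["hits"]["hits"]
    max_score = response["hits"].get("max_score")
    if not max_score:
        # max_score is missing or zero, so there is nothing to divide by.
        return [(h["_id"], 0.0) for h in hits]
    return [(h["_id"], h["_score"] / max_score) for h in hits]


# Hypothetical response body, trimmed to the fields the sketch needs.
sample_response = {
    "hits": {
        "max_score": 2.5,
        "hits": [
            {"_id": "a", "_score": 2.5},
            {"_id": "b", "_score": 1.25},
            {"_id": "c", "_score": 0.5},
        ],
    }
}

normalized = normalize_by_max(sample_response)
print(normalized)  # [('a', 1.0), ('b', 0.5), ('c', 0.2)]
```

The top hit always comes out as 1.0, which is exactly why the normalized values can't be compared from one search to the next.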

But generally these are bogus requirements. You can tell them the author of "Relevant Search" said that :-p. Here are a couple of primers on relevance scoring:

Search engine scoring is based on TF*IDF, which is documented thoroughly in these Java docs
Pretty soon, I believe starting in Elasticsearch 5.0, BM25 will be the default.
Relevance scores between fields are not comparable

Hope that helps

pickypg · April 25, 2016, 7:44pm

Yep, starting in 5.0.
