Is there a way to score documents so that the relevance score has a fixed
range, like from 0 to 1.0? The default scoring can return arbitrarily high
scores, depending on how many times the matching term appears in the
document.
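To see why the range is open-ended: at the time of this thread (Elasticsearch 1.x), the default similarity was Lucene's classic TF-IDF, whose term-frequency factor is the square root of the raw frequency. The sketch below assumes a simplified per-term weight of tf × idf and deliberately ignores field-length norms, queryNorm, coord and boosts, so the absolute numbers are illustrative only:

```python
import math

def classic_tf(freq: int) -> float:
    # Classic Lucene TF-IDF: tf is the square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, num_docs: int) -> float:
    # Classic Lucene idf: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def simplified_term_score(freq: int, doc_freq: int, num_docs: int) -> float:
    # Simplified per-term weight: tf * idf.
    # Omits field-length norms, queryNorm, coord and boosts on purpose.
    return classic_tf(freq) * classic_idf(doc_freq, num_docs)

if __name__ == "__main__":
    # Same term, same corpus statistics; only the in-document frequency changes.
    for freq in (1, 4, 16, 64, 256):
        print(freq, round(simplified_term_score(freq, doc_freq=10, num_docs=1000), 3))
```

Each time the frequency quadruples, the tf factor doubles, so there is no ceiling on the score.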
It's tempting to want to normalize the score by the top-matching document,
but this is wrong since the top document isn't always a perfect match.
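The problem with normalizing by the top hit fits in a few lines. In this minimal sketch with made-up scores, a query whose best hit is strong and a query whose best hit barely matches produce identical normalized scores, because the top document always becomes 1.0:

```python
def normalize_by_top(scores: list[float]) -> list[float]:
    # Divide every score by the best score in the same result list.
    top = max(scores)
    return [s / top for s in scores]

if __name__ == "__main__":
    # Hypothetical raw scores for two different queries.
    strong_query = [12.0, 9.0, 3.0]     # top hit matches the query well
    weak_query = [0.5, 0.375, 0.125]    # top hit barely matches at all
    print(normalize_by_top(strong_query))
    print(normalize_by_top(weak_query))
```

Both lists print as `[1.0, 0.75, 0.25]`, so the normalized value says nothing about how good the best match actually is.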
Are there other built-in scorers or parameter settings that will do this?
On Wednesday, November 5, 2014 8:42:59 PM UTC+1, Dustin Boswell wrote:
Glad to know lots of other people have been asking for it too.
I agree that dividing the default relevance score by some constant (or some
number derived from the results) is a bad idea, for all the reasons that
article describes.
I was hoping there was a non-default scorer that is built to return 0-1.0
scores by design. At my company we have a home-grown search engine that
returns relevance scores in this range, and it works great. (Maybe I could
discuss the algorithm further with the team offline; it's pretty good.)
We're looking to use Elasticsearch for some of our applications, and this
feature would help.
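For illustration only, here is one way a scorer can be bounded by construction. This is not the home-grown algorithm mentioned above (the thread does not describe it) and not anything built into Elasticsearch: it scores a document by the idf-weighted share of query terms it contains. The function names and the smoothed idf formula are assumptions:

```python
import math

def idf(doc_freq: int, num_docs: int) -> float:
    # Hypothetical smoothed idf; strictly positive whenever num_docs > 0.
    return math.log(1 + num_docs / (1 + doc_freq))

def coverage_score(query_terms: set[str], doc_terms: set[str],
                   doc_freqs: dict[str, int], num_docs: int) -> float:
    """Hypothetical bounded scorer: idf-weighted share of query terms in the doc.

    Returns 0.0 when no query term matches and 1.0 only when every query
    term is present; repeating a term in the document does not raise it.
    """
    weights = {t: idf(doc_freqs.get(t, 0), num_docs) for t in query_terms}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for t, w in weights.items() if t in doc_terms)
    return matched / total

if __name__ == "__main__":
    query = {"relevance", "score", "range"}
    doc_freqs = {"relevance": 50, "score": 200, "range": 30}
    print(coverage_score(query, {"relevance", "score", "range", "lucene"}, doc_freqs, 1000))
    print(coverage_score(query, {"score", "lucene"}, doc_freqs, 1000))
```

Here "1.0" means "contains every query term", which is a defined notion of a perfect match; the trade-off is that term frequency is ignored entirely.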
I guess I could go down the road of writing a custom scoring algorithm (in
Java?), but I'm not sure how much of an undertaking that is...
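Short of a Java plugin, one option is to bound the existing score with a `function_score` query whose `script_score` applies a saturating transform, s / (s + k). A minimal sketch, assuming a recent Elasticsearch version with Painless scripting (the 1.x releases current in this thread used a different script syntax); the `body` field and the constant `k` are hypothetical. It only builds the request body as a Python dict:

```python
import json

def bounded_score_query(inner_query: dict, k: float = 1.0) -> dict:
    """Wrap a query so the final score is _score / (_score + k).

    k is a hypothetical tuning constant: a raw score equal to k maps to 0.5.
    """
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "script_score": {
                    "script": {
                        # Painless: _score is the wrapped query's relevance score.
                        "source": "_score / (_score + params.k)",
                        "params": {"k": k},
                    }
                },
                # Use the script's value as the final score instead of multiplying it in.
                "boost_mode": "replace",
            }
        }
    }

if __name__ == "__main__":
    body = bounded_score_query({"match": {"body": "relevance score range"}}, k=5.0)
    print(json.dumps(body, indent=2))
```

The transform is monotonic, so ranking is unchanged; it only squeezes non-negative scores into [0, 1). A value near 1.0 still means "high raw score", not "perfect match", so it does not answer the objection to dividing by a constant.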
On Thursday, November 6, 2014 11:11:23 AM UTC-8, simonw wrote: