Normalizing MLT score

Atharva_Patel · November 30, 2012, 7:43am

I am using MLT queries to find similar documents. I would like to set a
threshold on the score to decide which documents should be considered
similar to the document passed in as the like text.

In the response hits I see scores ranging from 0 to 2.5. The 2.5 is only
the highest value across the few test cases I have tried during
development; in production it may go even higher. I would therefore like
to know whether there is a way to normalize the scores so that they fall
between 0 and 1. The naive strategy of dividing each hit's score by the max
score on the client side is useless, because it always produces a score of
1.0 for the first hit (the one with the highest score) in the ranked hits,
so that hit will always pass the threshold (say 0.3).
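
To illustrate why the naive strategy fails, here is a minimal Python sketch.
The score lists are made-up values, not real MLT output, and the function
names are only for illustration:

```python
def normalize_by_max(scores):
    """Divide every hit score by the highest score in the same result set."""
    if not scores:
        return []
    top = max(scores)
    if top == 0:
        return [0.0 for _ in scores]
    return [s / top for s in scores]


def passes_threshold(scores, threshold=0.3):
    """Flag which hits pass the threshold after max-normalization."""
    return [n >= threshold for n in normalize_by_max(scores)]


# Hypothetical result sets: one with strong matches, one with weak matches.
strong = [2.5, 1.2, 0.4]
weak = [0.2, 0.1, 0.05]

# In both cases the top hit becomes 1.0, so it always passes the threshold,
# even when its raw score was very low.
print(normalize_by_max(strong)[0], passes_threshold(strong)[0])
print(normalize_by_max(weak)[0], passes_threshold(weak)[0])
```

The weak result set shows the problem: a raw score of 0.2 is indistinguishable
from a raw score of 2.5 once each is divided by its own maximum.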

It would also be useful if I could somehow predict the highest possible
score for my MLT query, based on the internal formula MLT uses for
scoring.
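
If such an upper bound were available, the client-side part would be simple.
The sketch below assumes a hypothetical `reference_score` that stands in for
the predicted maximum; how to obtain it from MLT is exactly what I do not
know, so this only shows how the threshold would then be applied:

```python
def normalize_by_reference(scores, reference_score):
    """Scale scores by a fixed reference value and clamp the result to 1.0.

    reference_score is a hypothetical upper bound for the query; nothing
    here computes it from Elasticsearch.
    """
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return [min(s / reference_score, 1.0) for s in scores]


def similar_hits(scores, reference_score, threshold=0.3):
    """Keep only the raw scores whose normalized value passes the threshold."""
    normalized = normalize_by_reference(scores, reference_score)
    return [s for s, n in zip(scores, normalized) if n >= threshold]


# Made-up scores and a made-up upper bound of 5.0.
print(normalize_by_reference([2.5, 1.0], 5.0))
print(similar_hits([2.5, 1.0], 5.0))
```

Unlike dividing by the max score of the result set, the top hit here only
reaches 1.0 if its raw score actually reaches the reference value.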

Can somebody please help me with these approaches?

Thanks!

--
