Get text similarity by Levenshtein distance in range 0-1

Alex_D · July 16, 2020, 7:13am

Hi all!

I search on a text field and want to get a similarity score for the entire phrase.
The algorithm should be Levenshtein distance.
The result should be normalized to the range 0 - 1.

Example:
Text in the query:

"big hat"

Expected relevance scores in the response:

* "big hat":            1.0
* "not big hat":        0.7
* "big black hat":      0.6
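
To make the expected scores concrete, here is a minimal sketch of one possible normalization, assuming similarity = 1 - distance / length of the longer string. That formula is my assumption, not something ES provides, and the function names `levenshtein` and `similarity` are only illustrative. With this formula the numbers come out close to, but not exactly, the rough values listed above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


for text in ["big hat", "not big hat", "big black hat"]:
    print(text, round(similarity("big hat", text), 2))
# big hat 1.0
# not big hat 0.64
# big black hat 0.54
```

Other normalizations (for example dividing by the sum of both lengths) would give different values, so the choice depends on which scale is wanted.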

I've already figured out some of the limitations of ES:

* The maximum fuzziness value for a match query is 2.
* If no ES document contains exactly the same text as the query, there is no reference result (with score 1.0) to compare the other scores against.
* TF/IDF similarity works with tokens, not with the entire phrase, and it takes into account the general occurrence of each token in the index.
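
One possible workaround for these limitations, sketched here under assumptions rather than as a confirmed ES feature: use the fuzzy match query only to collect candidates, ignore the `_score` it returns, and compute the phrase-level score on the client. The helper names `build_candidate_query` and `rerank`, and the field name `title`, are hypothetical. The `similarity` argument stands for any function that returns a value in the range 0 - 1, such as a normalized Levenshtein ratio.

```python
from typing import Callable, List, Tuple


def build_candidate_query(field: str, text: str) -> dict:
    """Request body for a match query with the largest allowed fuzziness (2).

    Fuzziness is applied to each token, so this only widens the candidate set;
    it does not score the phrase as a whole.
    """
    return {"query": {"match": {field: {"query": text, "fuzziness": 2}}}}


def rerank(
    text: str,
    candidates: List[str],
    similarity: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate phrase against the query text, best match first."""
    scored = [(candidate, similarity(text, candidate)) for candidate in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical field name; the body would be sent with whichever ES client is in use.
body = build_candidate_query("title", "big hat")
```

Because the score is computed against the query text itself, an exact match gets 1.0 regardless of which documents are in the index, which sidesteps the missing reference result.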

Maybe there are some things I could try.

I'd be glad of any comments,
thanks

Alex_D · July 16, 2020, 11:50am

Maybe a different similarity module?
I use BM25 (the default).

system · August 13, 2020, 12:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
