Hi all!
I'm searching on a text field and want a similarity score for the entire phrase rather than for individual terms.
Algorithm: Levenshtein distance.
The score should be normalized to the range 0 to 1.
Text in query:
Expected relevance score in the response:
* "big hat": 1.0
* "not big hat": 0.7
* "big black hat": 0.6
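A minimal sketch of the scoring described above. It assumes the common normalization similarity = 1 - distance / max(len(a), len(b)) over characters, and it assumes the query text is "big hat" (the document expected to score 1.0). The function names are hypothetical, and this runs outside Elasticsearch.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters are equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


query = "big hat"  # assumed query text
for doc in ["big hat", "not big hat", "big black hat"]:
    print(f"{doc!r}: {similarity(query, doc):.2f}")
# prints 'big hat': 1.00, 'not big hat': 0.64, 'big black hat': 0.54 (one per line)
```

With this normalization the distances are 0, 4 and 6 characters, giving 1.00, 0.64 and 0.54. These are close to, but below, the expected 1.0, 0.7 and 0.6, so the exact target numbers depend on which normalization is chosen.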
I've already run into some limitations of Elasticsearch (ES):
- the maximum fuzziness for a match query is 2 edits, and it is applied per term, not to the whole phrase
- if no indexed document contains exactly the query text, there is no reference result (score 1.0) to normalize the other scores against
- TF/IDF similarity works on individual tokens rather than the entire phrase, and it also weighs how often each token occurs across the whole index
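One workaround for all three limitations, sketched under assumptions: let Elasticsearch return a candidate set (for example with an ordinary match query), then rescore those candidates in the application with normalized Levenshtein. This is client-side rescoring, not a built-in ES feature. Because the normalization uses only the lengths of the two strings, it needs no reference document and no index statistics. The hits below only mimic the shape of `response["hits"]["hits"]`; the field name `title`, the helper names and the sample data are hypothetical.

```python
from typing import Any


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (unit cost for insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def rescore(query: str, hits: list[dict[str, Any]], field: str = "title") -> list[tuple[str, float]]:
    """Replace ES relevance scores with a phrase-level similarity in [0, 1], best first."""
    scored = []
    for hit in hits:
        text = hit["_source"][field]  # hypothetical field name
        longest = max(len(query), len(text)) or 1  # avoid division by zero for empty strings
        scored.append((text, 1.0 - levenshtein(query, text) / longest))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates, shaped like response["hits"]["hits"] from a search request.
hits = [
    {"_id": "1", "_score": 2.1, "_source": {"title": "big black hat"}},
    {"_id": "2", "_score": 1.7, "_source": {"title": "not big hat"}},
    {"_id": "3", "_score": 0.4, "_source": {"title": "small red shoe"}},
]
for text, score in rescore("big hat", hits):
    print(f"{text!r}: {score:.2f}")
```

The trade-off is that Elasticsearch still decides which documents become candidates: a document that never matches the initial query is never rescored, so the request's `size` should be large enough to cover the phrases of interest.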
Are there other approaches I could try?
Any comments are welcome.