Fuzziness above edit distance 2, or a 60% Levenshtein match

(Amol Hegana) #1

Right now fuzziness only covers edit distances up to 2. Can we increase it?
I have keywords stored in Elasticsearch; take one, "theeventandfood". I am reading OCR lines in which I expect errors, and the allowed error percentage is 60%. Based on a match of 60% or more, I want to suggest keywords.

I am expecting an approximate match using the Levenshtein formula.
For example, the keyword above should be suggested if the OCR reads the following.

I have already tried fuzziness and ngram.

Fuzziness has a maximum distance of 2, whereas the ngram analyzer is not working for me because I need settings based on the length of the keyword read from OCR. I cannot work out what the correct values for min_gram and max_gram should be.
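
To make the min/max gram question concrete, here is a minimal sketch in plain Python of how character n-grams overlap between a keyword and a misread OCR string. It is not the Elasticsearch ngram analyzer itself. The function names `char_ngrams` and `ngram_overlap` are hypothetical, and the misread string "theevemtandfood" is invented for illustration, since the thread does not include actual OCR samples.

```python
def char_ngrams(text, min_gram, max_gram):
    """Return the set of character n-grams of text for n in [min_gram, max_gram]."""
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for i in range(len(text) - n + 1):
            grams.add(text[i:i + n])
    return grams


def ngram_overlap(keyword, ocr_text, min_gram=3, max_gram=3):
    """Fraction of the keyword's n-grams that also occur in the OCR text."""
    keyword_grams = char_ngrams(keyword, min_gram, max_gram)
    if not keyword_grams:
        return 0.0
    ocr_grams = char_ngrams(ocr_text, min_gram, max_gram)
    return len(keyword_grams & ocr_grams) / len(keyword_grams)


if __name__ == "__main__":
    keyword = "theeventandfood"
    misread = "theevemtandfood"  # invented OCR error: "n" read as "m"
    for size in (2, 3, 4):
        print(size, round(ngram_overlap(keyword, misread, size, size), 3))
```

A single wrong character destroys up to n of the n-grams that cover it, so smaller gram sizes tolerate more OCR errors but also match more unrelated keywords. That trade-off is probably what makes a single fixed min/max setting hard to pick for keywords of different lengths.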


(Simon Willnauer) #2

We only support LD/fuzziness <= 2. The reason is that the underlying implementation would explode if we allowed more. A dynamic-programming implementation would be dramatically slower and linear in the number of terms. So no, we can't increase it.
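
As a rough illustration of the dynamic-programming approach mentioned above, here is a minimal sketch in plain Python that runs outside Elasticsearch. It assumes the 60% match means a similarity of `1 - distance / max(len)` of at least 0.6, which is one possible reading of the question and not something the thread confirms. The names `levenshtein`, `similarity` and `suggest`, and the sample strings, are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed definition: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def suggest(ocr_text, keywords, threshold=0.6):
    """Score every keyword against the OCR text; cost grows linearly with the keyword count."""
    scored = [(similarity(ocr_text, keyword), keyword) for keyword in keywords]
    return [keyword for score, keyword in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    # Invented OCR misread and keyword list, for illustration only.
    print(suggest("theevemtandfod", ["theeventandfood", "weddingplanner"]))
```

Every keyword has to be compared against the OCR text, which is the "linear in the number of terms" cost described above. A scan like this may be workable for a small keyword list held in application code, but it does not scale the way an index lookup does.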
