Levenshtein distance of 2 handles same Token differently?


I created two indices with the exact same mapping. The specific field has an EdgeNGram Tokenizer (Min:1 Max:50, keeping only Letters and Digits) and an analyzer with "lowercase", "asciifolding" and "synonym" filters.
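For context, the analysis setup described above would look roughly like this (index and analyzer names are placeholders, and the synonym filter definition is omitted since its synonym list isn't shown):

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase", "asciifolding", "synonym"]
        }
      }
    }
  }
}
```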

When using a multi_match query with fuzziness "AUTO:4,7":

  • field containing "Hamburg-Strasse" matches "Hamburg", "Hambur", and "Hambu"

  • field containing only "Hamburg" matches only "Hamburg" and "Hambur"

Would be thankful for any idea what could be causing this.

Edit: Figured out it's the max_expansions setting, though I still don't fully understand it.
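For anyone hitting the same issue: a fuzzy query rewrites each search term into a set of matching index terms, capped by max_expansions (default 50). An edge-ngram field produces many index terms per document, so the cap can silently drop some candidate terms, which explains why the two indices behave differently. A sketch of raising the limit (field name and value are illustrative):

```json
{
  "query": {
    "multi_match": {
      "query": "Hambu",
      "fields": ["name"],
      "fuzziness": "AUTO:4,7",
      "max_expansions": 100
    }
  }
}
```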

Regards, Alex
