Levenshtein distance of 2 handles same Token differently?


I created two indices with the exact same mapping. The specific field has an EdgeNGram Tokenizer (Min:1 Max:50, keeping only Letters and Digits) and an analyzer with "lowercase", "asciifolding" and "synonym" filters.
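For context, the analysis setup described above would look roughly like this (index and analyzer names are placeholders, and the synonym filter definition is omitted since its synonym list isn't shown):

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 50,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer",
          "filter": ["lowercase", "asciifolding", "synonym"]
        }
      }
    }
  }
}
```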

When using a multi_match query with fuzziness "AUTO:4,7":

  • field containing "Hamburg-Strasse" matches "Hamburg", "Hambur", and "Hambu"

  • field containing only "Hamburg" matches only "Hamburg" and "Hambur"

Would be thankful for any idea what could be causing this.

Edit: Figured out it's the max_expansions setting, though I still don't fully understand it.
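For anyone hitting the same issue: a fuzzy query rewrites each search term into a set of matching index terms, capped by max_expansions (default 50). An edge-ngram field produces many index terms per document, so the cap can silently drop some candidate terms, which explains why the two indices behave differently. A sketch of raising the limit (field name and value are illustrative):

```json
{
  "query": {
    "multi_match": {
      "query": "Hambu",
      "fields": ["name"],
      "fuzziness": "AUTO:4,7",
      "max_expansions": 100
    }
  }
}
```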

Regards, Alex
