Fuzziness and analysis

martinm · February 9, 2018, 10:26am

Hello,

I want to search a field containing country names, which may be in any of four languages (English, German, French and Italian), and the search should tolerate typos (which I thought of handling with fuzziness). For example, the search strings "Italy" (English), "Italien" (German), "Italie" (French), "Italia" (Italian) and "Itali" (mistyped French version) should all match "Italy".

My first thought was to use a character filter in the index definition to normalize the country name (i.e. to declare Italy, Italien, Italie and Italia equivalent) and to use a match query with fuzziness to cope with misspellings. However, as I understand from the documentation (https://www.elastic.co/guide/en/elasticsearch/guide/current/fuzzy-match-query.html), the query string is analyzed first, and only then are the search terms fuzzified. I am therefore afraid my approach would not work, as I try to illustrate with the following example, assuming I use the English country name as the normalized version.

The search string "Detschland" (misspelled "Deutschland", German for Germany) will not be recognized as "Deutschland" during analysis and will therefore not be transformed.
The Levenshtein distance between "Detschland" and the term "Germany" in the inverted index is too large for a match, although the Levenshtein distance between "Detschland" and the correct "Deutschland" is only 1.
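
To make the distance argument concrete, here is a minimal Python sketch that computes plain Levenshtein distance for the two pairs above. The function name `levenshtein` is only illustrative; this is not Elasticsearch's own implementation, which may count edits slightly differently (for example, it can treat a transposition of adjacent characters as a single edit).

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


# The typo is one edit away from the correct German name ...
print(levenshtein("Detschland", "Deutschland"))
# ... but far from the normalized English term stored in the index.
print(levenshtein("Detschland", "Germany"))
```

Since the two strings differ in length by three characters, the second distance is at least 3, which is beyond the maximum of two edits that fuzzy matching allows.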

If there were a way to first fuzzify the search string and then run analysis on the resulting terms, "Deutschland" would appear as one of the fuzzified versions of "Detschland", would subsequently be normalized to "Germany" in the analysis step, and would therefore lead to a match.
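
The ordering described here (fuzzy match first, normalization second) can be sketched outside Elasticsearch in a few lines of Python. This is only an illustration of the idea under stated assumptions, not an Elasticsearch feature: the `VARIANTS` table, the `normalize_fuzzy` helper and the `max_edits=2` threshold are all hypothetical, and the table lists only a handful of example names.

```python
from typing import Optional


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical lookup table: every known spelling -> normalized English name.
VARIANTS = {
    "italy": "Italy", "italien": "Italy", "italie": "Italy", "italia": "Italy",
    "germany": "Germany", "deutschland": "Germany",
    "allemagne": "Germany", "germania": "Germany",
}


def normalize_fuzzy(query: str, max_edits: int = 2) -> Optional[str]:
    """Fuzzy-match the raw query against all variants, then normalize."""
    q = query.strip().lower()
    best = None  # (distance, normalized name) of the closest variant so far
    for variant, canonical in VARIANTS.items():
        d = levenshtein(q, variant)
        if d <= max_edits and (best is None or d < best[0]):
            best = (d, canonical)
    return best[1] if best else None


print(normalize_fuzzy("Detschland"))  # closest variant is "deutschland"
print(normalize_fuzzy("Itali"))       # closest variants all map to Italy
```

The normalized name returned by such a step could then be used as an exact search term, so the fuzziness is applied to the multilingual spellings rather than to the normalized term in the index.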

Is there a way to change the order of analysis and fuzzification, or is there another approach in Elasticsearch that would meet this requirement?

Thank you,
with kind regards,
Martin

