Documents with German umlauts


(Wojciech Pietrzak) #1

I have two documents:

  1. {"name": "Drucker"}
  2. {"name": "Drücker"}

How should I index them, and how should the query be built, so that I can:

a) find both documents when querying for "drucker"
b) sort the documents according to the search query (the document that matches the query exactly should appear before the others)

Regards,
Wojciech


(David Pilato) #2

Using an asciifolding token filter would probably help here.

See https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-asciifolding-tokenfilter.html
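
A minimal sketch of what that could look like, written as Python dicts so the request body is easy to inspect. The analyzer name `folded`, the typeless mapping layout, and the local `fold` helper are assumptions for illustration. `fold` only approximates the filter with Unicode decomposition and is not the Lucene implementation (for example, it does not turn "ß" into "ss").

```python
import json
import unicodedata

# Hypothetical index body: the analyzer name "folded" is an assumption.
# The typeless "mappings" layout is the 7.x+ form; on 5.x the "properties"
# block is nested under a mapping type name instead.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folded": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "analyzer": "folded"}
        }
    },
}


def fold(text):
    """Rough local stand-in for lowercase + asciifolding.

    Decomposes characters (NFKD) and drops the combining marks,
    so "ü" becomes "u". Only an approximation of the real filter.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    # Body that would be sent when creating the index.
    print(json.dumps(INDEX_BODY, indent=2, ensure_ascii=False))
    # Both names reduce to the same token, so a query for "drucker"
    # would be expected to match both documents.
    print(fold("Drucker"), fold("Drücker"))
```

Because the same analyzer is applied at index and search time, a `match` query on `name` for "drucker" or "Drücker" should hit both documents.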


(Ivan Brusic) #3

The asciifolding filter will normalize the extended characters so that
those words are equivalent. It will solve your first case, but not the
second. ICU collation might help with the latter, but the sorting would be
language-specific and not based on the query:

https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-collation-keyword-field.html
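
A rough sketch of such a mapping, again as Python dicts. It assumes the analysis-icu plugin is installed and a version that provides the `icu_collation_keyword` field type. The sub-field name `sort` and the German locale settings are illustrative choices, not something from this thread.

```python
import json

# Hypothetical mapping: "name" stays a text field for searching, and the
# "sort" sub-field (name is an assumption) holds an ICU collation key.
MAPPING_BODY = {
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "sort": {
                        "type": "icu_collation_keyword",
                        "index": False,   # only used for sorting, not searching
                        "language": "de",
                        "country": "DE",
                    }
                },
            }
        }
    }
}

# Search body: assumes "name" is analyzed with an asciifolding-based
# analyzer, as suggested earlier in the thread, so both documents match.
SEARCH_BODY = {
    "query": {"match": {"name": "drucker"}},
    "sort": [{"name.sort": {"order": "asc"}}],
}

if __name__ == "__main__":
    print(json.dumps(MAPPING_BODY, indent=2))
    print(json.dumps(SEARCH_BODY, indent=2))
```

Note that this orders hits by German collation rules regardless of what was typed, so it does not by itself put the document that matches the query text exactly in first place.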

