Documents with German umlauts

astropanic · August 2, 2017, 11:51am

I have two documents:

{"name": "Drucker"}
{"name": "Drücker"}

How should I index them, and how should the query be built, so that I can:

a) find both documents when querying for "drucker"
b) sort the documents according to the search query (the document that matches the query exactly should appear before the others)

Regards,
Wojciech

dadoonet · August 2, 2017, 12:16pm

Using an asciifolding token filter would probably help here.

See https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-asciifolding-tokenfilter.html
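A minimal sketch of that suggestion, written in Python so it can be inspected without a running cluster. It assumes Elasticsearch 5.x, where mappings still have a type level; the analyzer name `folding` and the mapping type `doc` are hypothetical. It builds the index-creation body and a match query as plain dicts, and uses `unicodedata` as a rough local approximation of what lowercase + asciifolding does to the two names (the real filter handles more characters than this).

```python
import json
import unicodedata

# Hypothetical index-creation body (Elasticsearch 5.x style, with a mapping
# type level). "folding" and "doc" are made-up names for this sketch.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "folding": {
                    "tokenizer": "standard",
                    # lowercase first, then fold accented letters ("ü" -> "u")
                    "filter": ["lowercase", "asciifolding"],
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                # No search_analyzer is set, so queries use "folding" too.
                "name": {"type": "text", "analyzer": "folding"}
            }
        }
    },
}

# A plain match query; its text is analyzed with the field's analyzer.
QUERY = {"query": {"match": {"name": "drucker"}}}


def fold(text: str) -> str:
    """Rough local approximation of lowercase + asciifolding: decompose
    (NFKD), drop combining marks such as the umlaut dots, then lowercase.
    The real filter covers more characters than this."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
    print(json.dumps(QUERY))
    for name in ("Drucker", "Drücker"):
        print(name, "->", fold(name))
```

Because the same analyzer runs at index and query time, both "Drucker" and "Drücker" end up as the term `drucker`, so a query for either spelling matches both documents.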

Ivan · August 2, 2017, 5:52pm

The asciifolding filter will normalize the extended characters so that
those words are equivalent. That would solve your first case, but not the
second. ICU collation might help with the latter, but the sorting would be
language-specific and not based on the query:

https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-collation-keyword-field.html
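A toy model of the point above, in plain Python with no Elasticsearch involved. `folded_matches` and `collation_like_order` are hypothetical helpers, and the sort key is only a crude stand-in for ICU collation, not the plugin's actual behavior. It shows that after folding both names match equally for either query, and that a collation-style key is computed from the documents alone, so the resulting order is the same whatever was searched.

```python
import unicodedata

DOCS = ["Drucker", "Drücker"]


def fold(text: str) -> str:
    # Rough stand-in for lowercase + asciifolding: strip combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def folded_matches(query: str) -> list:
    # Hypothetical helper: after folding, both names reduce to the same
    # term, so either spelling of the query matches both documents equally.
    q = fold(query)
    return [d for d in DOCS if fold(d) == q]


def collation_like_order(docs: list) -> list:
    # Hypothetical, crude stand-in for a collation sort key: primary key is
    # the folded form, ties broken by the original string. The query is never
    # consulted, so the order cannot depend on what the user typed.
    return sorted(docs, key=lambda d: (fold(d), d))


if __name__ == "__main__":
    for query in ("drucker", "drücker"):
        hits = folded_matches(query)
        print(query, "->", collation_like_order(hits))
```

Both queries return the same two documents in the same order, which is why neither folding nor a static collation sort on its own puts the exact spelling first.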

system · August 30, 2017, 5:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Date
Ways to handle umlauts	Elasticsearch	2	4203	July 28, 2017
U-umlaut search --> indexing user name müller , search fails for müller but success for muller	Elasticsearch	6	6193	July 5, 2017
Analyze German words with umlauts	Elasticsearch	3	4210	July 5, 2017
Is umlaut expansion such as ü -> [ü, u, ue] possible with built in es tokenizer/filters?	Elasticsearch	1	619	March 9, 2019
Folding German characters like umlauts	Elasticsearch	11	4164	July 6, 2017