Analyze German words with umlauts

usarskyy · August 20, 2015, 9:14am

Hello everyone!

I have a German word with an umlaut, let's say "läuft". My goal is to create an analyzer that produces three tokens: "läuft", "laeuft" and "lauft".

I have tried different combinations of the icu_normalizer, asciifolding and snowball (German2) filters, but without success. The best result came from the asciifolding token filter, which emits two of the three required tokens: "läuft" and "lauft".

So, basically, I need some kind of custom asciifolding filter for German that also emits the additional variation for words with umlauts.
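
To make the desired output concrete, here is a minimal Python sketch of the expansion I am after. It is not Elasticsearch code, only an illustration of the expected tokens; the names `GERMAN_MAP`, `transliterate`, `fold` and `umlaut_variants` are made up for this example, and `fold` only approximates what asciifolding does.

```python
import unicodedata

# German-specific transliterations (assumed mapping for this illustration).
GERMAN_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate(token):
    """Replace umlauts with their two-letter German spelling (ä -> ae)."""
    return "".join(GERMAN_MAP.get(ch, ch) for ch in token)


def fold(token):
    """Strip diacritics, roughly like asciifolding does for umlauts (ä -> a)."""
    decomposed = unicodedata.normalize("NFD", token)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.replace("ß", "ss")


def umlaut_variants(token):
    """Return the original token plus its distinct variants, in a stable order."""
    token = unicodedata.normalize("NFC", token)  # make sure umlauts are single code points
    variants = []
    for candidate in (token, transliterate(token), fold(token)):
        if candidate not in variants:
            variants.append(candidate)
    return variants


print(umlaut_variants("läuft"))  # ['läuft', 'laeuft', 'lauft']
```

A word without umlauts, such as "haus", should come back as a single token.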

My configuration for the asciifolding and snowball filters is the following:

"ascii2": {
  "type": "asciifolding",
  "preserve_original": "true"
},

"snow-german2": {
  "type": "snowball",
  "language": "German2"
},

I would really appreciate your help!

francoisguerin · August 20, 2015, 4:14pm

Hi,
You should try the Combo analyzer plugin: https://github.com/yakaz/elasticsearch-analysis-combo/
It can combine multiple analyzers: for example, the one you mentioned (läuft => läuft, lauft) and a second one (läuft => laeuft) built on a replacement rule (char mapping or pattern replace).
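
A rough sketch of what such index settings might look like, written as a Python dict so it can be dumped to JSON. The `mapping` char filter and the `asciifolding` filter with `preserve_original` are built into Elasticsearch; the `combo` analyzer type and its `sub_analyzers` parameter are assumed from the plugin's README and should be checked against the plugin version you install. All analyzer and filter names other than `ascii2` are made up for this example.

```python
import json


def build_settings():
    """Build index analysis settings combining two analyzers (untested sketch)."""
    return {
        "analysis": {
            "char_filter": {
                # Built-in mapping char filter: rewrites umlauts before tokenizing.
                "german_umlaut_expand": {
                    "type": "mapping",
                    "mappings": [
                        "ä => ae", "ö => oe", "ü => ue",
                        "Ä => Ae", "Ö => Oe", "Ü => Ue",
                        "ß => ss",
                    ],
                }
            },
            "filter": {
                # Same filter as in the question: emits the folded and the original token.
                "ascii2": {"type": "asciifolding", "preserve_original": True}
            },
            "analyzer": {
                # läuft => läuft, lauft
                "german_folded": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "ascii2"],
                },
                # läuft => laeuft
                "german_expanded": {
                    "type": "custom",
                    "char_filter": ["german_umlaut_expand"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                },
                # ASSUMPTION: "combo" and "sub_analyzers" come from the Combo plugin,
                # not from Elasticsearch itself.
                "german_combo": {
                    "type": "combo",
                    "sub_analyzers": ["german_folded", "german_expanded"],
                },
            },
        }
    }


print(json.dumps(build_settings(), indent=2, ensure_ascii=False))
```

Note that a word without umlauts would come out of both sub-analyzers unchanged, so the combined stream may contain duplicates unless the plugin is configured to remove them.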

usarskyy · August 20, 2015, 11:30pm

Yes, we came to the same conclusion on SO (http://stackoverflow.com/questions/32114129/elasticsearch-analyzer-for-german-language). It seems to be the only possible solution for now.

Thank you for your advice!
