Hello,
I have a problem using the german2 stemmer on hyphenated compound words with Elasticsearch.
As an example, take two words: "Export-Schnittstelle" and "Schnittstelle". For these the stemmer produces "Export-Schnittstell" and "Schnittstell" respectively. That is great, because with the right tokenization I can search for "Schnittstelle" (which the stemmer in my search analyzer turns into "Schnittstell") and it matches the second part of "Export-Schnittstelle", i.e. "Export-Schnittstell".
I would expect all hyphenated compound words to behave this way, but unfortunately they do not. Take two other words, "PA-Schiene" and "Schiene". Here the stemmer produces two stems that no longer share a common ending: "PA-Schi" and "Schien".
Can someone explain to me why this happens and whether there is a way to fix it? Maybe by using a different stemmer, like light_german or minimal_german?
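To reproduce the comparison, the four words can be sent to the `_analyze` API one at a time. Below is a minimal sketch that only builds the request bodies and prints them as JSON. The whitespace tokenizer (chosen so that the hyphenated word stays a single token) and the helper name `build_analyze_request` are assumptions for illustration, not my actual index settings.

```python
import json

# The four words from the question.
WORDS = ["Export-Schnittstelle", "Schnittstelle", "PA-Schiene", "Schiene"]


def build_analyze_request(text):
    """Build a request body for POST /_analyze (hypothetical helper).

    Assumption: a whitespace tokenizer, so "PA-Schiene" reaches the
    stemmer as one token. The real analyzer chain may differ.
    """
    return {
        "tokenizer": "whitespace",
        # Inline definition of the stemmer token filter under test.
        "filter": [{"type": "stemmer", "language": "german2"}],
        "text": text,
    }


if __name__ == "__main__":
    for word in WORDS:
        # Each printed line can be used as the body of POST /_analyze.
        print(json.dumps(build_analyze_request(word), ensure_ascii=False))
```

Swapping "german2" for "light_german" or "minimal_german" in the filter definition gives the same test for the other stemmers.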
Thanks in advance.
Best Regards
Simon