German stemmer - looking for snowball alternative


(ThomasL) #1

We are not satisfied with the German Snowball stemmer, so we are looking for
alternatives.
For example, it fails to stem some plural variants, such as Kiwis --> Kiwi,
Autos --> Auto and Nudeln --> Nudel.

We found out about Lucene's
GermanLightStemmer, see
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/de/GermanLightStemmer.html

and think this might be an alternative. At least we hope so… :wink:

I tried to use it in my elasticsearch settings, but without success so far.
Searching for "elasticsearch" and "GermanLightStemmer" does not turn up many
results either ;-/

Any hints on how to use this stemmer in elasticsearch would be much
appreciated.
Thanks in advance, too, for information about other German stemmers
that can be used in elasticsearch and are good at plural/singular
stemming.

Cheers,

Thomas

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

The German light stemmer is named 'light_german' and is documented at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html
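As a rough sketch of how this could be wired up, the snippet below builds index settings with a custom analyzer that applies the 'light_german' stemmer. The filter name `german_light_stem` and the analyzer name `german_light` are made up for illustration, and the `language` key is an assumption about the stemmer token filter's parameter name (older documentation shows `name` instead), so check the linked page for your version.

```python
import json

# Index settings with a custom analyzer that uses the light German stemmer.
# "german_light_stem" and "german_light" are illustrative names only.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "german_light_stem": {
                    "type": "stemmer",
                    # Assumption: parameter is "language"; older docs use "name".
                    "language": "light_german",
                }
            },
            "analyzer": {
                "german_light": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "german_light_stem"],
                }
            },
        }
    }
}

# JSON body that could be sent when creating the index.
settings_json = json.dumps(index_settings, indent=2)
print(settings_json)
```

The printed JSON would be used as the body of the index creation request, and the analyzer would then be referenced from the field mapping.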

Jörg



(ThomasL) #3

:)))

Thanks, trying this soon… :sunglasses:

On Wednesday, 16 October 2013 14:41:05 UTC+2, Jörg Prante wrote:

The german light stemmer's name is 'light_german' and documented at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

Jörg



(ThomasL) #4

Thanks again, Jörg.

A quick test shows that the German light stemmer does not handle "Autos/Auto"
or "Nudeln/Nudel" either.
What is the best approach to achieve this?

Should we use a "stemmer override token filter", as described here?
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-override-tokenfilter.html
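For reference, a minimal sketch of what such an override could look like for the plurals mentioned in this thread. It assumes the `stemmer_override` filter accepts inline `rules` in the form `token => stem` (some versions may only support a `rules_path` file), and that it has to run before the stemmer so the overridden tokens are protected from further stemming. All filter and analyzer names are hypothetical.

```python
import json

# Explicit plural -> singular mappings for the cases the stemmer misses.
# Rules are lowercase because the lowercase filter runs first.
override_rules = [
    "kiwis => kiwi",
    "autos => auto",
    "nudeln => nudel",
]

index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Assumption: inline "rules" are supported by this filter.
                "german_plural_override": {
                    "type": "stemmer_override",
                    "rules": override_rules,
                },
                "german_light_stem": {
                    "type": "stemmer",
                    # Assumption: older docs name this parameter "name".
                    "language": "light_german",
                },
            },
            "analyzer": {
                "german_with_overrides": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # The override is listed before the stemmer.
                    "filter": [
                        "lowercase",
                        "german_plural_override",
                        "german_light_stem",
                    ],
                }
            },
        }
    }
}

print(json.dumps(index_settings, indent=2))
```

This only covers words that are listed explicitly, so the rule list has to be maintained by hand.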

Thanks again for all hints!

Thomas



(Jörg Prante) #5

The best approach is to write a baseform plugin for German that does better
than simple algorithmic stemming :slight_smile:

Fortunately, I have started one and have just released 1.0.0, which is based
on a lexicon.

Maybe you would like to give it a try.
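To illustrate the general idea of a lexicon-based baseform lookup (this is not the plugin's code, just a toy sketch with a hypothetical three-entry lexicon and a hypothetical `baseform` helper): each token is looked up in a word list that maps inflected forms to their base form, and unknown tokens are passed through unchanged apart from lowercasing.

```python
# Toy lexicon mapping inflected forms to base forms (hypothetical data).
# A real lexicon would hold many thousands of entries.
LEXICON = {
    "kiwis": "kiwi",
    "autos": "auto",
    "nudeln": "nudel",
}


def baseform(token: str) -> str:
    """Return the base form of a token, or the lowercased token if unknown."""
    key = token.lower()
    return LEXICON.get(key, key)


if __name__ == "__main__":
    for word in ["Kiwis", "Autos", "Nudeln", "Haus"]:
        print(word, "-->", baseform(word))
```

Unlike an algorithmic stemmer, this never strips suffixes by rule, so it cannot over-stem, but it also does nothing for words missing from the lexicon.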

Jörg


