I have started a lexicon-based analyzer for linguistic processing that
maps full word forms to their base form (right now, only a German lexicon
is provided).
With this plugin, full word forms are reduced to base forms in the
tokenization process. This is also known as lemmatization.
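
Here is a minimal sketch of what the index settings could look like,
assuming the plugin registers a token filter of type "baseform" with a
"language" parameter (the filter/analyzer names here are illustrative,
please check the plugin documentation):

    # assumes the plugin is installed; names are illustrative assumptions
    curl -XPUT 'localhost:9200/test' -d '
    {
      "index" : {
        "analysis" : {
          "filter" : {
            "baseform" : {
              "type" : "baseform",
              "language" : "de"
            }
          },
          "analyzer" : {
            "baseform" : {
              "tokenizer" : "standard",
              "filter" : [ "baseform" ]
            }
          }
        }
      }
    }'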
Why is lemmatization better than stemming? With this plugin, you can
generate additional baseform tokens even for irregular word forms. For
example, the base form of "zurückgezogen" is "zurückziehen"; algorithmic
stemming is rather limited in such cases.
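
You can check this with the analyze API against an index configured as
above (the index and analyzer names are just the ones from the sketch):

    # expect the original token plus the base form "zurückziehen"
    curl -XGET 'localhost:9200/test/_analyze?analyzer=baseform' -d 'zurückgezogen'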
Thanks to Dawid Weiss for the FSA and Daniel Naber for the German lexicon.