I would like to use a lemmatizer for Italian and English. Which plugin should I install in my ES?
Thanks, regards, Valerio
Hi @valerioorfano,
Did you mean a stemmer? Stemmers for English and Italian are part of Elasticsearch (among many others, see reference docs).
Daniel
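For readers who want to try the built-in stemmers mentioned above, here is a minimal sketch in Python that builds index settings with one stemming analyzer per language. It relies on the `stemmer` token filter with its `language` parameter; the filter and analyzer names (`english_stemmer`, `italian_stemmed`, and so on) are made up for illustration, and the exact settings should be checked against the reference docs for your Elasticsearch version.

```python
import json


def stemming_settings():
    """Build index settings with one stemming analyzer per language."""
    filters = {}
    analyzers = {}
    for lang in ("english", "italian"):
        # Hypothetical filter name, chosen only for this example.
        filter_name = lang + "_stemmer"
        filters[filter_name] = {"type": "stemmer", "language": lang}
        # Hypothetical analyzer name, chosen only for this example.
        analyzers[lang + "_stemmed"] = {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", filter_name],
        }
    return {"settings": {"analysis": {"filter": filters, "analyzer": analyzers}}}


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating an index.
    print(json.dumps(stemming_settings(), indent=2))
```

The printed JSON is meant to be used as the request body when creating an index; the analyzers can then be assigned to text fields in the mapping.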
Hi Daniel, and thanks for your reply.
Actually, I mean a lemmatizer, which is different from a stemmer.
I want something that maps:
am, are, is, was, were => be (for en)
vado, vai, vanno => andare (for it)
I'm looking for an open-source API that works with ES 2.3.4.
Any ideas?
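To make the request above concrete, here is a rough sketch of the behaviour being asked for, written as a plain dictionary lookup in Python. It is not an Elasticsearch plugin and not how any particular lemmatizer works internally; the `LEMMAS` table and both function names are hypothetical, and the table contains only the word forms from the post.

```python
# Tiny lookup tables holding only the examples from the post.
LEMMAS = {
    "en": {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"},
    "it": {"vado": "andare", "vai": "andare", "vanno": "andare"},
}


def lemmatize(token, lang):
    """Return the lemma for a token, or the lowercased token if it is unknown."""
    key = token.lower()
    return LEMMAS.get(lang, {}).get(key, key)


def lemmatize_text(text, lang):
    """Split on whitespace and lemmatize each token independently."""
    return [lemmatize(tok, lang) for tok in text.split()]


if __name__ == "__main__":
    print(lemmatize_text("they were here", "en"))
    print(lemmatize_text("loro vanno via", "it"))
```

A stemmer cannot produce these mappings because "was" and "be" share no common stem. A lookup like this ignores context, though, so it cannot tell apart word forms that belong to more than one lemma.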
Hi @valerioorfano,
Your best bet is probably the LemmaGen Analysis plugin, although I have no prior experience with it. The plugin does not support Italian, but it might be possible to reuse the models from the related LemmaGen project, which does support Italian.
However, let me cite from the Definitive Guide:
Lemmatization is a much more complicated and expensive process that needs to understand the context in which words appear in order to make decisions about what they mean. In practice, stemming appears to be just as effective as lemmatization, but with a much lower cost.
Daniel
Thanks a lot, Daniel.
I will give it a try.
Please note that the license is unfortunately quite restrictive: https://github.com/vhyza/elasticsearch-analysis-lemmagen#lexicons-license