Lemmatizer for Italian and English languages for ES 2.3.4


(valerio) #1

I would like to use a lemmatizer for Italian and English. Which plugin should I install in my ES?

Thanks and regards, valerio


(Daniel Mitterdorfer) #2

Hi @valerioorfano,

Did you mean a stemmer? Stemmers for English and Italian are part of Elasticsearch (among many others, see reference docs).
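For reference, here is a minimal sketch of index settings that wire the built-in `stemmer` token filter into one custom analyzer per language. The analyzer and filter names (`my_english`, `my_italian`, `english_stemmer`, `italian_stemmer`) are made up for illustration, and the choice of `light_italian` over `italian` is an assumption; check the reference docs for the language values your version supports.

```python
import json


def stemmer_settings():
    """Build index settings with one stemming analyzer per language.

    All analyzer and filter names below are hypothetical; only the
    "stemmer" filter type and its "language" values come from Elasticsearch.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                    # Assumption: the lighter Italian stemmer is good enough here.
                    "italian_stemmer": {"type": "stemmer", "language": "light_italian"},
                },
                "analyzer": {
                    "my_english": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stemmer"],
                    },
                    "my_italian": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "italian_stemmer"],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that could be sent when creating an index.
    print(json.dumps(stemmer_settings(), indent=2))
```

The printed JSON is meant to be used as the request body when creating the index.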

Daniel


(valerio) #3

Hi Daniel, and thanks for your reply.

Actually, I mean a lemmatizer, which is different from a stemmer.

I want something that maps inflected forms to their base form:

am, are, is, was, were => be (for en)

vado, vai, vanno => andare (for it)
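To make the difference from stemming concrete, here is a toy sketch of the behaviour I mean: a plain dictionary lookup from inflected form to lemma. The `LEMMAS` table and the `lemmatize` function are hypothetical and only cover the examples above; a real lemmatizer would need a full lexicon for each language.

```python
# Hypothetical lookup tables, limited to the examples in this post.
LEMMAS = {
    "en": {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"},
    "it": {"vado": "andare", "vai": "andare", "vanno": "andare"},
}


def lemmatize(token, lang):
    """Return the lemma for a token, or the token unchanged if it is unknown."""
    table = LEMMAS.get(lang, {})
    return table.get(token.lower(), token)


if __name__ == "__main__":
    print([lemmatize(t, "en") for t in ["am", "was", "were"]])
    print([lemmatize(t, "it") for t in ["vado", "vai", "vanno"]])
```

A suffix-stripping stemmer cannot produce these mappings, because "was" and "be" (or "vado" and "andare") share no common stem.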

I'm looking for an open-source API that works with ES 2.3.4.

Any ideas?


(Daniel Mitterdorfer) #4

Hi @valerioorfano,

Your best bet is probably the LemmaGen Analysis plugin (though I have no prior experience with it). It does not support Italian, but it might be possible to reuse the models from the related LemmaGen project, which does support Italian.

However, let me cite from the Definitive Guide:

Lemmatization is a much more complicated and expensive process that needs to understand the context in which words appear in order to make decisions about what they mean. In practice, stemming appears to be just as effective as lemmatization, but with a much lower cost.

Daniel


(valerio) #5

Thanks a lot, Daniel.

I will give it a try.


(Vojtech Hyza) #6

Please note that the license of the lexicons is unfortunately quite restrictive: https://github.com/vhyza/elasticsearch-analysis-lemmagen#lexicons-license :frowning:

