Is there any French lemmatizer available for Elasticsearch?

Hi there!

My team and I are running into a problem with our current analyzer: we analyze user-generated content, and the stemming step produces many false positives.
For instance, the company name "Servier" stems to "servi", which also matches the word "Service". To avoid that, we would like to use a dictionary-based lemmatizer, but I have not managed to find one.

Is there any French lemmatizer I can use (in production) with ES?

Thanks!

I don't know of any! If you are willing to fiddle with it, you can recreate the French analyzer using the code here and then add a stemmer_override filter to prevent the company name from being stemmed.
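To make that concrete, here is a rough sketch of what the index settings could look like, built as a Python dict so the list of protected terms is easy to generate. The filter chain follows the documented "rebuilt" French analyzer (minus its optional keyword_marker filter). The names `build_french_settings`, `protect_terms` and `rebuilt_french` are made up for this example, and the sketch assumes that a rule mapping a term to itself is enough to keep the stemmer away from it, so test it against your own data.

```python
import json

# Elision articles used by the documented "rebuilt" French analyzer.
FRENCH_ARTICLES = [
    "l", "m", "t", "qu", "n", "s", "j", "d", "c",
    "jusqu", "quoiqu", "lorsqu", "puisqu",
]


def build_french_settings(protected_terms):
    """Build index settings for a French analyzer that leaves some terms unstemmed.

    `protected_terms` is a hypothetical list of proper nouns (e.g. company names).
    """
    # stemmer_override runs after the lowercase filter here, so rules are lowercased.
    # Mapping a term to itself asks the filter to emit it unchanged and mark it
    # as a keyword, which later stemmer filters skip.
    rules = [f"{term.lower()} => {term.lower()}" for term in protected_terms]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "french_elision": {
                        "type": "elision",
                        "articles_case": True,
                        "articles": FRENCH_ARTICLES,
                    },
                    "french_stop": {"type": "stop", "stopwords": "_french_"},
                    "protect_terms": {"type": "stemmer_override", "rules": rules},
                    "french_stemmer": {"type": "stemmer", "language": "light_french"},
                },
                "analyzer": {
                    "rebuilt_french": {
                        "tokenizer": "standard",
                        # The override filter must come before the stemmer.
                        "filter": [
                            "french_elision",
                            "lowercase",
                            "french_stop",
                            "protect_terms",
                            "french_stemmer",
                        ],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body you would send when creating the index.
    print(json.dumps(build_french_settings(["Servier"]), indent=2))
```

The printed JSON is meant to be used as the body of the index-creation request. For a long list of terms, the stemmer_override filter also accepts a `rules_path` pointing to a file of rules instead of an inline `rules` list.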

Hey!

Thanks for your answer. Unfortunately, we would rather not use term exclusion, since we would have to exclude a lot of proper nouns. It is probably possible, but neither realistic nor maintainable :(
