Is there any French lemmatizer available for Elasticsearch?

mgaudin · April 27, 2017, 8:02am

Hi there!

My colleagues and I are running into a problem with our current analyzer. We analyze user-generated content, and the stemming step produces many false positives.
For instance, the company name "Servier" is stemmed to "servi", which then matches the word "Service". To avoid this, we would like to use a dictionary-based lemmatizer, but I have not managed to find one.

Is there any French lemmatizer I can use (in production) with Elasticsearch?

Thanks!

nik9000 · April 27, 2017, 2:52pm

I don't know of any! If you are willing to fiddle with it, you can recreate the french analyzer using the code here and then add a stemmer_override filter to prevent the company name from being stemmed.
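A minimal sketch of that approach, written in Python so it can be checked without a cluster. It builds the index settings for a hand-built French analyzer modeled on the "rebuilt french analyzer" example in the Elasticsearch language-analyzer docs (elision, lowercase, French stop words, light_french stemmer), with a stemmer_override filter inserted just before the stemmer. The names PROTECTED_TERMS, french_no_stem and french_with_overrides are hypothetical; only "servier" comes from this thread. Verify the filter list against the docs for your Elasticsearch version before relying on it.

```python
import json

# Hypothetical list of proper nouns that must never be stemmed.
# "servier" is the example from this thread; add your own lowercased entries.
PROTECTED_TERMS = ["servier"]


def rebuilt_french_settings(protected_terms):
    """Return index settings for a hand-built French analyzer with a
    stemmer_override filter placed before the stemmer (a sketch modeled on
    the documented rebuilt `french` analyzer, not an exact copy of it)."""
    return {
        "analysis": {
            "filter": {
                "french_elision": {
                    "type": "elision",
                    "articles_case": True,
                    "articles": [
                        "l", "m", "t", "qu", "n", "s", "j", "d", "c",
                        "jusqu", "quoiqu", "lorsqu", "puisqu",
                    ],
                },
                "french_stop": {"type": "stop", "stopwords": "_french_"},
                # "term => term" maps a token to itself and marks it as a
                # keyword, so the stemmer that follows leaves it untouched.
                "french_no_stem": {
                    "type": "stemmer_override",
                    "rules": [f"{t} => {t}" for t in protected_terms],
                },
                "french_stemmer": {"type": "stemmer", "language": "light_french"},
            },
            "analyzer": {
                "french_with_overrides": {
                    "tokenizer": "standard",
                    # Order matters: lowercase runs first so the rules match
                    # lowercased tokens, and the override precedes the stemmer.
                    "filter": [
                        "french_elision",
                        "lowercase",
                        "french_stop",
                        "french_no_stem",
                        "french_stemmer",
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON body you would send with a create-index request (PUT /<index>).
    print(json.dumps({"settings": rebuilt_french_settings(PROTECTED_TERMS)}, indent=2))
```

The key design point is filter order: an override placed after the stemmer would have no effect, because the token would already be stemmed by then.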

mgaudin · April 27, 2017, 3:21pm

Hey!

Thanks for your answer. Unfortunately, we would rather not use term exclusion, since we would have to exclude a lot of proper nouns. It is probably possible, but neither very realistic nor maintainable.
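If a list of proper nouns already exists somewhere (a company registry, a CRM export), part of the maintenance burden could be automated by generating the override rules instead of writing them by hand. The sketch below is a hypothetical helper, write_override_rules, that writes a stemmer_override rules file; it assumes the names arrive as plain strings and only handles single-token names, since each rule matches one token. It does not remove the underlying limitation raised here: every protected word still has to be listed.

```python
from pathlib import Path


def write_override_rules(names, path):
    """Write a stemmer_override rules file with one "term => term" line per name.

    Hypothetical helper: names are lowercased (the override runs after the
    lowercase filter) and de-duplicated. Entries containing whitespace, commas
    or "=>" are skipped, because a rule matches a single token and those
    characters belong to the rule syntax. Returns (rules, skipped).
    """
    rules, skipped, seen = [], [], set()
    for name in names:
        term = name.strip().lower()
        if not term or any(ch.isspace() for ch in term) or "," in term or "=>" in term:
            skipped.append(name)
            continue
        if term not in seen:
            seen.add(term)
            rules.append(f"{term} => {term}")
    Path(path).write_text("\n".join(rules) + "\n", encoding="utf-8")
    return rules, skipped
```

The generated file would then be referenced from the filter definition with "rules_path" (a path relative to the Elasticsearch config directory, present on every node) instead of an inline "rules" list; check the stemmer_override docs for how your version reloads that file.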

system · May 25, 2017, 3:31pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
