What is the difference between Snowball & Stemming?

Eric_GeLo · October 26, 2012, 2:17pm

Hey!

I'm implementing a search process using Elasticsearch, which currently uses
the snowball token filter (French). I have taken a look at the stemmer token
filter, which seems to do the same thing. Can someone explain the
difference between the stemmer token filter and the snowball token filter,
and also the difference between these stemmer configurations: french,
light_french, minimal_french?

Thanks for your time!

--

jprante · October 26, 2012, 11:53pm

Hi Eric,

There are three French stemmers available in Lucene:

the default French stemmer, based on Martin Porter's Snowball
("Snowball: A language for stemming algorithms"), which is purely
algorithmic (no use of a dictionary); see
also http://www.lirmm.fr/~mroche/Recherche/Articles/Porter/porter.pdf

Rule of thumb: use the Porter stemmer if you want to stem as many words as
possible and can tolerate stemming errors (wrong stems, "overstemming").

the minimal stemmer, based on Jacques Savoy's 1999 algorithm, "A Stemming
Procedure and Stopword List for General French Corpora":
http://members.unine.ch/jacques.savoy/papers/frjasis.pdf

Rule of thumb: use this stemmer if you want a stemmer designed specifically
for French, used together with a stopword list.

the light stemmer, a statistical approach to stemming applicable to
several languages, based on Jacques Savoy's algorithm from "Light Stemming
Approaches for the French, Portuguese, German and Hungarian Languages"
(available on RERO DOC)

Rule of thumb: use this stemmer if you prefer statistical methods that
should keep good retrieval quality while stemming even fewer words.
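
To show how these choices might look in index settings, here is a minimal
sketch in Python that builds a settings body with one snowball filter and
one stemmer filter side by side. The helper french_analysis_settings and
the filter/analyzer names (fr_snowball, fr_stemmer, ...) are made up for
illustration, and the standard tokenizer plus lowercase filter are just an
assumed baseline; check the parameter names against the documentation for
your Elasticsearch version before relying on it.

```python
import json

# The three stemmer configurations discussed in this thread.
FRENCH_STEMMERS = ("french", "light_french", "minimal_french")


def french_analysis_settings(stemmer_language="light_french"):
    """Build an index settings body (hypothetical helper, illustration only).

    Defines one analyzer using the snowball token filter and one using the
    stemmer token filter, so both can be compared on the same index.
    """
    if stemmer_language not in FRENCH_STEMMERS:
        raise ValueError(f"unexpected stemmer language: {stemmer_language!r}")

    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Snowball token filter, selecting the French stemmer.
                    "fr_snowball": {"type": "snowball", "language": "French"},
                    # Stemmer token filter; the language picks the algorithm.
                    "fr_stemmer": {"type": "stemmer", "language": stemmer_language},
                },
                "analyzer": {
                    "fr_snowball_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "fr_snowball"],
                    },
                    "fr_stemmer_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "fr_stemmer"],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body you would send when creating the index.
    print(json.dumps(french_analysis_settings("minimal_french"), indent=2))
```

Creating an index with this body and running the same French text through
both analyzers (for example with the analyze API) is probably the quickest
way to see how aggressive each stemmer is on your own data.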

Cheers, Jörg

