What is the difference between Snowball & Stemming?

Hey!

I'm implementing a search process using ElasticSearch which currently uses
the snowball token filter (French). I have taken a look at the stemmer token
filter, which seems to do the same thing. Can someone explain the
difference between the stemmer token filter & the snowball token filter,
and also the difference between these stemmer configurations: french,
light_french, minimal_french?

Thanks for your time!

--

Hi Eric,

There are three French stemmers available in Lucene:

Rule of thumb: use the Porter stemmer if you want to stem as many words as
possible and can tolerate stemming errors (wrong stems, "overstemming").

Rule of thumb: use this stemmer if you want a stemmer designed specifically
for French, with the help of a stopword list.

  • the light stemmer, a statistical approach to stemming applicable to
    several languages, based on the algorithm from Jacques Savoy's "Light
    Stemming Approaches for the French, Portuguese, German and Hungarian
    Languages"

Rule of thumb: use this stemmer if you prefer statistical methods that
should keep good retrieval quality while stemming even fewer words.
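
To compare the two filters side by side, something like the following could be used. It is only a minimal sketch that builds the index settings as a Python dict and prints them as JSON. The filter and analyzer names (fr_snowball, fr_stemmer, and so on) and the function name are made up for illustration. The stemmer names are the ones from the question. Depending on your Elasticsearch version, the stemmer filter may expect the key "name" instead of "language", so please check the documentation for your release.

```python
import json

# Stemmer configurations mentioned in the question.
FRENCH_STEMMERS = ("french", "light_french", "minimal_french")


def french_analysis_settings(stemmer="light_french"):
    """Build index settings with one snowball-based and one stemmer-based analyzer.

    All filter and analyzer names here are hypothetical placeholders.
    """
    if stemmer not in FRENCH_STEMMERS:
        raise ValueError(f"unknown French stemmer: {stemmer!r}")
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # The snowball token filter configured for French.
                    "fr_snowball": {"type": "snowball", "language": "French"},
                    # The stemmer token filter; change the value to compare variants.
                    # Assumption: your version accepts "language" (some use "name").
                    "fr_stemmer": {"type": "stemmer", "language": stemmer},
                },
                "analyzer": {
                    "fr_snowball_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "fr_snowball"],
                    },
                    "fr_stemmer_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "fr_stemmer"],
                    },
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the settings body, ready to be sent when creating a test index.
    print(json.dumps(french_analysis_settings("minimal_french"), indent=2))
```

With a test index created from these settings, the analyze API can be run with each analyzer on the same French sample text, which should show how aggressively each variant stems.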

Cheers, Jörg

In reply to Eric GeLo's message of Friday, October 26, 2012 4:17:04 PM UTC+2.