What is the difference between Snowball & Stemming?


(Eric GeLo) #1

Hey!

I'm implementing a search process using ElasticSearch, which currently uses
the snowball token filter (French). I have taken a look at the stemmer token
filter, which seems to do the same thing. Can someone explain the
difference between the stemmer token filter and the snowball token filter,
and also the difference between these stemmer configurations: french,
light_french, minimal_french?

Thanks for your time!

--


(Jörg Prante) #2

Hi Eric,

There are three French stemmers available in Lucene:

Rule of thumb: use the Porter stemmer if you want to stem as many words as
possible and can tolerate stemming errors (wrong stemming, "overstemming").

Rule of thumb: use this stemmer if you want a stemmer designed specifically
for French, used together with a stopword list.

Rule of thumb: use this stemmer if you prefer statistical methods that
should keep good retrieval quality while stemming even fewer words.
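
For comparing the variants side by side, here is a minimal sketch in Python that only builds the index settings as a dict and prints them as JSON. The filter and analyzer names (fr_snowball, fr_stem_..., ..._analyzer) are hypothetical; the stemmer values french, light_french and minimal_french are the ones from the question. It assumes the stemmer filter accepts a "language" parameter (some versions document it as "name"), so check that against the version you run.

```python
import json

# Stemmer configurations named in the question.
STEMMER_LANGUAGES = ["french", "light_french", "minimal_french"]


def build_settings():
    """Build index settings with one analyzer per French stemming variant."""
    filters = {
        # The snowball token filter the question is currently using.
        "fr_snowball": {"type": "snowball", "language": "French"},
    }
    # One stemmer token filter per configuration (hypothetical filter names).
    for lang in STEMMER_LANGUAGES:
        filters["fr_stem_" + lang] = {"type": "stemmer", "language": lang}

    # One custom analyzer per filter, so the outputs can be compared
    # on the same input text.
    analyzers = {
        name + "_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", name],
        }
        for name in filters
    }
    return {"settings": {"analysis": {"filter": filters, "analyzer": analyzers}}}


if __name__ == "__main__":
    print(json.dumps(build_settings(), indent=2))
```

Creating a test index with these settings and running the same French text through each analyzer is probably the easiest way to see how aggressive each stemmer is on your own data.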

Cheers, Jörg

--

