I'm thinking of switching from the Porter stemmer to KStem for English
because it seems to do a better job for my rather limited test set. Does
anyone have an opinion on which stemmer seems to do a better job?
If I remember correctly, both Porter and English Snowball can be quite
aggressive in stemming. We are using KStem. YMMV.
Regards,
Lukas
On Tue, Oct 1, 2013 at 2:57 PM, Nikolas Everett nik9000@gmail.com wrote:
> I'm thinking of switching from the Porter stemmer to KStem for English
> because it seems to do a better job for my rather limited test set. Does
> anyone have an opinion on which stemmer seems to do a better job?
Thanks so much. I did end up switching to KStem. I've only seen one
complaint, around the last name "Duhring", but I'm pretty sure Porter
wouldn't have been any better there.
I'll be moving over to KStem too. In addition to being a bit less aggressive, KStem turns all words into other real words (Porter often comes out with word chunks), so you can use the analyzed result to generate word clouds or other types of aggregations.
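For anyone who wants to compare the two stemmers on their own data, here is a rough sketch of index settings with two custom analyzers that differ only in the stemming filter. `kstem` and `porter_stem` are the built-in Elasticsearch token filter names; the analyzer names `english_kstem` and `english_porter` and the helper function are made up for illustration, and the exact settings layout may need adjusting for your Elasticsearch version.

```python
import json


def english_analysis_settings():
    """Build index settings with two analyzers that differ only in the stemmer.

    The analyzer names are hypothetical; "kstem" and "porter_stem" are the
    built-in Elasticsearch token filters being compared in this thread.
    """
    def analyzer(stem_filter):
        # Same tokenizer and lowercasing for both, so only the stemmer varies.
        return {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", stem_filter],
        }

    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "english_kstem": analyzer("kstem"),
                    "english_porter": analyzer("porter_stem"),
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body to send when creating a test index.
    print(json.dumps(english_analysis_settings(), indent=2))
```

With an index created from these settings, you can run the same sample text through each analyzer with the `_analyze` API and look at the tokens side by side, which is probably the quickest way to see how aggressive each stemmer is on your own vocabulary.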