Porter2 Stemmer is just the Porter Stemmer?

The documentation for the "Stemmer" filter indicates that porter2 is an
available option:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/stemmer-tokenfilter.html

However, I think there may be a bug here: "porter2" appears to map to the
original Porter stemmer. I ran the same word through both the porter and
porter2 stemmers, and both stemmed "stayed" to "stai". That is the correct
result for the Porter stemmer, but it is incorrect for the Porter2 stemmer.
I verified this using the Python stemmer library, which stems "stayed" to
"stai" with porter and to "stay" with porter2.
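For anyone wondering why the two algorithms disagree on this word, here is a rough sketch of the one rule that matters (step 1c, applied after "ed" has already been stripped from "stayed"). It is not a full stemmer, and the class and method names are made up for illustration; they do not come from Lucene or Elasticsearch.

```java
final class Step1cDemo {
    // Porter's consonant test: a, e, i, o, u are vowels, and "y" counts as
    // a vowel only when the letter before it is a consonant.
    static boolean isConsonant(String s, int i) {
        char c = s.charAt(i);
        if ("aeiou".indexOf(c) >= 0) return false;
        if (c == 'y') return i == 0 || !isConsonant(s, i - 1);
        return true;
    }

    static boolean containsVowel(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isConsonant(s, i)) return true;
        }
        return false;
    }

    // Original Porter, step 1c: a final "y" becomes "i" when the part of
    // the word before it contains a vowel.
    static String porterStep1c(String w) {
        int n = w.length();
        if (w.endsWith("y") && containsVowel(w.substring(0, n - 1))) {
            return w.substring(0, n - 1) + "i";
        }
        return w;
    }

    // Porter2 (Snowball "english"), step 1c: a final "y" becomes "i" only
    // when it is preceded by a non-vowel that is not the first letter.
    static String porter2Step1c(String w) {
        int n = w.length();
        if (n > 2 && w.endsWith("y") && "aeiouy".indexOf(w.charAt(n - 2)) < 0) {
            return w.substring(0, n - 1) + "i";
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(porterStep1c("stay"));  // stai
        System.out.println(porter2Step1c("stay")); // stay
    }
}
```

In "stay" the "y" follows the vowel "a", so Porter2 leaves it alone, while the original Porter rule only checks that a vowel appears somewhere earlier in the word.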

So I took a look into the code and I found the following
in StemmerTokenFilterFactory.java:
    ...
    } else if ("porter".equalsIgnoreCase(language)) {
        return new PorterStemFilter(tokenStream);
    } else if ("porter2".equalsIgnoreCase(language)) {
        return new SnowballFilter(tokenStream, new PorterStemmer());
    ...

Notice that in both cases a Porter stemmer is instantiated, not a porter2
stemmer. Any thoughts on why this is not a bug?

--

I agree it looks like a bug. I created an issue for it:
https://github.com/elasticsearch/elasticsearch/issues/2451
Thanks for the report.

As a workaround, you can use "english" instead of "porter2" as a filter
language.
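
A minimal sketch of what that workaround might look like in the index settings, assuming the stemmer filter takes its stemmer under a "language" key (matching the variable name in the factory code quoted in this thread). The names "my_stemmer" and "my_analyzer" are placeholders, not anything defined by Elasticsearch.

```json
{
  "index": {
    "analysis": {
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      }
    }
  }
}
```

With a setup along these lines, "stayed" should come out as "stay" rather than "stai".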

(In reply to Michael Sander's message of Wednesday, November 28, 2012
1:24:27 AM UTC-5, which appears in full above.)

--