Porter2 Stemmer is just the Porter Stemmer?

Michael_Sander · November 28, 2012, 6:24am

The documentation for the "Stemmer" filter indicates that porter2 is an
available option:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/stemmer-tokenfilter.html

However, I think there may be a bug here: "porter2" seems to just map to the
plain Porter stemmer. I tried stemming the same word with both the porter and
porter2 stemmers, and both stemmed "stayed" to "stai". That is the correct
result for the Porter stemmer, but it is the incorrect result for the Porter2
stemmer. I verified this using the Python stemmer library: according to that
library, porter stems "stayed" to "stai" and porter2 stems "stayed" to "stay".
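The difference comes down to step 1c, the rule that turns a final "y" into "i". The sketch below is a hypothetical, stand-alone illustration: the class and method names are made up, and it is not the code that Elasticsearch, Lucene, or the Python library actually runs. It implements only step 1c of each algorithm, following Martin Porter's published descriptions, and it assumes step 1b has already reduced "stayed" to "stay", which both algorithms do.

```java
/**
 * Hypothetical demo (not Elasticsearch or Lucene code): only step 1c of the
 * original Porter algorithm and of Porter2, applied to a few words.
 * Assumes step 1b has already run, e.g. "stayed" has become "stay".
 */
public final class Step1cDemo {

    private Step1cDemo() {}

    // Original Porter: a letter is a consonant unless it is a, e, i, o, u,
    // or a 'y' that follows a consonant.
    static boolean isPorterConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isPorterConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Original Porter step 1c, "(*v*) Y -> I": a final 'y' becomes 'i'
    // whenever the stem in front of it contains any vowel.
    static String porterStep1c(String w) {
        if (!w.endsWith("y")) {
            return w;
        }
        String stem = w.substring(0, w.length() - 1);
        for (int i = 0; i < stem.length(); i++) {
            if (!isPorterConsonant(stem, i)) {
                return stem + "i";
            }
        }
        return w;
    }

    // Porter2 step 1c: a final 'y' (or 'Y') becomes 'i' only if the letter
    // before it is a non-vowel that is not the first letter of the word.
    // Porter2 vowels are a, e, i, o, u, y. This sketch skips Porter2's
    // preprocessing that marks some 'y's as 'Y'; it does not change these examples.
    static String porter2Step1c(String w) {
        int n = w.length();
        if (n < 3 || !(w.endsWith("y") || w.endsWith("Y"))) {
            return w;
        }
        char before = w.charAt(n - 2);
        if ("aeiouy".indexOf(before) >= 0) {
            return w; // preceded by a vowel: keep the 'y'
        }
        return w.substring(0, n - 1) + "i";
    }

    public static void main(String[] args) {
        for (String w : new String[] {"stay", "cry", "by", "say"}) {
            System.out.println(w + ": porter=" + porterStep1c(w)
                    + ", porter2=" + porter2Step1c(w));
        }
    }
}
```

Porter's original rule fires whenever the stem contains any vowel, so "stay" becomes "stai". Porter2 also requires the letter before the "y" to be a non-vowel that is not the first letter, so "stay" and "say" are left alone, while "cry" becomes "cri" (which the original rule leaves as "cry", since "cr" has no vowel).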

So I took a look at the code and found the following
in StemmerTokenFilterFactory.java:
...
} else if ("porter".equalsIgnoreCase(language)) {
return new PorterStemFilter(tokenStream);
} else if ("porter2".equalsIgnoreCase(language)) {
return new SnowballFilter(tokenStream, new PorterStemmer());
...

Notice that in both cases a Porter stemmer is instantiated (PorterStemFilter
in the first branch, Snowball's PorterStemmer in the second), not a Porter2
stemmer. Any thoughts on why this is not a bug?

--

Igor_Motov · November 29, 2012, 1:18am

I agree it looks like a bug. I created an issue for it: "The Porter2 Stemmer
Token Filter is just Porter Stemmer" (elastic/elasticsearch issue #2451).
Thanks for the report.

As a workaround, you can use "english" instead of "porter2" as the filter
language.

