Stemmer filter ignored in Analyze API


(Frank) #1

I'm trying to see what stemming does specifically for different types of stemmers, but when I do the following:

curl -X POST \
  http://i:9200/_analyze \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
	"text": "consigned",
	"filter": [
		"lowercase",
		{
			"type": "stemmer",
			"language": "porter2"
		}
		]
}'

The results are

    {
        "tokens": [
            {
                "token": "consigned",
                "start_offset": 0,
                "end_offset": 9,
                "type": "<ALPHANUM>",
                "position": 0
            }
        ]
    }

Whereas the documentation of the Porter2 stemmer states that it should be stemmed to consign. What am I doing wrong?

The lowercase filter is working, by the way (the token is lowercased if I use uppercase in the text).


(Jun Ohtani) #2

Hi @frankkoornstra,

I think you are using version 5.x.
The 5.x docs say:

tokenizer is mandatory when using char_filter or filter. If it is not set, the filters are ignored and the default analyzer is used.

This means you need to add a tokenizer parameter to your request. If you want to test a normalizer, set tokenizer to "keyword".
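For example, here is a minimal sketch of a corrected request (assuming a local cluster at localhost:9200 and the standard tokenizer; adjust the host and tokenizer to your setup):

```shell
# Hypothetical corrected request: adding "tokenizer" makes the custom
# filter chain take effect instead of falling back to the default analyzer.
BODY='{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stemmer",
      "language": "porter2"
    }
  ],
  "text": "Consigned"
}'

# Requires a running Elasticsearch cluster; "|| true" keeps this from
# erroring out when no cluster is listening.
curl -s -X POST http://localhost:9200/_analyze \
  -H 'Content-Type: application/json' \
  -d "$BODY" || true
```

With a running cluster, the response should contain a single token, "consign": lowercase runs first, then the porter2 stemmer.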

The _analyze API can also show you which analyzer/tokenizer was applied: if the request includes the explain=true parameter, the response breaks down the output of each stage.
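A sketch of such a request, again assuming a local cluster at localhost:9200 and the standard tokenizer:

```shell
# Hypothetical request with explain enabled: the response then lists each
# tokenizer and token-filter stage together with the tokens it produced.
EXPLAIN_BODY='{
  "tokenizer": "standard",
  "filter": ["lowercase", {"type": "stemmer", "language": "porter2"}],
  "text": "Consigned",
  "explain": true
}'

# Requires a running Elasticsearch cluster; "|| true" keeps this from
# erroring out when no cluster is listening.
curl -s -X POST http://localhost:9200/_analyze \
  -H 'Content-Type: application/json' \
  -d "$EXPLAIN_BODY" || true
```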


(Frank) #3

Yep, that's right! Thanks, I completely overlooked the need to add a tokenizer; that explains a lot.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.