Stemmer filter ignored in Analyze API


(Frank) #1

I'm trying to see what stemming does specifically for different types of stemmers, but when I do the following:

curl -X POST \
  http://i:9200/_analyze \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
	"text": "consigned",
	"filter": [
		"lowercase",
		{
			"type": "stemmer",
			"language": "porter2"
		}
		]
}'

The results are

    {
        "tokens": [
            {
                "token": "consigned",
                "start_offset": 0,
                "end_offset": 9,
                "type": "<ALPHANUM>",
                "position": 0
            }
        ]
    }

Whereas the documentation of the Porter2 stemmer states that it should be stemmed to consign. What am I doing wrong?

The lowercase filter is working, by the way (the token is lowercased if I use uppercase in the text).


(Jun Ohtani) #2

Hi @frankkoornstra,

I think you are using version 5.x.
The 5.x docs say:

tokenizer is mandatory when using char_filter or filter. If it is not set, the filters are ignored and the default analyzer is used.

This means you need to add a tokenizer parameter to your request. If you want to test a normalizer, set tokenizer to "keyword".
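For example, here is a minimal sketch of a corrected request (assuming a local cluster at localhost:9200 and the standard tokenizer; adjust the host and tokenizer to your setup):

```shell
# Hypothetical corrected request: adding "tokenizer" makes the custom
# filter chain take effect instead of falling back to the default analyzer.
BODY='{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "stemmer",
      "language": "porter2"
    }
  ],
  "text": "Consigned"
}'

# Requires a running Elasticsearch cluster; "|| true" keeps this from
# erroring out when no cluster is listening.
curl -s -X POST http://localhost:9200/_analyze \
  -H 'Content-Type: application/json' \
  -d "$BODY" || true
```

With a running cluster, the response should contain a single token, "consign": lowercase runs first, then the porter2 stemmer.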

The _analyze API can also show you which analyzer/tokenizer was applied: if the request includes the explain=true parameter, the response breaks down the output of each stage.
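A sketch of such a request, again assuming a local cluster at localhost:9200 and the standard tokenizer:

```shell
# Hypothetical request with explain enabled: the response then lists each
# tokenizer and token-filter stage together with the tokens it produced.
EXPLAIN_BODY='{
  "tokenizer": "standard",
  "filter": ["lowercase", {"type": "stemmer", "language": "porter2"}],
  "text": "Consigned",
  "explain": true
}'

# Requires a running Elasticsearch cluster; "|| true" keeps this from
# erroring out when no cluster is listening.
curl -s -X POST http://localhost:9200/_analyze \
  -H 'Content-Type: application/json' \
  -d "$EXPLAIN_BODY" || true
```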


(Frank) #3

Yep, that's right! Thanks, I completely overlooked the need to add a tokenizer; that explains a lot.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.