"apple" and the english analyzer

emarthinsen · February 25, 2020, 2:31am

I'm not sure why ES is behaving like this. Here's my request:

GET /_analyze
{
	"analyzer": "english",
	"text": "apple"
}

And here's the response:

{
  "tokens": [
    {
      "token": "appl",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

Does anyone know why apple is turning into appl?

BTW, I tried the same thing with pineapple and it turns into pineappl. Words like pear, banana, and encyclopedia come out just fine.

dadoonet · February 25, 2020, 8:26am

That's the effect of the english stemmer IMO.

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

There are other options for the stemmer if the default one does not fit your needs.
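
As a rough sketch, you can compare stemmers without creating an index by passing a custom filter chain to _analyze. The minimal_english value is one of the documented language options for the stemmer filter and is picked here only for illustration. This chain is also not a full equivalent of the built-in english analyzer, which applies stop-word and possessive handling as well.

```
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "minimal_english" }
  ],
  "text": "apple apples pineapple"
}
```

That stemmer mostly deals with plurals, so it should be far less aggressive on words like apple. Check the output against your own data before relying on it.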

Mark_Harwood · February 25, 2020, 9:40am

For background - stemming is used so that words like meddle, meddled, meddling and meddles are all turned into meddl in the index and can all match each other.
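
To see this for yourself, a request like the one below should do it. It uses the same english analyzer as the original question, and the text is just the four example words:

```
GET /_analyze
{
  "analyzer": "english",
  "text": "meddle meddled meddling meddles"
}
```

Each of the four tokens in the response should come back as meddl, which is why a search for any one form matches the others.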

Not always useful though. I remember one situation where users were searching for Improvised Explosive Devices and the stemmer had turned all IEDs into just I.

emarthinsen · February 26, 2020, 9:05pm

Interesting. I didn't realize that stemming would just lop off the part of the word that varies. I had expected it to normalize on the root word. The meddle example makes sense. I'm trying to figure out what other words apple shares a root with, other than apples. I feel like stemming would make apple and apply share the same stem of appl.
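
One way to check that guess rather than assume it is to run both words through the analyzer in a single request and compare the token values that come back:

```
GET /_analyze
{
  "analyzer": "english",
  "text": "apple apply"
}
```

If the two token values differ, the words will not match each other in an index that uses this analyzer.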

