"apple" and the english analyzer

I'm not sure why ES is behaving like this. Here's my request:

GET /_analyze
{
  "analyzer": "english",
  "text": "apple"
}

And here's the response:

{
  "tokens": [
    {
      "token": "appl",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}

Does anyone know why apple is turning into appl?

BTW, I tried the same thing with pineapple and it turns into pineappl. Words like pear, banana, and encyclopedia come out just fine.

That's the effect of the english stemmer IMO.

Have a look at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

There are other options for the stemmer if the default one does not fit your needs.
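
As a rough sketch of what swapping the stemmer could look like: the index name my-index and the names my_light_stemmer and my_english are made up for illustration, and light_english is just one of the language values listed on that page. Note that this cut-down analyzer leaves out the stop words and possessive handling that the built-in english analyzer includes. I haven't checked what it returns for apple, so run the _analyze call at the end before relying on it.

```console
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_light_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        }
      },
      "analyzer": {
        "my_english": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_light_stemmer"]
        }
      }
    }
  }
}

GET /my-index/_analyze
{
  "analyzer": "my_english",
  "text": "apple"
}
```

If the token still isn't what you want, change the language value and re-create the index, since analysis settings like this are normally set when the index is created.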

For background: stemming is used so that words like meddle, meddled, meddling and meddles are all turned into meddl in the index and can all match each other.
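
You can see this for yourself with the same kind of _analyze request as in the original post, just with all four forms in the text. This is only a sketch of the request. Each word should come back as its own token, and going by the explanation above they should all read meddl.

```console
GET /_analyze
{
  "analyzer": "english",
  "text": "meddle meddled meddling meddles"
}
```

The same trick works for comparing any set of words you suspect might collapse to one stem.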

Not always useful though. I remember one situation where users were searching for Improvised Explosive Devices and the stemmer had turned all IEDs into just I.

Interesting. I didn't realize that stemming would just lop off the part of the word that varies. I had expected it to normalize on the root word. The meddle example makes sense. I'm trying to figure out what other words apple shares a root with, other than apples. I feel like stemming would make apple and apply share the same stem of appl.
