English stemmer bug

svarup · July 8, 2018, 5:32am

When I analyze the word "movies" using the english stemmer, it gives me "movi". Other words like "questions" (question) or "states" (state) work correctly. Below is the API request that I use.

{
  "tokenizer": "standard",
  "text": "movies",
  "filter": [
    "lowercase",
    {
      "type": "stemmer",
      "name": "english"
    }
  ]
}
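
For anyone who wants to reproduce this from a script, here is a minimal Python sketch that builds the same request body and posts it to the `_analyze` endpoint. The function names and the `base_url` argument are hypothetical, chosen only for illustration, and it assumes a cluster reachable without authentication.

```python
import json
import urllib.request


def build_analyze_body(text, stemmer="english"):
    """Build an _analyze request body like the one in the post (hypothetical helper)."""
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", {"type": "stemmer", "name": stemmer}],
        "text": text,
    }


def analyze(base_url, body):
    """POST the body to <base_url>/_analyze and return the emitted tokens.

    Assumes an unauthenticated cluster, e.g. base_url = "http://localhost:9200".
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [token["token"] for token in payload["tokens"]]
```

Calling `analyze("http://localhost:9200", build_analyze_body("movies", "minimal_english"))` against a local node should let you compare the two stemmers side by side.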

If I use the minimal_english stemmer, it returns "movy".

I am using Elasticsearch 6.3.0.

dadoonet · July 8, 2018, 7:33am

This is not a bug IMO.

Try "lazy" and I think you will get "lazi".

Why do you think it's a problem?

svarup · July 8, 2018, 8:37am

I think it should return "movie" instead of "movi" or "movy", just as "states" returns "state" and "questions" returns "question".

I don't understand "Try lazy and you will get I think lazi."

dadoonet · July 8, 2018, 11:55am

Why do you think so?
I mean what is the problem you want to solve?

warkolm · July 8, 2018, 9:17pm

This may be a Lucene/Elasticsearch thing then, because stemming "lazy" to "lazi" doesn't make a lot of sense to me (as an English speaker). Same with "movi".

dadoonet · July 8, 2018, 9:32pm

But what's the problem?

As long as "lazy", "laziness", and all other forms are reduced to the same root, that should be OK, no?

warkolm · July 9, 2018, 2:15am

True, it's just a weird representation.

svarup · July 9, 2018, 2:46am

The main problem is this: say I index the string "jumanji movies on google play movies" using the english stemmer. When I search this field for "movie", it won't match. If I search for "movi", it matches when the field uses the english stemmer, and if I search for "movy", it matches when the field uses the minimal_english stemmer. But neither setup matches "movie", because the stemmer turns the token "movies" in that string into "movi" or "movy".
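
To make the "movy" result concrete, here is a rough Python sketch of a plural-only "S-stemmer", which is, as far as I understand it, the kind of rule set behind minimal_english. The function name is hypothetical and the rules are written from memory, so treat it as an approximation and not as the Lucene implementation.

```python
def minimal_english_stem(word):
    """Approximate plural-stripping stemmer (assumed rules, not the official code)."""
    n = len(word)
    # Only words of three or more letters ending in "s" are touched.
    if n < 3 or word[-1] != "s":
        return word
    penultimate = word[-2]
    # "...us" and "...ss" are left alone ("bus", "glass").
    if penultimate in ("u", "s"):
        return word
    if penultimate == "e":
        # "...ies" becomes "...y" unless preceded by "a" or "e".
        if n > 3 and word[-3] == "i" and word[-4] not in ("a", "e"):
            return word[:-3] + "y"
        # Remaining "...ies", "...aes", "...oes", "...ees" are left alone.
        if word[-3] in ("i", "a", "o", "e"):
            return word
    # Otherwise just drop the trailing "s".
    return word[:-1]


for term in ("movies", "movie", "states", "questions"):
    print(term, "->", minimal_english_stem(term))
```

Under these assumed rules the singular "movie" is left unchanged while "movies" becomes "movy", so the two forms never meet on a minimal_english field. A Porter-style stemmer, which I believe is what "english" uses, strips the final "e" as well, so a query that is analyzed with the same analyzer as the field would be expected to reduce "movie" to "movi" too.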
