English stemmer bug


(Svarup) #1

When I analyze the word "movies" with the english stemmer, it gives me "movi". Other words work correctly: "questions" becomes "question" and "states" becomes "state". Below is the analyze API request I use:

{
  "tokenizer": "standard",
  "text": "movies",
  "filter": [
    "lowercase",
    {
      "type": "stemmer",
      "name": "english"
    }
  ]
}

If I use the minimal_english stemmer instead, it returns "movy".

I am using Elasticsearch 6.3.0.
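The "movy" output can be reproduced with a few lines of code. The sketch below is a hypothetical Python reimplementation of the plural-only rules (the so-called "S-stemmer") that Lucene's minimal English stemmer appears to follow; it is not Lucene's code, and the name minimal_english_stem is made up for illustration. It rewrites "-ies" to "-y" (which is where "movy" comes from) and otherwise drops a trailing "s", with a few exceptions such as "-ss" and "-us".

```python
def minimal_english_stem(word: str) -> str:
    """Hypothetical reimplementation of plural-only ("S-stemmer") rules.

    Assumes the input is already lowercased, as the lowercase filter
    would do before the stemmer runs.
    """
    if len(word) < 3 or word[-1] != "s":
        return word  # no plural ending: "movie" is left as "movie"
    if word[-2] in ("u", "s"):
        return word  # keep "-us" and "-ss" endings ("bus", "glass")
    if word[-2] == "e":
        # "-ies" -> "-y", unless preceded by "a" or "e": "movies" -> "movy"
        if len(word) > 3 and word[-3] == "i" and word[-4] not in ("a", "e"):
            return word[:-3] + "y"
        # leave the remaining "-ies", "-aes", "-oes", "-ees" endings alone
        if word[-3] in ("i", "a", "o", "e"):
            return word
    # otherwise drop only the final "s": "states" -> "state"
    return word[:-1]


if __name__ == "__main__":
    for w in ["movies", "states", "questions", "movie"]:
        print(w, "->", minimal_english_stem(w))
```

Note that "movie" itself has no plural ending, so these rules leave it unchanged rather than mapping it to "movy".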


(David Pilato) #2

This is not a bug IMO.

Try "lazy": I think you will get "lazi".

Why do you think it's a problem?


(Svarup) #3

I think it should return "movie" instead of "movi" or "movy", just as "states" returns "state" and "questions" returns "question".

I don't understand what you mean by "Try lazy and you will get I think lazi."


(David Pilato) #4

Why do you think so?
I mean, what problem are you trying to solve?


(Mark Walkom) #5

This may be a Lucene/Elasticsearch thing then, because stemming "lazy" to "lazi" doesn't make a lot of sense to me (as an English speaker). Same with "movi".


(David Pilato) #6

But what's the problem?

As long as "lazy", "laziness" and all other forms are translated to the same root, that should be OK, no?


(Mark Walkom) #7

True, it's just a weird representation.


(Svarup) #8

The main problem is this: say I index the string "jumanji movies on google play movies" into a field that uses the english stemmer. When I search that field for "movie", it doesn't match, but searching for "movi" does. Likewise, if the field uses the minimal_english stemmer, searching for "movy" matches. Neither matches "movie", because the stemmer turns the "movies" tokens in "jumanji movies on google play movies" into "movi" or "movy".
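David's point is that the exact root does not matter as long as the indexed text and the query reduce to the same term. The toy sketch below tests that idea for the minimal_english case, assuming both the document and the query go through the same analyzer. Everything in it is hypothetical: analyze, build_index and search are made-up helpers, splitting on \w+ is only a rough stand-in for the standard tokenizer, and the stemmer is an illustrative reimplementation of the plural-only rules that minimal_english appears to follow (the Porter-based english stemmer is not reimplemented here).

```python
import re
from collections import defaultdict


def minimal_english_stem(word: str) -> str:
    """Hypothetical reimplementation of plural-only ("S-stemmer") rules."""
    if len(word) < 3 or word[-1] != "s":
        return word  # no plural ending: "movie" stays "movie"
    if word[-2] in ("u", "s"):
        return word  # keep "-us" and "-ss" endings
    if word[-2] == "e":
        if len(word) > 3 and word[-3] == "i" and word[-4] not in ("a", "e"):
            return word[:-3] + "y"  # "-ies" -> "-y": "movies" -> "movy"
        if word[-3] in ("i", "a", "o", "e"):
            return word
    return word[:-1]  # drop the final "s"


def analyze(text: str) -> list[str]:
    """Rough stand-in for tokenizer + lowercase + stemmer."""
    tokens = re.findall(r"\w+", text.lower())
    return [minimal_english_stem(t) for t in tokens]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Inverted index: analyzed term -> ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Analyze the query the same way; a doc matches if it has any query term."""
    hits: set[int] = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "jumanji movies on google play movies"}
index = build_index(docs)
print(search(index, "movies"))  # {1}
print(search(index, "movie"))   # set()
```

With these rules the indexed "movies" tokens become "movy", while the query "movie" has no plural ending and stays "movie", so the two terms differ and the search finds nothing. That is the mismatch reported for minimal_english in the post above, even though index and query share one analyzer.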

