Get the stem of a word in Elasticsearch



I'd like to pass a sentence to Elasticsearch and get back each word's root/etymon/stem (I don't know the right term).
I'm using the latest Elasticsearch and, if possible, I'd like to do it through the REST API.


(Michael McCandless) #2

You could use e.g. the Porter Stem filter (if the text is English), or Snowball if it's English or one of many other languages, and then use the analyze API to see how a given chunk of text is translated to tokens...
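A minimal sketch of that suggestion, in Python with only the standard library. It assumes a node reachable at `http://localhost:9200` without authentication (an assumption, adjust as needed); the helper names `build_analyze_body` and `analyze` are made up for this example. The request body uses the `_analyze` endpoint with the `standard` tokenizer followed by the `lowercase` and `porter_stem` token filters.

```python
import json
import urllib.request

# Assumption: a local node without authentication; change for your cluster.
ES_URL = "http://localhost:9200"


def build_analyze_body(text, stem_filter="porter_stem"):
    """Build an _analyze request body: tokenize, lowercase, then stem."""
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", stem_filter],
        "text": text,
    }


def analyze(body, es_url=ES_URL):
    """POST the body to the _analyze endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        es_url + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Against a running node, `analyze(build_analyze_body("some English text"))` should return a JSON object with a `tokens` list, and each entry's `token` field holds the stemmed form. For Snowball, `stem_filter` can instead be an inline filter definition such as `{"type": "snowball", "language": "English"}`.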


The problem is that the text is Hungarian. But I read that there is a 'built-in' language analyzer,
and I want to test whether it fits my requirements.

EDIT: the other thing is that I'm not interested in the tokens, but in the result of stemming for each word...

(Michael McCandless) #4

The stemmer runs after tokenization in these language-specific analyzers, I think (not sure if it does for Hungarian though), so the tokens that come out should already be the stemmed forms. Just try using the Hungarian analyzer and see how it tokenizes?
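To see the stem next to each original word rather than a bare token list, one option is to use the offsets the analyze API returns with every token. The sketch below builds a request body for the built-in `hungarian` analyzer and pairs each token with the slice of the input it came from. The function names are invented for this example, and the body still has to be POSTed to `/_analyze` on your node.

```python
def build_language_analyze_body(text, analyzer="hungarian"):
    """Request body for _analyze using a built-in language analyzer."""
    return {"analyzer": analyzer, "text": text}


def stems_by_word(text, analyze_response):
    """Pair each original word with the token (stem) produced for it.

    Relies on the start_offset/end_offset fields of each token in the
    _analyze response. Offsets are assumed to line up with Python string
    indices, which holds as long as the text has no characters outside
    the Basic Multilingual Plane.
    """
    pairs = []
    for tok in analyze_response.get("tokens", []):
        surface = text[tok["start_offset"]:tok["end_offset"]]
        pairs.append((surface, tok["token"]))
    return pairs
```

Words the analyzer drops (for example stop words) simply do not appear in the result, so the list can be shorter than the sentence.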


Okay, I finally managed to configure a hunspell analyzer with a Hungarian dictionary, and it works.
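The exact configuration was not posted, so the following is only a sketch of what such a setup might look like. It builds index settings with a `hunspell` token filter and a custom analyzer around it. The names `hu_hunspell` and `hu_stem` are invented here, and the `hu_HU` locale assumes the Hungarian `.aff` and `.dic` files are installed under `config/hunspell/hu_HU/` on every node.

```python
import json


def build_hunspell_settings(locale="hu_HU"):
    """Index settings with a hunspell filter and a custom analyzer.

    'hu_hunspell' and 'hu_stem' are illustrative names, not anything
    taken from the original poster's configuration.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "hu_hunspell": {"type": "hunspell", "locale": locale}
                },
                "analyzer": {
                    "hu_stem": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "hu_hunspell"],
                    }
                },
            }
        }
    }


# Print the JSON so it can be sent as the body of an index-creation request.
print(json.dumps(build_hunspell_settings(), indent=2))
```

After creating an index with these settings, a request to that index's `_analyze` endpoint with `{"analyzer": "hu_stem", "text": "..."}` should return the dictionary-based stems as tokens.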
