Get the stem of a word in Elasticsearch


#1

Hi!

I'd like to pass a sentence to Elasticsearch and get back each word's root/etymon/stem (I'm not sure what the right term is).
I'm using the latest Elasticsearch and, if possible, I'd like to do it through the REST API.

Thanks!


(Michael McCandless) #2

You could use e.g. the Porter Stem token filter (if the text is English), or the Snowball filter, which covers English and many other languages, and then use the analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html) to see how a given chunk of text is turned into tokens.
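For illustration, here is a minimal Python sketch of such an analyze call. It assumes a recent Elasticsearch version where `_analyze` accepts a JSON body with `tokenizer`, `filter` and `text`; older releases used different parameter names. The helper names `build_stem_request` and `analyze`, and the `http://localhost:9200` URL in the comment, are made up for this example.

```python
import json
import urllib.request


def build_stem_request(text, stem_filter="porter_stem"):
    """Build an _analyze body: standard tokenizer, lowercase, then a stemmer."""
    return {
        "tokenizer": "standard",
        "filter": ["lowercase", stem_filter],
        "text": text,
    }


def analyze(base_url, body):
    """POST the body to <base_url>/_analyze and return the emitted token strings."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    return [tok["token"] for tok in payload["tokens"]]


# Usage, assuming an unsecured local node (adjust URL and auth as needed):
#   analyze("http://localhost:9200", build_stem_request("running dogs"))
```

To try Snowball instead, it should be possible to pass an inline filter definition such as `{"type": "snowball", "language": "English"}` as `stem_filter`, assuming your version supports inline filter definitions in `_analyze`.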


#3

The problem is that the text is Hungarian. But I read here: https://www.elastic.co/guide/en/elasticsearch/guide/current/language-intro.html
that there is a 'built-in' language analyzer, and I want to test whether it fits my requirements.

EDIT: The other thing is that I'm not interested in the tokens, but in the result of stemming each word.


(Michael McCandless) #4

I think the stemmer runs after tokenization in these language-specific analyzers (though I'm not sure whether that holds for Hungarian), so the tokens you get back should already be the stemmed forms. Just try the Hungarian analyzer and see what tokens it produces.
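A rough Python sketch of that idea: send `{"analyzer": "hungarian", "text": ...}` to `_analyze`, then use the `start_offset`/`end_offset` fields of each returned token to pair the original word with the token the analyzer emitted. The helper names are hypothetical, and the sample response below is hand-written in the shape of an `_analyze` response (with English words for readability). It is not real Elasticsearch output.

```python
def build_language_request(text, analyzer="hungarian"):
    """Build an _analyze body that uses a built-in language analyzer."""
    return {"analyzer": analyzer, "text": text}


def pair_words_with_stems(text, response):
    """Map each emitted token back to the source word via its offsets."""
    pairs = []
    for tok in response["tokens"]:
        original = text[tok["start_offset"]:tok["end_offset"]]
        pairs.append((original, tok["token"]))
    return pairs


# Hand-written sample in the shape of an _analyze response (NOT real output).
sample_text = "Running dogs"
sample_response = {
    "tokens": [
        {"token": "run", "start_offset": 0, "end_offset": 7, "position": 0},
        {"token": "dog", "start_offset": 8, "end_offset": 12, "position": 1},
    ]
}

print(pair_words_with_stems(sample_text, sample_response))
# prints [('Running', 'run'), ('dogs', 'dog')]
```

Words that the analyzer drops (stop words, for example) simply will not appear in the result, which is why the offsets are used instead of assuming one token per word.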


#5

Okay, I finally managed to configure a Hunspell analyzer with a Hungarian dictionary, and it works.
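The actual configuration wasn't posted, so the following is only a sketch of what such a setup might look like, written as a Python dict for the index-creation body. It assumes the Hungarian `.aff` and `.dic` files are installed under `config/hunspell/hu_HU/` on every node; the filter name `hu_hunspell`, the analyzer name `hu_hunspell_analyzer` and the helper function are made up for illustration.

```python
import json


def build_hunspell_settings(
    locale="hu_HU",
    filter_name="hu_hunspell",
    analyzer_name="hu_hunspell_analyzer",
):
    """Build index settings with a custom analyzer that stems via Hunspell."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Dictionary is looked up by locale under config/hunspell/.
                    filter_name: {"type": "hunspell", "locale": locale},
                },
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", filter_name],
                    },
                },
            }
        }
    }


print(json.dumps(build_hunspell_settings(), indent=2))
```

The printed JSON would be sent when creating the index, and the analyzer can then be tested by calling `_analyze` on that index with `{"analyzer": "hu_hunspell_analyzer", "text": "..."}`.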

