How do language analyzers work?


(ngocvo3103) #1

Hi all,

I'm new to Elasticsearch. I'm interested in the language analyzers and
would like to know how they work.

I ran some text through the analyze API with the English language
analyzer:

curl -XGET 'localhost:9200/_analyze?analyzer=english' -d 'Testing language analyzer of Elastic Search: English'

And the tokens I received from ES are:

  • test
  • languag
  • analyz
  • elast
  • search
  • english

Are those tokens the expected result? I expected the tokens to be whole
words, but some of them are incomplete (languag, analyz, elast).

Regards,
Sam


(David Pilato) #2

Language analyzers reduce each word to its stem (root form).
If you send "languages" or "language", you get the same token.

"Analyzing", "analyzers", "analyze"... are all treated as equal once analyzed.
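
To make the idea concrete, here is a rough Python sketch of what such an analysis chain does: lowercase the text, split it into words, drop stop words, then strip suffixes. It is only an illustration. The real english analyzer uses a Porter-style stemmer with many more rules; the suffix list, the stop-word set, and the names toy_stem and toy_analyze below are made up for this example and are not part of Elasticsearch.

```python
import re

# Hypothetical, heavily simplified stand-ins for the real filters.
STOP_WORDS = {"a", "an", "and", "of", "the", "to"}
SUFFIXES = ("ing", "ers", "er", "ic", "es", "e", "s")  # checked in this order


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_analyze(text):
    """Lowercase, split on non-alphanumerics, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]


print(toy_analyze("Testing language analyzer of Elastic Search: English"))
print({toy_stem(w) for w in ["analyzing", "analyzers", "analyze"]})
```

The same chain runs on the query at search time, so a search for "analyzing" matches a document that contains "analyzers": both end up as the token "analyz". It also shows why "of" is missing from the token list: stop words are dropped before stemming.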

HTH
David


(Shaun Etherton) #3

Wikipedia has some general info on stemming that you may find somewhat helpful.

--
Shaun

