How do language analyzers work?

Hi all,

I'm a newbie to Elasticsearch. I'm interested in the language analyzers and
I would like to know how they work.

I tried analyzing some text with the English language analyzer and this is the
result:

curl -XGET 'localhost:9200/_analyze?analyzer=english' -d 'Testing language analyzer of Elastic Search: English'

And the tokens I received from ES are:

  • test
  • languag
  • analyz
  • elast
  • search
  • english

Are those tokens the expected result? I thought the tokens would be whole
words, but some of them are incomplete (languag, analyz, elast).

Regards,
Sam

Language analyzers reduce words to their stem (root form).
If you send "languages" or "language", you will get the same result.

"Analyzing", "analyzers", "analyze"... will all be considered equal once analyzed.
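
As a rough illustration of the idea, here is a minimal Python sketch of suffix stripping. It is not what Elasticsearch actually runs (the english analyzer uses a proper stemming algorithm); the names toy_stem and toy_analyze and the suffix list are made up for this example and only cover the words in this thread.

```python
# Toy suffix-stripping stemmer: an illustration only, NOT the algorithm
# used by Elasticsearch's english analyzer.
import re

# Hypothetical suffix list, chosen only to cover the words in this thread.
# Longer suffixes come first so "ers" is tried before "s".
SUFFIXES = ("ers", "ing", "es", "er", "e", "s")


def toy_stem(word):
    """Lowercase a word and strip the first matching suffix, if any."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_analyze(text):
    """Split text into alphabetic words and stem each one."""
    return [toy_stem(w) for w in re.findall(r"[A-Za-z]+", text)]


print(toy_analyze("languages language"))           # ['languag', 'languag']
print(toy_analyze("Analyzing analyzers analyze"))  # ['analyz', 'analyz', 'analyz']
```

Since the same analysis is normally applied both to the indexed text and to the query text, a search for "analyzers" can match a document that contains "analyzing": both end up as the same token.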

HTH
David

--

In reply to the message from Ngoc Vo (ngoc.vo3103@gmail.com) of 3 August 2012, 07:01.

Wikipedia has some general info on stemming that you may find somewhat helpful.

--
Shaun

In reply to David Pilato's message of Friday, 3 August 2012, 17:02.
