Stemmer token filter result is different than it should be


(adrienbigler) #1

If I try to analyze the following text

GET _analyze?analyzer=french&text=Ils maintenaient la machine

It returns 2 tokens: "maintenaient", "machin".

Elasticsearch seems to apply extra options on top of the default Snowball
stemming algorithm. Without these options, the first token should be stemmed
to "mainten" (as confirmed by the documentation:
http://snowball.tartarus.org/algorithms/french/stemmer.html).

What are the additional options that Elasticsearch adds to the standard
Snowball analyzer?

TIA

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0522456e-f7c2-4f6b-907c-d4ee9f53b6b9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Robert Muir-2) #2

When you use the "french" analyzer, it uses the Lucene FrenchAnalyzer
behind the scenes, which does not use the Snowball algorithm.

It uses the Savoy stemmer, the same as specifying the "light_french"
stemmer: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html
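
To compare the two stemmers side by side, one option is to define a custom analyzer whose stemmer token filter is set explicitly. The following is only a sketch that builds the index settings body and does not contact a cluster. The names my_french, french_stop and french_stemmer are made up for illustration, and the filter chain is a simplification (it leaves out the elision handling that the built-in analyzer may apply). It assumes the stemmer token filter accepts "french" for the Snowball stemmer and "light_french" for the Savoy one, as listed in the documentation linked above.

```python
import json


def french_analyzer_settings(stemmer_language):
    """Build index settings for a custom French analyzer.

    stemmer_language is passed to the "stemmer" token filter, e.g.
    "french" (Snowball) or "light_french" (Savoy light stemmer).
    All analyzer and filter names below are hypothetical.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Built-in French stopword list
                    "french_stop": {"type": "stop", "stopwords": "_french_"},
                    # The stemmer under test
                    "french_stemmer": {
                        "type": "stemmer",
                        "language": stemmer_language,
                    },
                },
                "analyzer": {
                    "my_french": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "french_stop", "french_stemmer"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating a test index
    print(json.dumps(french_analyzer_settings("french"), indent=2))
```

Creating one index with "french" and another with "light_french", then running the same _analyze request against my_french on each, should show whether the difference in the first token comes from the choice of stemmer.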

On Tue, May 20, 2014 at 8:03 AM, adrienbigler@gmail.com wrote:

If I try to analyze the following text

GET _analyze?analyzer=french&text=Ils maintenaient la machine

It returns 2 tokens: "maintenaient", "machin".

Elasticsearch seems to apply extra options on top of the default Snowball
stemming algorithm. Without these options, the first token should be stemmed
to "mainten" (as confirmed by the documentation:
http://snowball.tartarus.org/algorithms/french/stemmer.html).

What are the additional options that Elasticsearch adds to the standard
Snowball analyzer?

TIA



