Thanks for pointing this out. Yes, it's true, the use of hunspell for
stemming must be carefully evaluated for each dictionary. See also the
caution Robert Muir gave about this in https://issues.apache.org/jira/browse/SOLR-2769
I assume the Czech dictionary I found in Chromium is not the best
To be honest, I am just in the process of learning to write
Elasticsearch plugins, and I started with a very tiny project. Most
attractive was a feature that appeared in Lucene 3.5, the hunspell
In a more advanced dictionary plugin I am busy with, I will use
hunspell dictionaries in the more appropriate way, that is, for spell
On Jan 26, 3:47 pm, Lukáš Vlček lukas.vl...@gmail.com wrote:
Just to give an illustration, there is a Czech word "rada" which in the given
context means "board" (but it can also mean "advice").
Hunspell with cs_CZ locale yields the following terms:
rada (the same term, but I guess this time it is meant as "advice")
raď (give advice - a verb)
radon (radon - a noun)
This really cannot qualify as a stemmer.
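
A rough sketch of why the extra terms hurt relevancy, assuming every term emitted for a token is indexed and searched at the same position. This is a toy model and not the Hunspell or Lucene API: the `STEMS` table is hand-written from the output quoted above, and `index_terms` and `matches` are hypothetical helper names.

```python
# Toy model only: hand-written stem table taken from the cs_CZ output
# quoted in this thread. It does not call Hunspell or Lucene.
STEMS = {
    "rada": ["rada", "raď", "radon"],
}


def index_terms(token, stems=STEMS):
    """Return every term emitted for a token (the token itself if unknown)."""
    return stems.get(token, [token])


def matches(query_token, doc_tokens, stems=STEMS):
    """True if the query and the document share at least one emitted term."""
    query_terms = set(index_terms(query_token, stems))
    doc_terms = set()
    for token in doc_tokens:
        doc_terms.update(index_terms(token, stems))
    return bool(query_terms & doc_terms)


if __name__ == "__main__":
    # A search for "rada" (board) also hits a document that only mentions radon.
    print(matches("rada", ["radon"]))
    # An unrelated document is still rejected.
    print(matches("rada", ["pes"]))
```

With a single-stem stemmer the first query would not match; here it does, only because "radon" is one of the terms emitted for "rada".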
On Thu, Jan 26, 2012 at 3:39 PM, Lukáš Vlček lukas.vl...@gmail.com wrote:
I gave the hunspell plugin a try and have some doubts whether it can really
qualify as a stemmer. The problem I see is that it can emit so many
different options for some terms (especially short ones) that this can,
IMO, seriously harm relevancy. I was testing it for the Czech language,
but I guess the same applies to some other languages as well (based
on my short test, English seems to work a lot better).
I can clearly see the benefit of hunspell as a spelling tool, but as a stemmer? I am
not familiar with the hunspell API, but are there any options that can influence
the stemming process that might be useful to expose in the ES plugin API as
On Tue, Jan 3, 2012 at 9:38 AM, jprante joergpra...@gmail.com wrote:
Thank you for pointing this out. I uploaded a zip file,
elasticsearch-analysis-hunspell-1.0.0.zip, to the GitHub download area.
On Jan 2, 1:49 pm, Damien Hardy damienhardy....@gmail.com wrote:
On 29 déc 2011, 21:45, Jörg Prante joergpra...@gmail.com wrote:
Because all of you are eager to keep up with Lucene 3.5 features, I
wrote an ElasticSearch Hunspell Analysis plugin.
For discussion, see https://
Please note: included are hunspell dict/aff files from Chromium for
convenience. The license for the third-party files is a tri-license
But the installation process is not working.
The compiled jar is missing from the GitHub downloads, so it cannot be
installed on Elasticsearch via the plugin utility.