Improved stemming for Arabic


(angella) #1

Hello everyone,
I'm trying to improve stemming for the Arabic language.
We already use the Elasticsearch stemmer for Arabic, but it is still not good
enough to make all our stemmer tests pass.
I spent a few days wandering the internet but wasn't able to find anything
else.
The only other option is hunspell, but I can't find a working dictionary for it.
The only one that actually looks like a match (this one: http://ayaspell.sourceforge.net/index.php?content=english)
doesn't work: it throws various parsing exceptions that end in
"String index out of range: 11",
which I have no idea how to fix.

Does anyone know of a good Arabic stemmer, or where I can find a
dictionary? I would really appreciate any help.

Thanks.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fbb23791-cf59-4255-bd0f-7ffa487c2b28%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Itamar Syn-Hershko) #2

There's this: http://www.nongnu.org/aramorph/ (which I believe was
contributed to Lucene a while ago). Elasticsearch uses the snowball Arabic
stemmer, which is different from the Arabic stemmer that Lucene provides.

But there's a paper showing that the light-10 stemming algorithm, and even
plain character 4-grams and 5-grams, perform much better than the two above.
You should try those; they do not require a dictionary.
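
To make the two dictionary-free ideas above concrete, here is a minimal Python sketch. It is not the light-10 algorithm from the paper and not what Elasticsearch or Lucene run internally: the affix lists, the `min_len` cutoff, and the function names `light_stem` and `char_ngrams` are assumptions chosen for illustration. The `NGRAM_SETTINGS` dictionary is a hypothetical index configuration built from the standard `ngram` and `arabic_normalization` token filters; the filter and analyzer names in it are made up.

```python
# Illustrative affix lists (an assumption, loosely modeled on published
# light stemmers). Longer prefixes come first so they win over shorter ones.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(token, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len chars."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= min_len:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_len:
            token = token[:-len(suffix)]
            break
    return token


def char_ngrams(token, min_n=4, max_n=5):
    """Return every character n-gram of token with min_n <= n <= max_n.

    Tokens shorter than min_n are returned whole. This is a choice made
    for this sketch so that short words are not silently dropped.
    """
    if len(token) < min_n:
        return [token]
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.append(token[start:start + n])
    return grams


# Hypothetical index settings: normalize Arabic text, then index 4- and
# 5-grams instead of stems. "ar_ngram" and "arabic_ngram" are made-up names.
NGRAM_SETTINGS = {
    "analysis": {
        "filter": {
            "ar_ngram": {"type": "ngram", "min_gram": 4, "max_gram": 5},
        },
        "analyzer": {
            "arabic_ngram": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["arabic_normalization", "ar_ngram"],
            }
        },
    }
}

if __name__ == "__main__":
    for word in ("والكتاب", "المعلمون"):
        print(word, light_stem(word), char_ngrams(word))
```

Running both functions over the word pairs from your stemmer tests is a cheap way to see whether either approach gets closer to the expected matches before you change any index settings.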

I can't seem to find that paper now; the link I had to it seems to be
broken.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Author of RavenDB in Action http://manning.com/synhershko/

On Thu, Apr 24, 2014 at 2:11 PM, Angel Cross niegis19@gmail.com wrote:



