Hello everyone,
I'm trying to improve stemming for the Arabic language.
We already use the Elasticsearch stemmer for Arabic, but it is still not good
enough to make all of our stemmer tests pass.
I spent a few days wandering around the internet but wasn't able to find
anything else, only Hunspell. But now I can't find a working dictionary.
The only one that actually looks suitable (this one: http://ayaspell.sourceforge.net/index.php?content=english)
doesn't work. It throws various parsing exceptions that end in "String
index out of range: 11", and I have no idea how to fix that.
Does anyone know of a good Arabic stemmer, or where I can find a
dictionary? I would really appreciate any help.
There's AraMorph (which I believe was contributed to Lucene a while ago).
Elasticsearch uses the Snowball Arabic stemmer, which is different from the
Arabic stemmer that Lucene provides.
But there's a paper showing that the light-10 algorithm (a light stemmer), and
even 4- and 5-grams, perform much better than the two above, so you should try
those. They do not require a dictionary.
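To make the two dictionary-free approaches concrete, here is a minimal Python sketch. The affix lists are an assumption, modeled loosely on how light-10 style prefix/suffix stripping is usually described, and are not a verified copy of the published algorithm; `light_stem` and `char_ngrams` are hypothetical names. Neither function does the character normalization a real Arabic analysis chain would normally apply first.

```python
# Hypothetical affix lists, loosely modeled on descriptions of light-10.
# They are an assumption for illustration, not a verified copy of the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال"]  # article/preposition forms
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str) -> str:
    """Strip common Arabic affixes without any dictionary lookup."""
    # Drop a leading conjunction "و" (and) only if 3+ letters would remain.
    if word.startswith("و") and len(word) >= 4:
        word = word[1:]
    # Drop the first matching prefix if 2+ letters would remain.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # One pass over the suffix list, dropping each match that leaves 2+ letters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def char_ngrams(word: str, sizes: tuple[int, ...] = (4, 5)) -> list[str]:
    """Index a word as overlapping character n-grams instead of a stem."""
    grams = [word[i:i + n] for n in sizes for i in range(len(word) - n + 1)]
    # Design choice (an assumption): keep words shorter than every n whole.
    return grams or [word]


if __name__ == "__main__":
    print(light_stem("والكتاب"))   # "and the book"
    print(light_stem("المكتبات"))  # "the libraries"
    print(char_ngrams("كتابة"))
```

The light stemmer conflates forms that differ only in attached articles, the conjunction, or common suffixes. The n-gram route skips stemming entirely and lets partial matches do the work, at the cost of a larger index.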
I can't seem to find that paper now; the link I had to it seems to be
broken.