Have you tried the following plugin?
I haven't used it myself, but it looks promising.
I'm no expert on stemming, but as I understand it, combining multiple
stemmers in one analyzer isn't a good idea: as you assumed, they would
interfere with each other. You can check whether the analyzer's output is
any good by defining one and running some sample text through it via the
REST interface.
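A minimal sketch of that check, assuming a node at http://localhost:9200 and hypothetical index and analyzer names ("mixed_content", "arabic_english") that are not from this thread. It targets the _analyze endpoint with a JSON body; older releases take the analyzer and text as query parameters instead, so adjust to your version.

```python
import json
import urllib.request


def build_analyze_request(host, index, analyzer, text):
    """Return (url, body_bytes) for an _analyze call against one index."""
    url = f"{host}/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return url, body


def analyze(host, index, analyzer, text):
    """Send the request and return the token strings the analyzer produced."""
    url, body = build_analyze_request(host, index, analyzer, text)
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return [t["token"] for t in json.load(resp)["tokens"]]


# Host, index and analyzer names below are placeholders (assumptions).
url, body = build_analyze_request(
    "http://localhost:9200", "mixed_content", "arabic_english", "running الكتب"
)
print(url)
```

Calling analyze(...) with a mixed Arabic/English sample against a running node shows which tokens each stemmer changed, so you can see any interference directly.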
If you mean the language analyzer for Arabic: as stated in the
documentation, it is already optimized for Arabic, so you can't define
the tokenizer. If you create your own custom analyzer with the Arabic
stemmer, you can provide your own tokenizer, but you have to know which one
to use to get the best results.
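As a rough sketch of such a custom analyzer, the index settings could be built like this. The analyzer and filter names ("my_arabic", "arabic_stop", "arabic_stemmer") are made up for illustration, and the standard tokenizer is only a default to start from, not a recommendation.

```python
import json


def arabic_custom_analyzer(tokenizer="standard"):
    """Build index settings for a custom Arabic analyzer with a chosen tokenizer."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Built-in Arabic stopword list
                    "arabic_stop": {"type": "stop", "stopwords": "_arabic_"},
                    # Stemmer token filter configured for Arabic
                    "arabic_stemmer": {"type": "stemmer", "language": "arabic"},
                },
                "analyzer": {
                    # "my_arabic" is a hypothetical name
                    "my_arabic": {
                        "type": "custom",
                        "tokenizer": tokenizer,
                        "filter": [
                            "lowercase",
                            "arabic_stop",
                            "arabic_normalization",
                            "arabic_stemmer",
                        ],
                    }
                },
            }
        }
    }


settings = arabic_custom_analyzer()
print(json.dumps(settings, indent=2))
```

The printed JSON can be sent as the body when creating the index. Swapping the tokenizer argument makes it easy to compare tokenizers on the same sample text.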
hth
david
On Thursday, June 13, 2013 8:40:13 AM UTC+2, tarang dawer wrote:
Hi
If I configure a custom analyzer with these filters: Arabic stopwords,
English stopwords, Arabic stemmer, and English stemmer, will the respective
language filters interfere with tokens of the other language?
Also, will the standard tokenizer work fine in this case? (The Arabic
analyzer does not list a tokenizer type in the documentation.)
Could somebody please help me resolve this issue?
Thanks
Tarang Dawer
On Wed, Jun 12, 2013 at 12:55 PM, Tarang Dawer <tarang...@gmail.com> wrote:
Hi
I am indexing some data that is a mixture of Arabic and English content.
Since the content is huge, the multi-field option (one field with the
Arabic analyzer and a second with the snowball analyzer) is not preferable,
as I want to avoid the storage overhead.
So, does any analyzer (snowball, Arabic, or any other) have the
capability to stem both Arabic and English?
Please Help
Thanks
Tarang Dawer