Hi,
I want to use the Arabic analyzer for some Arabic content, but the
documentation for it on the ES site is a bit scarce. Could someone please
explain which tokenizer it uses (the standard one, or is there a dedicated
Arabic tokenizer?) and which filters it includes?
The built-in language analyzers already include an ArabicAnalyzer.
According to the Lucene source, it uses the StandardTokenizer, an
ArabicNormalizationFilter (alongside other standard filters), a custom
Arabic stop set, and stem exclusions.
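
To make that chain concrete, here is a rough sketch of index settings that rebuild a similar analyzer from individual parts. It assumes a reasonably recent Elasticsearch where the `standard` tokenizer and the `lowercase`, `decimal_digit`, `stop` (with the `_arabic_` stopword list), `arabic_normalization`, `keyword_marker` and `stemmer` token filters are available; older versions may differ. The names `rebuilt_arabic`, `arabic_stop`, `arabic_keywords` and `arabic_stemmer` are made up for illustration, and the function only builds the JSON body, it does not talk to a cluster.

```python
import json


def build_arabic_analysis_settings(stem_exclusions=()):
    """Build index settings approximating the built-in Arabic analyzer.

    stem_exclusions: words that should be protected from stemming
    (hypothetical parameter name, used only in this sketch).
    """
    filters = {
        # Stop filter using the predefined Arabic stopword list.
        "arabic_stop": {"type": "stop", "stopwords": "_arabic_"},
        # Stemmer configured for Arabic.
        "arabic_stemmer": {"type": "stemmer", "language": "arabic"},
    }
    # Order matters: normalize and drop stopwords before stemming.
    chain = ["lowercase", "decimal_digit", "arabic_stop", "arabic_normalization"]
    if stem_exclusions:
        # Mark the excluded words as keywords so the stemmer skips them.
        filters["arabic_keywords"] = {
            "type": "keyword_marker",
            "keywords": list(stem_exclusions),
        }
        chain.append("arabic_keywords")
    chain.append("arabic_stemmer")
    return {
        "settings": {
            "analysis": {
                "filter": filters,
                "analyzer": {
                    "rebuilt_arabic": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": chain,
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the body you would send when creating the index.
    body = build_arabic_analysis_settings(["مثال"])
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

If you do not need to customize anything, setting the field's analyzer to the built-in `arabic` analyzer should be enough; a rebuilt chain like this is mainly useful when you want to change the stop set or the stem exclusions.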