I have encountered the following Elasticsearch "puzzle":
I would like my analyzer to use the ASCII Folding token filter (to make
search work properly in Polish).
I also need fine-grained control over the way tokens are split, so
I use PatternAnalyzer.
I cannot combine these two, because (correct me if I am wrong) the
only analyzer that allows customization of filters is CustomAnalyzer,
and I cannot add the ASCII Folding filter to PatternAnalyzer.
I guess that having a PatternTokenizer (not an analyzer) would solve my
problem. Actually, there is a (private) class
PatternAnalyzer.PatternTokenizer in Lucene.
Would it make sense to add PatternTokenizer to Elasticsearch? Or is
there any other way to solve this issue?
Hi, in Lucene's trunk there are some cleaner pattern-based components
that replace PatternAnalyzer: a PatternTokenizer, PatternTokenFilter,
and PatternCharFilter. These used to be in Solr, but in Lucene's trunk
all analysis components have been merged into a single module.
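For what it's worth, later Elasticsearch releases expose exactly this combination: a standalone `pattern` tokenizer that can be wired into a custom analyzer together with the `asciifolding` token filter. A minimal sketch of such index settings (the analyzer and tokenizer names here are made up for illustration, and `\\W+` simply mimics the default non-word-character split):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pattern_folding": {
          "type": "custom",
          "tokenizer": "my_pattern_tokenizer",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "tokenizer": {
        "my_pattern_tokenizer": {
          "type": "pattern",
          "pattern": "\\W+"
        }
      }
    }
  }
}
```

With these settings, `asciifolding` maps characters such as the Polish "ł" or "ż" to their ASCII equivalents after the pattern tokenizer has split the input, which is the behavior asked about above.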