PatternTokenizer?

Pawel_Wrzeszcz · February 18, 2011, 10:47am

Hello,

I have encountered the following ElasticSearch "puzzle":

I would like my analyzer to use the ASCII Folding token filter (to make
search work properly in Polish).
I also need fine-grained control over how tokens are split, so
I use PatternAnalyzer.

I cannot combine these two, because (correct me if I am wrong) the
only analyzer that allows customization of filters is CustomAnalyzer,
and I cannot add the ASCII Folding filter to PatternAnalyzer.

I guess that having a PatternTokenizer (not an analyzer) would solve my
problem. Actually, there is a (private) class
PatternAnalyzer.PatternTokenizer in Lucene.

Would it make sense to add PatternTokenizer to ElasticSearch? Or is
there any other way to solve this issue?

Regards,
-Pawel Wrzeszcz

rmuir · February 18, 2011, 9:13pm

On Fri, Feb 18, 2011 at 5:47 AM, Pawel Wrzeszcz
pawel.wrzeszcz@gmail.com wrote:

I guess that having PatternTokenizer (not analyzer) would solve my
problem. Actually, there is (private) class
PatternAnalyzer.PatternTokenizer in Lucene.

Would it make sense to add PatternTokenizer to Elasticsearch? Or is
there any other way to solve issue?

Hi, in Lucene's trunk there are some cleaner pattern-based components
that replaced this PatternAnalyzer: a PatternTokenizer,
PatternTokenFilter, and PatternCharFilter. These used to be in Solr,
but in Lucene's trunk all analysis components have been merged into a
single module.

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/

(Note: for performance reasons, some of these use new features of the
analysis API in the upcoming Lucene 3.1, but they may still be an
easier place to start from.)
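
To illustrate the idea discussed in this thread (a pattern-based tokenizer
followed by an ASCII folding step), here is a minimal Python sketch. It is
not Lucene or ElasticSearch code: the function names pattern_tokenize,
ascii_fold and analyze are hypothetical, and the folding is a rough
approximation that only strips combining marks and adds an explicit mapping
for the Polish "ł"/"Ł", which have no Unicode decomposition.

```python
import re
import unicodedata

# Assumption: only the Polish "ł"/"Ł" need an explicit mapping here, because
# they do not decompose into a base letter plus a combining mark.
EXTRA_FOLDS = str.maketrans({"ł": "l", "Ł": "L"})


def pattern_tokenize(text, pattern=r"\W+"):
    """Split text on a regex delimiter and drop empty tokens."""
    return [tok for tok in re.split(pattern, text) if tok]


def ascii_fold(token):
    """Approximate ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", token.translate(EXTRA_FOLDS))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, pattern=r"\W+"):
    """Tokenize with a pattern first, then fold each token."""
    return [ascii_fold(tok) for tok in pattern_tokenize(text, pattern)]


if __name__ == "__main__":
    # Default delimiter: any run of non-word characters.
    print(analyze("Zażółć gęślą jaźń"))
    # Custom delimiter: commas or semicolons with optional whitespace.
    print(analyze("łódź, kraków;gdańsk", r"[,;]\s*"))
```

The point of the sketch is the separation of concerns: the splitting rule
and the folding step are independent, so either one can be changed without
touching the other. That is the flexibility a standalone tokenizer gives
compared with a monolithic analyzer.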
