Using PatternTokenizer

ppearcy · July 25, 2010, 9:28am

Hello,
Is it correct that in order to use the PatternTokenizer, one would
need to implement a plugin similar to icu?

Thanks,
Paul

kimchy · July 25, 2010, 6:50pm

Yes, but it could be one of the built-in analyzers in elasticsearch (I assume
you are referring to the one in Lucene).

-shay.banon

kimchy · July 25, 2010, 7:42pm

Added this: Analysis: Add pattern analyzer · Issue #276 · elastic/elasticsearch · GitHub.

ppearcy · July 25, 2010, 11:16pm

Yeah, it probably makes sense to have it built in. I'd be happy to
create a fork and submit it. I would plan on exposing the pattern,
lowercase, and stopwords options, which map directly to the inputs of
Lucene's PatternAnalyzer.
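
To make those three options concrete, here is a rough Python sketch of what an analyzer like this does. It is only an illustration, not Lucene's or elasticsearch's code: the function name `pattern_analyze` and its defaults are made up, and it assumes the pattern is treated as a delimiter, with lowercasing applied before stopword removal.

```python
import re


def pattern_analyze(text, pattern=r"\W+", lowercase=True, stopwords=()):
    """Hypothetical helper: split on a regex, optionally lowercase, drop stopwords."""
    if lowercase:
        text = text.lower()
    stop = set(stopwords)
    # The pattern is a delimiter; empty strings from leading/trailing matches are dropped.
    tokens = [t for t in re.split(pattern, text) if t]
    # Stopwords are compared against the (possibly lowercased) tokens as-is.
    return [t for t in tokens if t not in stop]


print(pattern_analyze("The Quick, brown fox", stopwords={"the"}))
```

Note that the stopwords are matched after lowercasing, so they should be given in lowercase when `lowercase=True`.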

A separate pattern tokenizer would be nice to combine with other
options, but that doesn't appear to exist in Lucene (though Solr has a
more flexible version based on regex grouping that will probably be
available with the Lucene/Solr merge). It wouldn't be hard to
write; I just don't need it for my use case.
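
For anyone wondering what "based on regex grouping" could look like, here is a small Python sketch of the idea. It is an assumption-laden illustration rather than Solr's implementation: `pattern_tokenize` and its `group` parameter are hypothetical names, with `group=-1` meaning "the pattern is a delimiter" and `group>=0` meaning "emit that capture group of every match".

```python
import re


def pattern_tokenize(text, pattern, group=-1):
    """Hypothetical tokenizer: split on the pattern, or emit one group per match."""
    regex = re.compile(pattern)
    if group >= 0:
        # Grouping mode: each match contributes the chosen group (0 = whole match).
        return [m.group(group) for m in regex.finditer(text) if m.group(group)]
    # Split mode: tokens are the non-empty gaps between successive matches.
    tokens, start = [], 0
    for m in regex.finditer(text):
        if m.start() > start:
            tokens.append(text[start:m.start()])
        start = m.end()
    if start < len(text):
        tokens.append(text[start:])
    return tokens


print(pattern_tokenize("a, b,c", r",\s*"))
print(pattern_tokenize("key=val; k2=v2", r"(\w+)=(\w+)", group=2))
```

Because this only produces tokens, it could be followed by separate lowercase or stopword steps, which is the flexibility a standalone tokenizer would give over a single analyzer.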

Thanks,
Paul

ppearcy · July 26, 2010, 1:55am

Huh, somehow Nabble (which shows your response referencing
Analysis: Add pattern analyzer · Issue #276 · elastic/elasticsearch · GitHub) and
Google Groups (which doesn't) are out of sync.

Anyway, thanks a ton! It seems straightforward, and I'll let you know if
there are any issues.

Best Regards,
Paul
