Using PatternTokenizer

Hello,
Is it correct that in order to use the PatternTokenizer, one would
need to implement a plugin similar to icu?

Thanks,
Paul

Yes, but it can be part of the built-in analyzers in elasticsearch (I assume
you refer to the one in Lucene).

-shay.banon

On Sun, Jul 25, 2010 at 12:28 PM, Paul ppearcy@gmail.com wrote:

Hello,
Is it correct that in order to use the PatternTokenizer, one would
need to implement a plugin similar to icu?

Thanks,
Paul

Add this: Analysis: Add pattern analyzer · Issue #276 · elastic/elasticsearch · GitHub.

On Sun, Jul 25, 2010 at 9:50 PM, Shay Banon shay.banon@elasticsearch.com wrote:

Yes, but it can be part of the built-in analyzers in elasticsearch (I
assume you refer to the one in Lucene).

-shay.banon

Yeah, it probably makes sense to have it built in. I'd be happy to
create a fork and submit it. I would plan on exposing the pattern,
lowercase, and stopwords options, which map directly to Lucene's
PatternAnalyzer inputs.
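
To make the three options concrete, here is a rough sketch in plain Python of what a pattern-based analyzer does with them. It is only an illustration under stated assumptions: the function name, parameter names, and defaults are hypothetical, and this is neither Lucene's PatternAnalyzer nor the elasticsearch setting names.

```python
import re


def pattern_analyze(text, pattern=r"\W+", lowercase=True, stopwords=frozenset()):
    """Hypothetical sketch: split on a regex, then lowercase, then drop stopwords.

    The names and defaults here are assumptions for illustration only.
    """
    # The pattern describes the separators, so splitting on it yields tokens.
    # Empty strings (e.g. from leading separators) are filtered out.
    tokens = [t for t in re.split(pattern, text) if t]
    if lowercase:
        tokens = [t.lower() for t in tokens]
    # Stopwords are compared after lowercasing, so they should be lowercase too.
    return [t for t in tokens if t not in stopwords]


print(pattern_analyze("The Quick, brown fox", stopwords={"the"}))
```

Lowercasing is applied to the tokens rather than to the input, so a case-sensitive separator pattern still sees the original text.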

A separate pattern tokenizer would be nice to combine with other
options, but one doesn't appear to exist in Lucene (though Solr has a
more flexible version based on regex grouping that will probably be
available with the Lucene/Solr merge). Not that it would be hard to
write; I just don't need it for my use case.
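
For illustration, a standalone pattern tokenizer built around regex grouping could look roughly like the Python sketch below. The function name and the `group` convention (a negative value splits on the pattern, any other value emits that capture group of each match) are assumptions of this sketch, not a description of the Solr or Lucene code.

```python
import re


def pattern_tokenize(text, pattern, group=-1):
    """Hypothetical sketch of a regex tokenizer with an optional capture group.

    group < 0  : treat the pattern as a separator and split on it.
    group >= 0 : emit that group from every match (0 is the whole match).
    """
    regex = re.compile(pattern)
    if group < 0:
        pieces = []
        last = 0
        # Walk the matches manually so capture groups in the pattern
        # do not leak into the output, as they would with re.split.
        for match in regex.finditer(text):
            pieces.append(text[last:match.start()])
            last = match.end()
        pieces.append(text[last:])
        return [p for p in pieces if p]
    # Skip matches where the requested group is empty or did not participate.
    return [m.group(group) for m in regex.finditer(text) if m.group(group)]


print(pattern_tokenize("a, b,c", r",\s*"))
print(pattern_tokenize("aaa 'bbb' 'ccc'", r"'([^']+)'", group=1))
```

Because the tokenizer only produces tokens, lowercasing and stopword removal could then be layered on as separate steps, which is the kind of combination described above.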

Thanks,
Paul

Huh, somehow Nabble (which shows your response referencing
Analysis: Add pattern analyzer · Issue #276 · elastic/elasticsearch · GitHub) and
Google Groups (which doesn't) are out of sync.

Anyway, thanks a ton! Seems straightforward, and I'll let you know if
there are any issues.

Best Regards,
Paul

On Jul 25, 5:16 pm, Paul ppea...@gmail.com wrote:

Yeah, it probably makes sense to have it built in. I'd be happy to
create a fork and submit it. I would plan on exposing the pattern,
lowercase, and stopwords options, which map directly to Lucene's
PatternAnalyzer inputs.

A separate pattern tokenizer would be nice to combine with other
options, but one doesn't appear to exist in Lucene (though Solr has a
more flexible version based on regex grouping that will probably be
available with the Lucene/Solr merge). Not that it would be hard to
write; I just don't need it for my use case.

Thanks,
Paul
