Custom tokenizer for phone numbers


(asanderson) #1

Greetings!

I'm trying to figure out how to define a custom tokenizer for phone numbers,
similar to how we tokenize phone numbers in Solr using
solr.WordDelimiterFilterFactory
(http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory)
with all the parameters set to true.

So, the following tokens would be generated for (111) 222-3333: (111)
222-3333, 1112223333, 111, 222, 3333.
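
To make the desired output concrete, here is a rough Python sketch that produces those tokens for a single phone number. It only illustrates the target behaviour and is not how Solr or ElasticSearch implement it; the function name phone_tokens is made up for this example.

```python
import re


def phone_tokens(text):
    """Emulate the desired token set for one phone number (illustrative only).

    Emits the original text, the concatenation of all digit runs,
    and then each digit run on its own, skipping duplicates.
    """
    digit_runs = re.findall(r"\d+", text)  # e.g. ["111", "222", "3333"]
    tokens = [text]                        # keep the original form
    joined = "".join(digit_runs)           # all digits catenated
    for candidate in [joined] + digit_runs:
        if candidate and candidate not in tokens:
            tokens.append(candidate)
    return tokens


if __name__ == "__main__":
    print(phone_tokens("(111) 222-3333"))
```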

How would I define a similar custom tokenizer in ElasticSearch?

Thanks in advance.

p.s. Since ElasticSearch supports email and url tokenization out of the
box, how about adding a standard telephone number tokenizer? Thoughts?

Steve

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Igor Motov) #2

You can still use the word delimiter token filter in ElasticSearch:
http://www.elasticsearch.org/guide/reference/index-modules/analysis/word-delimiter-tokenfilter.html
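
As a minimal sketch of what that could look like, the Python snippet below builds an index settings body that wires a word_delimiter filter into a custom analyzer and prints it as JSON. The names phone_delimiter and phone_analyzer are hypothetical, and pairing the filter with the keyword tokenizer (so the whole number reaches the filter as one token) is my assumption rather than something tested against your data, so check the resulting tokens with the analyze API before relying on it.

```python
import json


def build_phone_analysis_settings():
    """Return index settings with a word_delimiter-based analyzer (a sketch)."""
    # Mirror "all the parameters set to true" from the Solr setup.
    word_delimiter_flags = {
        "generate_word_parts": True,
        "generate_number_parts": True,
        "catenate_words": True,
        "catenate_numbers": True,
        "catenate_all": True,
        "split_on_case_change": True,
        "preserve_original": True,
    }
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name chosen for this example.
                    "phone_delimiter": {
                        "type": "word_delimiter",
                        **word_delimiter_flags,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name chosen for this example.
                    "phone_analyzer": {
                        "type": "custom",
                        # Assumption: keep the input as a single token so the
                        # filter sees the full phone number at once.
                        "tokenizer": "keyword",
                        "filter": ["phone_delimiter"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(build_phone_analysis_settings(), indent=2))
```

The analyzer would then be referenced by name in the mapping of the phone number field.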

(In reply to asanderson's post of Monday, March 25, 2013, 6:34:35 PM UTC-4.)



(asanderson) #3

Igor,

I'm not sure how I missed that, but thanks for pointing me to it. :wink:

Steve


