There was another thread where we talked a bit about having a dedicated phone
number field that would know about different phone number formats and index
them in a form that makes them more searchable (possibly as a numeric
field; it really depends on the type of search that needs to be executed). It
gets complicated when it comes to internationalization and the like, and might
not fit all cases, but we can try to start with something and see how it
goes...
We need an analyzer that ignores all non-digit characters and treats each remaining digit as a separate token, don't we?
Then a query for any part of the number, run as a "phrase", will match.
On 9 Aug 2011, at 12:08, Shay Banon wrote:
There was another thread where we talked a bit about having a dedicated phone number field that would know about different phone number formats and index them in a form that makes them more searchable (possibly as a numeric field; it really depends on the type of search that needs to be executed). It gets complicated when it comes to internationalization and the like, and might not fit all cases, but we can try to start with something and see how it goes...
On Mon, Aug 8, 2011 at 11:13 PM, Sindre Sorhus sindresorhus@gmail.com wrote:
What is the best way to map, analyze and search a field with a phone number?
I have phone numbers in various formats.
+47 23546798
+47 23 54 67 98
+47 235 46 798
+4723546798
23546798
I need to be able to search for "23546798" or "2354" and find all the phone numbers above. What kind of analyzers should I use?
Any thoughts about ES having a built-in phone number field?
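To make the requirement concrete, here is a minimal Python sketch of the analyzer idea from this thread: drop everything that is not a digit, treat each remaining digit as its own token, and match a query as a contiguous "phrase" of tokens. This is plain Python, not Elasticsearch configuration, and the names digit_tokens and phrase_match are made up for illustration.

```python
import re

def digit_tokens(text):
    """Drop every non-digit character and emit each digit as its own token."""
    return re.findall(r"[0-9]", text)

def phrase_match(query, indexed):
    """True if the query's digit tokens appear as a contiguous run in the indexed value."""
    q = digit_tokens(query)
    doc = digit_tokens(indexed)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

# The sample numbers from the question above.
numbers = [
    "+47 23546798",
    "+47 23 54 67 98",
    "+47 235 46 798",
    "+4723546798",
    "23546798",
]

matches_full = [n for n in numbers if phrase_match("23546798", n)]
matches_part = [n for n in numbers if phrase_match("2354", n)]
```

With this normalization both queries match all five sample numbers, because spacing and the leading "+" no longer affect the token stream.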
I haven't followed this thread closely, but it looks like there are 2
separate things here:
1. phone number detection/extraction
2. phone number tokenization and search
Can't 1. be handled with something like GATE or any other NER tool?
After 1. is done, isn't searching arbitrary phone number substrings
just a matter of n-gramming?
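As a rough illustration of the n-gramming suggestion, the sketch below normalizes each number to its digits, indexes every digit substring between a minimum and maximum length, and looks a query up as a single term. It is plain Python rather than an Elasticsearch analyzer; the names digit_ngrams and search and the gram sizes 3 to 8 are assumptions chosen for the sample data.

```python
import re

def digit_ngrams(number, min_gram=3, max_gram=8):
    """Return all digit substrings of the number with length min_gram..max_gram."""
    digits = re.sub(r"[^0-9]", "", number)
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(digits) - size + 1):
            grams.add(digits[start:start + size])
    return grams

numbers = [
    "+47 23546798",
    "+47 23 54 67 98",
    "+47 235 46 798",
    "+4723546798",
    "23546798",
]

# Inverted index: n-gram -> set of document ids that contain it.
index = {}
for doc_id, number in enumerate(numbers):
    for gram in digit_ngrams(number):
        index.setdefault(gram, set()).add(doc_id)

def search(query):
    """Look up the query's digits as one exact term in the n-gram index."""
    digits = re.sub(r"[^0-9]", "", query)
    return sorted(index.get(digits, set()))

hits_part = search("2354")
hits_full = search("23 54 67 98")
```

The trade-off is index size against query flexibility: queries shorter than min_gram or longer than max_gram find nothing in this sketch, so the gram range has to cover the lengths people actually search for.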
Perhaps you could use/integrate http://code.google.com/p/libphonenumber/,
Google's library for phone number parsing in a variety of locales/formats.
It's Apache-licensed as well.