Map, analyze and search phone number


(Sindre Sorhus) #1

What is the best way to map, analyze and search a field with a phone number?

I have phone numbers in various formats.
+47 23546798
+47 23 54 67 98
+47 235 46 798
+4723546798
23546798

I need to be able to search for "23546798" or "2354" and find all the phone
numbers above. What kind of analyzers should I use?

Any thoughts on ES having a built-in phone number field?


(Otis Gospodnetić) #2

Hi,

This is essentially a search for an arbitrary substring. You can use
n-grams for that.
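A minimal sketch of the n-gram idea in plain Python, not Elasticsearch configuration. It assumes that non-digit characters are stripped before the n-grams are generated and that the query is normalized the same way; the function names and the gram sizes (2 to 10) are illustrative choices, not settings taken from this thread.

```python
import re

def digits_only(raw):
    """Drop everything that is not an ASCII digit (spaces, '+', dashes, ...)."""
    return re.sub(r"[^0-9]", "", raw)

def ngrams(text, min_gram, max_gram):
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    return {
        text[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(text) - n + 1)
    }

def index_terms(raw, min_gram=2, max_gram=10):
    """Terms that would be stored for one phone number (hypothetical sizes)."""
    return ngrams(digits_only(raw), min_gram, max_gram)

def matches(query, raw):
    """Normalize the query the same way, then look it up as a single term."""
    return digits_only(query) in index_terms(raw)

numbers = [
    "+47 23546798",
    "+47 23 54 67 98",
    "+47 235 46 798",
    "+4723546798",
    "23546798",
]

for query in ("23546798", "2354"):
    print(query, [matches(query, n) for n in numbers])
```

Under these assumptions both queries match all five formats, because every stored number reduces to a digit string that contains the query as a substring. Queries shorter than the minimum gram or longer than the maximum gram would not match.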

Otis

Sematext is hiring Search Engineers -- http://sematext.com/about/jobs.html


(Shay Banon) #3

There was another thread where we talked a bit about having a specific phone
number field that would know about different phone number formats and index
them in a form that makes them more searchable (possibly as a numeric field;
it really depends on the type of search that needs to be executed). It gets
complicated when it comes to internationalization and the like, and might
not fit all cases, but we can try to start with something and see how it
goes...


(yark) #4

Wouldn't a new analyzer solve the problem?

We need an analyzer that ignores all non-digit characters and treats each remaining digit as a separate token, don't we?
Then a query for any part of the number, run as a "phrase", will match.
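A rough sketch of that idea in plain Python, assuming "phrase" means that the query's digit tokens must appear consecutively and in order. The helper names are hypothetical, and this only imitates what such an analyzer plus a phrase query would do.

```python
ASCII_DIGITS = "0123456789"

def digit_tokens(raw):
    """Ignore non-digit characters; each remaining digit is one token."""
    return [ch for ch in raw if ch in ASCII_DIGITS]

def phrase_match(query, raw):
    """True if the query's digit tokens occur consecutively in the number."""
    q = digit_tokens(query)
    doc = digit_tokens(raw)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))

print(phrase_match("2354", "+47 23 54 67 98"))
print(phrase_match("481 21 245", "48121245"))
```

Because whitespace and "+" never become tokens, the formatting of the stored number and of the query stops mattering; only the order of the digits does.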


(Shay Banon) #5

It depends, since the format of phone numbers can vary. Also, it would be nice
to have an option to "extract" phone numbers from text.


(Karel Minarik) #6

Couldn't a regex-based analyzer be used for extracting phone numbers? I
guess it would be a bit more lightweight than n-grams?
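A small sketch of regex-based extraction in plain Python. The pattern is an assumption made for illustration (an optional "+", then at least seven digits that may be separated by spaces or dashes) and is deliberately naive: it would also pick up other long digit runs, and it would merge two numbers separated only by a space.

```python
import re

# Hypothetical pattern: optional '+', a digit, at least five more digits,
# spaces or dashes, and a final digit.
PHONE_LIKE = re.compile(r"\+?[0-9][0-9 \-]{5,}[0-9]")

def extract_numbers(text):
    """Find phone-number-like runs and reduce each one to its digits."""
    return [re.sub(r"[^0-9]", "", m) for m in PHONE_LIKE.findall(text)]

print(extract_numbers("Call +47 23 54 67 98 or 23546798 today."))
# prints ['4723546798', '23546798']
```

Extraction alone yields whole numbers only, so a partial query such as "2354" would still need n-grams or some other substring matching on top of it.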

Karel


(Sindre Sorhus) #7

That would be really useful. Not having to parse the phone numbers myself,
and having them easily searchable.


(Sindre Sorhus) #8

Something like the iOS data detectors. This could work on more than phone
numbers (dates, locations, ...), but it depends on how big you want to make it.


(Sindre Sorhus) #9

How would that work with phone numbers that contain whitespace? What should I
set the min and max gram sizes to?


(Otis Gospodnetić) #10

I haven't followed this thread closely, but it looks like there are 2
separate things here:

  1. phone number detection/extraction
  2. phone number tokenization and search

Can't 1. be handled with something like GATE or any other NER tool?
After 1. is done, isn't searching arbitrary phone number substrings
just a matter of n-gramming?

Otis


(Sindre Sorhus) #11

Should I file an issue on GitHub for it?


(David Sachs) #12

Perhaps you could use/integrate http://code.google.com/p/libphonenumber/,
Google's library for phone number parsing in a variety of locales/formats.
It's Apache-licensed as well.

David


(Sindre Sorhus) #13

Yes, I know, that's what I ended up using. But still, it would be very
convenient to have a "phonenumber" type in ES.


(Sindre Sorhus) #14

Old discussion
here: http://groups.google.com/a/elasticsearch.com/group/users/browse_thread/thread/459046042558bfeb


(Sindre Sorhus) #15

I tried using nGrams, but I can't get it to work.

I have a number saved in ES as a string like this "48121245", but when I
search for "481 21 245", it doesn't find anything.

What am I doing wrong?

settings: https://gist.github.com/1157005
mapping: https://gist.github.com/1157027

