Search phone number


(mr_max) #1

I have a type:

contact : {
   'phone':[79037767523,79037767523]
}

How can I search for 9037767523?


(Mark Harwood) #2

Check out ngrams [1]. It's a way of indexing parts of words rather than whole ones.
The definitive guide [2] has a section on this too.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
[2] https://www.elastic.co/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html
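
To make the idea concrete, here is a rough plain-Python illustration of what n-gram indexing produces. It only mimics the behaviour and is not Elasticsearch's implementation; the helper name ngrams is made up, the min_gram/max_gram arguments mirror the tokenizer settings of the same name, and it assumes the phone number is indexed as a string rather than a number.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text whose length is between min_gram and max_gram."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


# Hypothetical settings: only keep grams of 10 or 11 characters
tokens = ngrams("79037767523", 10, 11)

# The number without its leading 7 is one of the indexed grams
print("9037767523" in tokens)  # True
```

Smaller min_gram values would let shorter fragments match too, at the cost of many more terms in the index.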


(Nik Everett) #3

I think I'd try to use the pattern capture token filter to extract the things you want to match. In the case you mention you'd want to strip the leading 7. I don't know Russia's number resolution rules, but for NANPA I'd use something like

"phone_number" : {
  "type" : "pattern_capture",
  "preserve_original" : true,
  "patterns" : [
    "1(\\d{3}(\\d+))"
  ]
}

You apply an analyzer that uses this phone_number filter as the index_analyzer and just use a keyword analyzer that strips +-() at search time. Or strip in your application. The index_analyzer here would index a number like 19195557321 as 19195557321, 9195557321, and 5557321, which matches the way phone numbers are resolved in NANPA. A user searching for 5557321 would get all the numbers ending in 5557321: 19195557321, 13215557321, etc.

I'd also strip all the +-() stuff from the numbers before indexing them in elasticsearch. You don't want them in the _source because they don't add anything.
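
If you strip in your application, a minimal sketch could look like the following. The function name normalize_phone is made up for illustration, and it assumes it is fine to drop every non-digit character (spaces included), which is slightly broader than just +-().

```python
import re


def normalize_phone(raw):
    """Drop everything except digits, e.g. '+', '-', '(', ')' and spaces."""
    return re.sub(r"\D", "", raw)


print(normalize_phone("+1 (919) 555-7321"))  # 19195557321
```

The same function can be applied to the user's query so both sides are compared in the same form.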

I once worked for a phone company so I've thought a lot about phone numbers.

BTW - this is a tradeoff. When you get new resolution rules you have to change the mapping and reindex the whole index. If you moved the term expansion that the analyzer does outside of Elasticsearch, then you could be more surgical when the patterns change. I'd suggest doing something like that if you had to cover the whole world. So you'd index

{
  "phone_number": {
    "raw": "19195557321",
    "expansions": ["19195557321", "9195557321", "5557321"]
  }
}

and you'd search on phone_number.expansions.
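
A rough sketch of doing that expansion in application code, assuming NANPA-style numbers that have already been reduced to digits. The names NANPA and build_phone_doc are made up for illustration; the regular expression has the same shape as the pattern in the pattern_capture filter earlier in this post, but it is matched against the whole number here, which is a simplification of how the filter works.

```python
import re

# A leading 1, a 3-digit area code, then the rest of the number
NANPA = re.compile(r"1(\d{3}(\d+))")


def build_phone_doc(number):
    """Build the document body with the raw number and its searchable expansions."""
    expansions = [number]
    match = NANPA.fullmatch(number)
    if match:
        # group(1): number without the country code, group(2): local part only
        expansions.extend([match.group(1), match.group(2)])
    return {"phone_number": {"raw": number, "expansions": expansions}}


doc = build_phone_doc("19195557321")
print(doc["phone_number"]["expansions"])
```

When the resolution rules change, only the documents whose expansions differ under the new rules need to be rebuilt and reindexed.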

Which solution you take is all a matter of how big a deal this is for you. @Mark_Harwood's solution is perfectly reasonable for lots of applications. It's certainly simpler.


(Mark Harwood) #4

Oh one more from my bookmarks - a plugin dedicated to this: https://github.com/MyPureCloud/elasticsearch-phone

Not tried it but obviously from someone who's spent some time thinking about this specific problem.
Let us know if it is any good!


(mr_max) #5

thanks all!


(Nik Everett) #6

Very cool!

