I have this type:
contact : {
'phone':[79037767523,79037767523]
}
How can I search for 9037767523?
Check out ngrams [1]. It's a way of indexing parts of words rather than whole ones.
The definitive guide [2] has a section on this too.
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
[2] https://www.elastic.co/guide/en/elasticsearch/guide/current/_ngrams_for_partial_matching.html
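To make the ngram idea concrete, here is a rough pure-Python sketch of what gets indexed. It only imitates the ngram tokenizer's min_gram/max_gram behaviour and is not Elasticsearch code; the function name and the gram sizes are picked for illustration. It also assumes the query is left as a single term at search time rather than being n-grammed itself.

```python
def ngrams(text, min_gram, max_gram):
    """Return every substring of text whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break
            grams.append(text[start:end])
    return grams


# min_gram and max_gram are illustrative values, not recommendations.
indexed_terms = set(ngrams("79037767523", min_gram=7, max_gram=11))

# The query is looked up as one whole term against the indexed grams.
print("9037767523" in indexed_terms)  # True
```

The cost is index size: every number is stored as many overlapping terms, so a narrower min_gram/max_gram range keeps the index smaller.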
I think I'd try using a pattern capture token filter to extract the parts you want to match. In the case you mention, you'd want to strip the leading 7. I don't know Russia's number resolution rules, but for NANPA numbers I'd use something like:
"phone_number" : {
"type" : "pattern_capture",
"preserve_original" : true,
"patterns" : [
"1(\\d{3}(\\d+))"
]
}
You apply the phone_number analyzer as the index_analyzer and just use a keyword analyzer that strips +-() at search time. Or strip in your application. The index_analyzer here would index a number like 19195557321 as 19195557321, 9195557321, and 5557321, which matches the way phone numbers are resolved in NANPA. A user searching for 5557321 would get all the numbers ending in 5557321: 19195557321, 13215557321, etc.
I'd also strip all the +-() stuff from the numbers before indexing them in elasticsearch. You don't want them in the _source because they don't add anything.
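If it helps to see the terms, here is a small Python simulation of that index/search split. It is only a sketch: it imitates the filter with Python's re module, the function names are made up for the example, and real pattern_capture output may differ in details such as token order.

```python
import re

# Same pattern as the phone_number filter above (JSON escaping removed).
NANPA_PATTERN = re.compile(r"1(\d{3}(\d+))")


def normalize(number):
    """Drop +, -, parentheses and whitespace, keeping only the digits."""
    return re.sub(r"[+\-()\s]", "", number)


def index_terms(number):
    """Original token plus every capture group, as with preserve_original."""
    digits = normalize(number)
    terms = [digits]
    for match in NANPA_PATTERN.finditer(digits):
        for group in match.groups():
            if group and group not in terms:
                terms.append(group)
    return terms


def matches(query, indexed_number):
    """Search side: normalize only, then look for an exact term match."""
    return normalize(query) in index_terms(indexed_number)


print(index_terms("+1 (919) 555-7321"))
# ['19195557321', '9195557321', '5557321']
print(matches("555-7321", "13215557321"))  # True
print(matches("9195557321", "13215557321"))  # False
```

The search side does no expansion at all, which is why a query with a different area code does not match.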
I once worked for a phone company so I've thought a lot about phone numbers.
BTW, this is a tradeoff. When you get new resolution rules you have to change the mapping and reindex the whole index. If you moved the term expansion that the analyzer is doing outside of elasticsearch then you could be more surgical when the patterns change. I'd suggest doing something like that if you had to cover the whole world. So you'd index
{
"phone_number": {
"raw": "19195557321",
"expansions": ["19195557321", "9195557321", "5557321"]
}
}
and you'd search on phone_number.expansions.
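A minimal sketch of doing that expansion in the application, in Python. The rule table and function name are hypothetical. The first rule is an anchored variant of the NANPA pattern from earlier in the thread; the second is only a guess made to fit the 79037767523 example from the question and says nothing authoritative about Russia's rules.

```python
import re

# Hypothetical rule table, one pattern per numbering plan.
EXPANSION_RULES = [
    re.compile(r"^1(\d{3}(\d+))$"),  # NANPA: area code + local, then local
    re.compile(r"^7(\d{10})$"),      # assumption: drop a leading 7
]


def build_document(number):
    """Build the document body with the raw digits and their expansions."""
    raw = re.sub(r"\D", "", number)  # keep digits only
    expansions = [raw]
    for rule in EXPANSION_RULES:
        match = rule.match(raw)
        if match is None:
            continue
        for group in match.groups():
            if group not in expansions:
                expansions.append(group)
        break  # first matching rule wins
    return {"phone_number": {"raw": raw, "expansions": expansions}}


print(build_document("+1 (919) 555-7321"))
print(build_document("79037767523"))
```

When a rule changes, only the documents whose raw value matches that rule need to be rebuilt and reindexed, and the mapping stays as it is.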
Which solution you take is all a matter of how big of a deal this is for you. @Mark_Harwood's solution is perfectly reasonable for lots of applications. It's certainly simpler.
Oh one more from my bookmarks - a plugin dedicated to this: https://github.com/MyPureCloud/elasticsearch-phone
Not tried it but obviously from someone who's spent some time thinking about this specific problem.
Let us know if it is any good!
thanks all!
Very cool!