I have a use case where I need to match address strings.
Since the addresses come in many different shapes and forms, I decided to put all fields into one string and do a full-text search.
The main match is done on the entity name, but the address should also match, allowing for some variation.
I observe some odd behaviour when matching the numbers embedded in the addresses.
Assume I have indexed the following address:
"10th Floor, Trustee House, 55 Samora Machel Avenue, Harare, ZW"
and I run the following query, where ADDRESS stands for the search string:
    {
      "match": {
        "address_list": {
          "query": "ADDRESS",
          "operator": "and",
          "fuzziness": "auto"
        }
      }
    }
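For reference, here is a minimal sketch of how the ADDRESS placeholder could be filled in before the query is sent. The helper name `build_address_query` is made up for illustration; only the field name `address_list` and the query options come from the query above.

```python
import json


def build_address_query(address: str) -> dict:
    """Return the match query shown above with ADDRESS replaced.

    Hypothetical helper: the name is an assumption, the query body
    mirrors the one in this post.
    """
    return {
        "match": {
            "address_list": {
                "query": address,
                "operator": "and",    # every query term has to match
                "fuzziness": "auto",  # edit distance chosen per term
            }
        }
    }


if __name__ == "__main__":
    body = {
        "query": build_address_query(
            "10th Floor, Truste House, 55 Samore Machel Avenue, Harare, ZW"
        )
    }
    # Print the request body that would be sent to the search endpoint.
    print(json.dumps(body, indent=2))
```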
I can find misspelled addresses like
ADDRESS = "10th Floor, Truste House, 55 Samore Machel Avenue, Harare, ZW"
and I can still find them if they move a floor up:
ADDRESS = "11th Floor, Truste House, 55 Samora Machel Avenue, Harare, ZW"
But if they move next door, I no longer find them:
ADDRESS = "10th Floor, Trustee House, 56 Samore Machel Avenue, Harare, ZW"
If they move 100 houses up the street, I find them again:
ADDRESS = "10th Floor, Truste House, 155 Samore Machel Avenue, Harare, ZW"
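One possible explanation, assuming the documented length-based rule for "auto" fuzziness applies here (terms of 1 to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two): "56" is too short for any edit, while "155" and "11th" are long enough for one. The sketch below only simulates that rule outside Elasticsearch. It is an approximation under stated assumptions: the tokenizer is a simple lowercase alphanumeric split rather than the real analyzer, the distance is plain Levenshtein, and all function names are hypothetical.

```python
import re


def auto_fuzziness(term: str) -> int:
    """Allowed edits for a query term under the assumed AUTO rule."""
    if len(term) <= 2:
        return 0  # short terms must match exactly
    if len(term) <= 5:
        return 1  # one edit allowed
    return 2      # two edits allowed


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def tokens(text: str) -> list[str]:
    """Rough stand-in for the analyzer: lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(query: str, indexed: str) -> bool:
    """True if every query term fuzzily matches some indexed term ("and")."""
    indexed_terms = tokens(indexed)
    return all(
        any(levenshtein(q, t) <= auto_fuzziness(q) for t in indexed_terms)
        for q in tokens(query)
    )


if __name__ == "__main__":
    indexed = "10th Floor, Trustee House, 55 Samora Machel Avenue, Harare, ZW"
    for query in (
        "10th Floor, Truste House, 55 Samore Machel Avenue, Harare, ZW",
        "11th Floor, Truste House, 55 Samora Machel Avenue, Harare, ZW",
        "10th Floor, Trustee House, 56 Samore Machel Avenue, Harare, ZW",
        "10th Floor, Truste House, 155 Samore Machel Avenue, Harare, ZW",
    ):
        print(matches(query, indexed), query)
```

Under these assumptions the simulation reproduces the pattern described above: only the "56" variant fails, because a two-character term gets no edits.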
Is there a way to make fuzziness treat the numbers in the string in a more predictable manner?