Fulltext search with fuzziness on strings containing numbers

nampord · January 18, 2018, 11:47am

I have a use case where I need to match address strings.

Since the addresses come in many different shapes and forms, I decided to put all fields in one string and do a full-text search.

The main match will be done on the entity name, but the address should also match, allowing for some variations.

I observe some odd behaviour in the matching of the numbers embedded in the addresses:

Assume I have the following address

"10th Floor, Trustee House, 55 Samora Machel Avenue, Harare, ZW"

with a query

{
  "match": {
    "address_list": {
      "query": "ADDRESS",
      "operator": "and",
      "fuzziness": "auto"
    }
  }
}

I can find misspelled addresses like

ADDRESS = "10th Floor, Truste House, 55 Samore Machel Avenue, Harare, ZW"

and I can still find them if they moved a floor up

ADDRESS = "11th Floor, Truste House, 55 Samora Machel Avenue, Harare, ZW"

but if they move next door, I will not find them anymore:

ADDRESS = "10th Floor, Trustee House, 56 Samore Machel Avenue, Harare, ZW"

If they move 100 houses up the street, I find them again:

ADDRESS = "10th Floor, Truste House, 155 Samore Machel Avenue, Harare, ZW"

Is there a way to make fuzziness also work on the numbers in the string in a more predictable manner?

mayya · January 26, 2018, 11:40pm

Hi there,
This is expected behaviour. Fuzziness is calculated based on the Levenshtein edit distance (see "Common options" in the Elasticsearch Guide).
Your numbers are not represented as numbers, but as text tokens.

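Your option "auto" means that the allowed edit distance is based on the length of each query term: terms of 1 or 2 characters must match exactly, terms of 3 to 5 characters may differ by one edit, and longer terms by two. For a short term like your house number 55, the allowed distance is therefore 0, so 56 does not match. The query term 155 has three characters, so one edit is allowed and it matches the indexed 55. Set fuzziness to 1 to allow an edit distance of 1 for every term, and the house number 56 will be found.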

"fuzziness": 1
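
To make the length rule concrete, here is a minimal Python sketch that reproduces the matches from the question. It assumes the default AUTO thresholds (3 and 6) and uses plain Levenshtein distance; the function names are hypothetical and this is only an illustration, not how Elasticsearch implements fuzzy matching. By default Elasticsearch also counts a swap of two adjacent characters as a single edit, which this sketch ignores.

```python
def auto_fuzziness(term: str) -> int:
    """Edits allowed for a query term, assuming the default AUTO thresholds (3, 6)."""
    n = len(term)
    if n < 3:
        return 0  # 1-2 characters: must match exactly
    if n < 6:
        return 1  # 3-5 characters: one edit allowed
    return 2      # 6 or more characters: two edits allowed


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_matches(query_term: str, indexed_term: str) -> bool:
    """True if the indexed term is within the edits AUTO grants the query term."""
    return edit_distance(query_term, indexed_term) <= auto_fuzziness(query_term)


if __name__ == "__main__":
    # (query term, indexed term) pairs taken from the addresses in the question
    for query_term, indexed_term in [("11th", "10th"), ("56", "55"), ("155", "55")]:
        print(query_term, "->", indexed_term, fuzzy_matches(query_term, indexed_term))
```

Under these assumptions "11th" and "155" match because their length grants one edit, while "56" gets no edits at all and misses "55".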

Also, for finer control, consider the other fuzziness-related options, such as prefix_length and transpositions (spelled fuzzy_transpositions in a match query).
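
As a rough illustration of how these options could be combined, the sketch below builds the request body as a Python dict. The helper name build_address_query and its defaults are hypothetical; address_list is the field from the question, and the option names are the ones the match query accepts. Treat it as a starting point to experiment with, not a recommended configuration.

```python
import json


def build_address_query(address: str, fuzziness=1, prefix_length=0) -> dict:
    """Build a match query body for the address_list field (hypothetical helper)."""
    return {
        "query": {
            "match": {
                "address_list": {
                    "query": address,
                    "operator": "and",
                    # A fixed value applies to every term, including short numbers.
                    "fuzziness": fuzziness,
                    # Number of leading characters that must match exactly.
                    "prefix_length": prefix_length,
                    # Count a swap of two adjacent characters as a single edit.
                    "fuzzy_transpositions": True,
                }
            }
        }
    }


if __name__ == "__main__":
    body = build_address_query(
        "10th Floor, Trustee House, 56 Samore Machel Avenue, Harare, ZW"
    )
    print(json.dumps(body, indent=2))
```

Note that a fixed fuzziness of 1 also makes every other two-digit number that differs by one digit match, so it trades predictability for looser matching on short tokens.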

