Using wildcard and fuzziness in Elasticsearch

I have a query that searches for an address using a wildcard. If I search for Kennedy with the query below, I get these addresses:

Query:

GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "address",
      "query": "Kennedy*",
      "rewrite": "scoring_boolean"
    }
  }
}

Result:

"address" : "Kennedydamm 24, 40476 Düsseldorf, Deutschland"
"address" : "Kennedy-Ufer 2a, 50679 Köln, Deutschland"
"address" : "Kennedyallee 62-70, 53175 Bonn, Deutschland"
"address" : "Kennedyallee 70, 60596 Frankfurt am Main, Deutschland"
"address" : "46a Av. John F. Kennedy, 1855 Luxembourg, Luxemburg"
"address" : "35 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg"
"address" : "44 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg"

On top of that I want to add fuzziness. Say the search term is misspelled as Kennady: the same query as above, with "Kennady*", returns nothing.

I tried adding fuzziness like this:

GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "address",
      "query": "Kennady~",
      "rewrite": "scoring_boolean"
    }
  }
}

but it returns only the 4 addresses below:

"address" : "Kennedy-Ufer 2a, 50679 Köln, Deutschland"
"address" : "46a Av. John F. Kennedy, 1855 Luxembourg, Luxemburg"
"address" : "35 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg"
"address" : "44 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg"

so addresses such as Kennedydamm and Kennedyallee are missing.
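A likely reason, sketched below in Python: a fuzzy term only matches indexed terms within a small edit distance (Elasticsearch allows at most 2 edits), and it compares whole terms, not prefixes. The sketch assumes the address field uses the standard analyzer, which is not stated in the question; under that assumption "Kennedy-Ufer" is indexed as the terms kennedy and ufer, while Kennedydamm and Kennedyallee each stay a single long term. Lucene uses a Damerau-Levenshtein variant by default; plain Levenshtein is used here for brevity, and the conclusion is the same for these words because the length difference alone already exceeds 2.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


MAX_EDITS = 2  # upper limit for fuzziness in Elasticsearch

query_term = "kennady"
for indexed_term in ["kennedy", "kennedydamm", "kennedyallee"]:
    distance = levenshtein(query_term, indexed_term)
    verdict = "match" if distance <= MAX_EDITS else "no match"
    print(f"{indexed_term}: distance {distance} -> {verdict}")
```

Only kennedy is within reach of kennady, which is consistent with the four addresses returned above.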

I tried to combine them like this:

GET my_index/_search
{
  "query": {
    "query_string": {
      "default_field": "address",
      "query": "Kennady~*",
      "rewrite": "scoring_boolean"
    }
  }
}

but it returns all the documents in the index, which is not what I want.

So how can we combine wildcard and fuzziness in this case?

Hey.

It would be easier next time if you could provide a full reproducible example. You were almost there: only the mapping (index creation) and the creation of the dataset were missing.

I wouldn't recommend using wildcard queries. Instead, change the text analyzers and use multiple analyzers on the address field (by generating sub-fields at index time).

You could use, for example, edge ngram based analyzers.
From Kennedy, such an analyzer could generate tokens like [ken, kenn, kenne, kenned, kennedy]...

Then you can also apply some fuzziness on those terms at search time.
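To illustrate why edge ngrams help with the Kennady misspelling, here is a small Python sketch that mimics the token generation described above. It is only an approximation of what an edge_ngram tokenizer does to a single word; the function name and the min_gram=3 / max_gram=20 defaults are assumptions chosen for this illustration.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


indexed = edge_ngrams("kennedy")   # what would be stored for "Kennedy"
searched = edge_ngrams("kennady")  # what a misspelled query would produce
shared = [gram for gram in searched if gram in indexed]

print(indexed)  # ['ken', 'kenn', 'kenne', 'kenned', 'kennedy']
print(shared)   # ['ken', 'kenn']
```

The misspelled term still shares its first prefixes with the indexed one, so the documents are found even without a wildcard. The same holds for Kennedydamm and Kennedyallee, since they start with the same prefixes.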

Hi David,

Thanks for your reply. I tried edge_ngram and it works fine: I see all the documents I was expecting, but not in the expected order. Some results that I expected to score higher actually score lower.

Here are the settings where I added the analyzer:

{
	"settings": {
		"analysis": {
			"analyzer": {
				"my_analyzer": {
					"filter": [
						"lowercase"
					],
					"tokenizer": "my_tokenizer"
				}
			},
			"tokenizer": {
				"my_tokenizer": {
					"type": "edge_ngram",
					"min_gram": 3,
					"max_gram": 20,
					"token_chars": ["letter", "digit", "punctuation", "symbol"]
				}
			}
		}
	}
}

This is the query I am using:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "premise.address.edge_ngram": {
              "query": "Kennady"
            }
          }
        }
      ]
    }
  }
}

But the results come back in this order:

Kennedy-Ufer 2a, 50679 Köln, Deutschland
44 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg
35 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg
46a Av. John F. Kennedy, 1855 Luxembourg, Luxemburg
Kennedyallee 62-70, 53175 Bonn, Deutschland
Kennedydamm 24, 40476 Düsseldorf, Deutschland
Kennedyallee 70, 60596 Frankfurt am Main, Deutschland

I don't understand why the address with Kennedy-Ufer scores higher than the addresses with Av. John F. Kennedy.

My ideal result would be like this:

35 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg
44 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg
46a Av. John F. Kennedy, 1855 Luxembourg, Luxemburg
Kennedy-Ufer 2a, 50679 Köln, Deutschland
Kennedyallee 62-70, 53175 Bonn, Deutschland
Kennedyallee 70, 60596 Frankfurt am Main, Deutschland
Kennedydamm 24, 40476 Düsseldorf, Deutschland

Thanks again for your help.
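One plausible explanation for the observed order, sketched in Python below: with the query analyzed by the same edge ngram analyzer, every address matches only the grams ken and kenn, so the scores differ mainly through field-length normalization, where fields with fewer tokens score higher. This assumes the default BM25 similarity and that the edge_ngram sub-field uses my_analyzer at both index and search time; neither is confirmed in the thread. The whitespace split is a simplification that happens to be equivalent to the token_chars setting for these particular addresses.

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    """Prefixes of `token` with lengths min_gram..max_gram."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def analyze(text):
    # Approximation of my_analyzer: lowercase, split on whitespace,
    # then emit edge ngrams for each chunk (punctuation stays attached).
    return [gram for chunk in text.lower().split() for gram in edge_ngrams(chunk)]


# Addresses listed in the order the search returned them.
addresses = [
    "Kennedy-Ufer 2a, 50679 Köln, Deutschland",
    "44 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg",
    "35 Av. John F. Kennedy, 1855 Luxembourg, Luxemburg",
    "46a Av. John F. Kennedy, 1855 Luxembourg, Luxemburg",
    "Kennedyallee 62-70, 53175 Bonn, Deutschland",
    "Kennedydamm 24, 40476 Düsseldorf, Deutschland",
    "Kennedyallee 70, 60596 Frankfurt am Main, Deutschland",
]

for address in addresses:
    print(len(analyze(address)), address)
```

Under these assumptions the token counts increase from top to bottom in exactly the returned order, with Kennedy-Ufer producing the fewest tokens. That would explain why it ranks first even though Kennedy is not a standalone word in that address.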

You should also search using a match query on the regular field. Combine both queries as bool should clauses.

Here's an idea of what you can do:
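A minimal sketch of that suggestion, written as a Python helper that builds the request body (this is an illustration, not the code from the original reply). The parent field name premise.address is inferred from the premise.address.edge_ngram sub-field used earlier, and the boost value is a hypothetical starting point; both are assumptions. The fuzziness and boost parameters are standard options of the match query.

```python
import json


def build_query(text,
                field="premise.address",                    # assumed parent field
                ngram_field="premise.address.edge_ngram"):  # sub-field from the thread
    """Combine a fuzzy whole-word match with an edge ngram match."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Whole-word match that tolerates typos; boosted so that
                    # addresses containing "Kennedy" as a word rank higher.
                    {"match": {field: {"query": text,
                                       "fuzziness": "AUTO",
                                       "boost": 2.0}}},  # hypothetical boost
                    # Prefix-style match that also finds Kennedydamm, Kennedyallee.
                    {"match": {ngram_field: {"query": text}}},
                ]
            }
        }
    }


print(json.dumps(build_query("Kennady"), indent=2))
```

The printed JSON can be sent as the body of GET my_index/_search. Documents matching both clauses accumulate both scores, so whole-word matches should tend to move up, though the exact order still depends on the analyzers and the boost chosen.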
