Multiword search term with contiguous character matching


(Ekta) #1

I am trying to write an Elasticsearch query that matches contiguous characters across words. So, if my index has "John Doe", I should still see "John Doe" returned by Elasticsearch for each of the searches below.

1. john doe
2. john do
3. ohn do
4. john
5. n doe

I have an ngram analyzer on my data with min_gram: 2 and max_gram: 10.
I have tried the query below so far. Is ngram the way to go here, or am I missing something? Please note that searching "jo oe" should NOT return the "John Doe" result.

{
  "query": {
    "multi_match": {
      "query": "term",
      "operator": "AND",
      "type": "phrase_prefix",
      "max_expansions": 50,
      "fields": [
        "Field1",
        "Field2"
      ]
    }
  }
}
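
To see why ngrams alone struggle with the "jo oe" requirement, here is a minimal Python sketch of the matching, not of Elasticsearch itself. It assumes a simplified model: the indexed text is lowercased, split on whitespace, and each word is expanded into grams of length 2 to 10, while each query word is looked up as a single term. The function names (`ngrams`, `index_terms`, `all_terms_match`) are hypothetical, and real analyzers also track token positions, which this model ignores.

```python
def ngrams(token, min_gram=2, max_gram=10):
    """Return every contiguous substring of `token` with length in [min_gram, max_gram]."""
    grams = set()
    for size in range(min_gram, min(max_gram, len(token)) + 1):
        for start in range(len(token) - size + 1):
            grams.add(token[start:start + size])
    return grams


def index_terms(text):
    """Simplified index model: lowercase, split on whitespace, ngram each word."""
    terms = set()
    for word in text.lower().split():
        terms |= ngrams(word)
    return terms


def all_terms_match(query, indexed):
    """Simplified AND match: every query word must itself be an indexed gram."""
    return all(word in indexed for word in query.lower().split())


indexed = index_terms("John Doe")
for query in ["john doe", "john do", "ohn do", "john", "n doe", "jo oe"]:
    print(query, all_terms_match(query, indexed))
```

Under these assumptions, "jo oe" matches because "jo" and "oe" are both grams of the indexed words, and "n doe" does not match because "n" is shorter than min_gram. Per-word grams carry no information about what sits on either side of the space, which is the part the requirement depends on.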


(Ekta) #2

This is what worked for me. Instead of using an ngram analyzer, index your data as keyword, and use a wildcard query to match the words.

{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": { "Field1": "*" + term + "*" }
        },
        {
          "wildcard": { "Field2": "*" + term + "*" }
        }
      ],
      "minimum_should_match": 1
    }
  }
}
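
As a rough sketch of how the request body above could be built from user input, the Python below wraps the term in `*` on both sides and escapes the characters the wildcard query treats specially, so a user typing `*` or `?` searches for those literal characters. The helper names (`escape_wildcard`, `contiguous_query`) are hypothetical. The `case_insensitive` flag is an assumption that the cluster runs Elasticsearch 7.10 or later; without it, a keyword field is case-sensitive and "john" would not match "John Doe" unless the field uses a lowercase normalizer.

```python
import json


def escape_wildcard(term):
    """Backslash-escape the wildcard query's special characters: backslash, * and ?."""
    return "".join("\\" + ch if ch in "\\*?" else ch for ch in term)


def contiguous_query(term, fields):
    """Build a bool/should query with one wildcard clause per field."""
    pattern = "*" + escape_wildcard(term) + "*"
    should = [
        # case_insensitive is assumed to be available (Elasticsearch 7.10+).
        {"wildcard": {field: {"value": pattern, "case_insensitive": True}}}
        for field in fields
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(contiguous_query("ohn do", ["Field1", "Field2"]), indent=2))
```

Because the whole field value is a single keyword term, the space is matched like any other character, so "jo oe" does not match "John Doe" while "n doe" does.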


#3

Very interesting use case!
Did you run into any performance issues?
For example, the search "ohn do" has to start with a wildcard character ("*ohn do*"), and a leading wildcard can be expensive.

If your documents have "John Doe", maybe it would be better to use a regexp query instead of wildcard.
As you said, wildcard is used on "not analyzed" fields.
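
For comparison, a regexp version of the same search might look like the sketch below. It assumes the same keyword-mapped fields; the helper names (`escape_regexp`, `regexp_query`) are hypothetical, and `case_insensitive` again assumes Elasticsearch 7.10 or later. Lucene regular expressions are anchored to the whole term, so the pattern needs `.*` on both sides, and a leading `.*` probably costs about as much as a leading `*`, so this is worth benchmarking rather than assuming it is faster.

```python
import json

# Characters reserved by the Lucene regexp syntax, including the optional operators.
REGEXP_RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_regexp(term):
    """Backslash-escape reserved regexp characters so the term is matched literally."""
    return "".join("\\" + ch if ch in REGEXP_RESERVED else ch for ch in term)


def regexp_query(term, fields):
    """Build a bool/should query with one regexp clause per field."""
    pattern = ".*" + escape_regexp(term) + ".*"
    should = [
        # case_insensitive is assumed to be available (Elasticsearch 7.10+).
        {"regexp": {field: {"value": pattern, "case_insensitive": True}}}
        for field in fields
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


print(json.dumps(regexp_query("n doe", ["Field1", "Field2"]), indent=2))
```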

