Keyword search with phonetic


#1

Hi,

I have a search case where I want to find documents in which a specific term (word) is repeated twice or more in a specific field. Some queries, such as the match query, don't work in this situation. I've used a keyword analyzer with a wildcard query to achieve the result. For example, the query below returns documents whose name field contains the term 'John' twice or more:

"query": {
  "wildcard": {
    "name.keyword": "*john *john*"
  }
}

It works, and there isn't any problem with its results. But I have a new requirement now: I want phonetic matching. I've updated the index mapping so that the name field has a new phonetic sub-field, with a keyword analyzer and a phonetic filter, next to the existing keyword sub-field. When I execute the query below, it doesn't return any results.

"query": {
  "wildcard": {
    "name.phonetic": "*john *john*"
  }
}

What's wrong with that? Is there any other way to achieve this result?

Thanks in advance


(Mark Harwood) #2

I suspect that phonetic matching and wildcards will not mix well in a single query.

A user providing a wildcard query is asking for string matches beginning or ending with a sequence of characters while phonetic encoding algorithms may have dropped or altered several of the characters used in the original. It would be like using wildcard expressions on English words that had been silently translated into Russian as part of the indexing process.
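To make that concrete, here is a minimal sketch of a phonetic encoder. It uses classic Soundex purely for illustration; that is an assumption on my part, since the encoder configured in the phonetic filter of the index above is not stated and may well be a different one. The point is only that the indexed token no longer contains the characters "john", so a wildcard pattern built from those characters has nothing to match.

```python
# Illustrative Soundex encoder (assumed here for demonstration only;
# the actual encoder used by the index in this thread is unknown).
CODES = {}
for letters, digit in (
    ("bfpv", "1"),
    ("cgjkqsxz", "2"),
    ("dt", "3"),
    ("l", "4"),
    ("mn", "5"),
    ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code for a single word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # h and w are skipped and do not separate equal codes
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        # vowels have no code, so they reset prev to ""
        prev = code
    return (encoded + "000")[:4]


if __name__ == "__main__":
    for name in ("john", "jon"):
        print(name, "->", soundex(name))
```

Both spellings end up as the same code, which is what makes phonetic matching work, but the literal substring "john" is gone from the encoded token.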

Rather than mixing the wildcard query and the phonetic encoding into one, have you considered using multiple separate name-matching approaches (exact, fuzzy, phrase, phonetic) as different query clauses, combined in the should clause of a parent bool query? This combination of approaches allows the documents that match most of the matching strategies to rank higher.
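A rough sketch of such a combined query, written as a small Python helper that builds the request body. The helper name build_name_query is hypothetical. The sub-field names name.keyword and name.phonetic are taken from the question, and the sketch assumes the phonetic sub-field is analyzed so that a plain match query against it makes sense, which may require a different analyzer than the keyword-based one described above.

```python
import json


def build_name_query(name, field="name"):
    """Build a bool query whose should clauses each try one matching strategy.

    Hypothetical helper: the field layout (field.keyword, field.phonetic)
    is assumed from the thread, not a confirmed mapping.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # exact match on the un-analyzed keyword sub-field
                    {"term": {f"{field}.keyword": name}},
                    # fuzzy match tolerates small spelling differences
                    {"match": {field: {"query": name, "fuzziness": "AUTO"}}},
                    # phrase match rewards terms appearing in order
                    {"match_phrase": {field: name}},
                    # phonetic match on the phonetically encoded sub-field
                    {"match": {f"{field}.phonetic": name}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_name_query("john john"), indent=2))
```

Because every clause sits under should, a document only needs to match one of them, and documents matching several strategies score higher. Note that this ranks candidates; it does not by itself enforce that the term occurs twice.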

