Hi team, I want to use fuzziness to find words with minor spelling mistakes.
The query I am using is:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "respondent_name.text": {
              "query": "Surender",
              "fuzziness": "2"
            }
          }
        },
        {
          "match": {
            "cnr": "WBCS020024922021"
          }
        }
      ]
    }
  }
}
The document I want to find is:
{
  .....
  "cnr": "WBCS020024922021",
  "respondent_name": [
    "SURENDRA @ SURENDRA "
  ],
  ....
}
Mapping:
"cnr": {
  "type": "keyword"
},
"respondent_name": {
  "type": "keyword",
  "fields": {
    "text": {
      "type": "text",
      "analyzer": "respondent_name_analyzer",
      "search_analyzer": "respondent_name_search_analyzer"
    },
    "phonetic": {
      "type": "text",
      "analyzer": "name_phonetics_analyzer",
      "search_analyzer": "name_phonetics_analyzer"
    }
  },
  "copy_to": [
    "global_content"
  ]
}
Settings:
"respondent_name_analyzer": {
  "tokenizer": "standard",
  "filter": [
    "word_delimiter_graph",
    "lowercase",
    "respondent_name_edgegram_filter",
    "remove_duplicates"
  ]
},
"respondent_name_search_analyzer": {
  "tokenizer": "standard",
  "filter": [
    "word_delimiter_graph",
    "lowercase",
    "remove_duplicates"
  ]
}
When I run this query against an index with 200 million documents, the document is not returned, but when I run it against an index with similar mapping and settings and far fewer documents (<1000), it is.
Note: The document I want to find is present in both indexes. I have verified this by querying on CNR alone, which returns it in both.
More broadly: I want to get results for Surendra when querying Surender. I would rather not use edge_ngram or ngram, as those give more false positives.
One thing I noticed: if I query "Surendar" instead of "Surender", the document is returned. I am confused about why it is not returned when I query with "Surender".
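For reference, here is a minimal Python sketch of the edit distances involved. It is my own illustration, not Elasticsearch code. It assumes the full lowercased token surendra is among the indexed terms, and that an adjacent transposition counts as a single edit (which I believe matches the match query default of fuzzy_transpositions: true). The function name osa_distance is just a name I picked.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute,
    and adjacent transposition each cost one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # adjacent transposition, e.g. "ar" <-> "ra"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Query terms after lowercasing, compared with the assumed indexed term.
print(osa_distance("surender", "surendra"))  # 2
print(osa_distance("surendar", "surendra"))  # 1
```

Both distances are within fuzziness 2, so as far as I can tell the edit distance by itself does not explain why only "Surendar" matches in the large index.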