Fuzziness not working when querying a larger index

Hi team, I want to use fuzziness to find words with minor spelling mistakes.

The query I am using is:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "respondent_name.text":{
              "query": "Surender",
              "fuzziness": "2"
            }
          }
        },
        {
          "match": {
            "cnr":"WBCS020024922021"
          }
        }
      ]
    }
  }
}
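For reference, here is the same request as a Python dict, with the match query's fuzzy-related defaults written out explicitly (prefix_length 0, max_expansions 50, fuzzy_transpositions true). Elasticsearch applies these values when they are omitted, so this should be equivalent to the query above. Spelling them out mainly makes visible that each fuzzy term is expanded to at most 50 candidate terms, a limit that may behave differently in an index with far more distinct terms than a small test index.

```python
import json

# Same bool query as above; the three extra keys are the documented
# defaults of the match query, shown here only to make them visible.
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "respondent_name.text": {
                            "query": "Surender",
                            "fuzziness": "2",
                            "prefix_length": 0,           # leading characters that must match exactly
                            "max_expansions": 50,         # max number of terms the fuzzy term expands to
                            "fuzzy_transpositions": True, # an adjacent swap ("ar" -> "ra") is one edit
                        }
                    }
                },
                {"match": {"cnr": "WBCS020024922021"}},
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```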

The document I want to find is:

{
  .....

  "cnr": "WBCS020024922021",        
  "respondent_name": [
    "SURENDRA @ SURENDRA "
  ],
  ....

}

Mapping:

"cnr": {
  "type": "keyword"
},
"respondent_name": {
  "type": "keyword",
  "fields": {
    "text": {
      "type": "text",
      "analyzer": "respondent_name_analyzer",
      "search_analyzer": "respondent_name_search_analyzer"
    },
    "phonetic": {
      "type": "text",
      "analyzer": "name_phonetics_analyzer",
      "search_analyzer": "name_phonetics_analyzer"
    }
  },
  "copy_to": [
    "global_content"
  ]
}

Settings:

"respondent_name_analyzer": {
  "tokenizer": "standard",
  "filter": [
    "word_delimiter_graph",
    "lowercase",
    "respondent_name_edgegram_filter",
    "remove_duplicates"
  ]
},
"respondent_name_search_analyzer": {
  "tokenizer": "standard",
  "filter": [
    "word_delimiter_graph",
    "lowercase",
    "remove_duplicates"
  ]
}

When I run the same query against an index with 200 million documents, the document is not returned, but running it against an index with similar mappings and settings and far fewer documents (<1000) does return it.

Note: the document is present in both indexes. I verified this by querying only on CNR, which returns it in both.

More broadly: I want a search for Surender to return Surendra. I do not want to use edge_ngram or ngram, as they produce more false positives.

One thing I noticed: if I query "Surendar" instead of "Surender", the document is returned. I am confused about why it is not returned when I query "Surender".
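The two spellings differ in how many edits they are from the indexed terms. The sketch below computes the optimal string alignment distance, which approximates how Lucene counts edits when fuzzy_transpositions is enabled (the default): insertions, deletions, substitutions and a swap of two adjacent characters each cost one. The indexed terms used here, surendr and surendra, are assumed to be among the edge n-grams the index-time analyser produces for SURENDRA (the filter definition appears later in the thread).

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, substitute,
    and swap of two adjacent characters each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
            # adjacent transposition, e.g. "ar" <-> "ra"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


indexed = ["surendr", "surendra"]
for q in ["surender", "surendar"]:
    print(q, {term: osa_distance(q, term) for term in indexed})
# surender {'surendr': 1, 'surendra': 2}
# surendar {'surendr': 1, 'surendra': 1}
```

Both spellings are within fuzziness 2 of these terms, which suggests the edit-distance limit alone does not explain why only "Surendar" finds the document; the difference is that "Surendar" is one transposition away from surendra, while "Surender" needs two edits.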

You are using different analysers at index and search time, which is a powerful feature but can also cause unintended issues. Please provide the full mappings and the definitions of the analysers so someone can try to reproduce the issue.

I would recommend using the analyze API with explain enabled to check how the string SURENDRA @ SURENDRA is analysed by the index-time analyser specified in the mappings, and then comparing that to how Surender is analysed by the search-time analyser.
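A minimal sketch of the two requests this suggests, built in Python. Both analysers are custom ones defined in the index settings, so the calls must go to the index-scoped endpoint POST /<index>/_analyze; the index name my_index is hypothetical. Setting explain to true makes Elasticsearch report the token stream after each tokenizer and filter step.

```python
import json

INDEX = "my_index"  # hypothetical; use the index that holds 200 million documents
PATH = f"/{INDEX}/_analyze"


def analyze_body(analyzer: str, text: str, explain: bool = True) -> dict:
    """Body for POST /<index>/_analyze using a named analyser from the index settings."""
    return {"analyzer": analyzer, "text": text, "explain": explain}


# Index side: how the stored value is broken into terms.
index_side = analyze_body("respondent_name_analyzer", "SURENDRA @ SURENDRA ")
# Search side: how the query string is broken into terms.
search_side = analyze_body("respondent_name_search_analyzer", "Surender")

for body in (index_side, search_side):
    print("POST", PATH)
    print(json.dumps(body, indent=2))
```

The bodies can be pasted into Kibana Dev Tools or sent with any HTTP client; comparing the final token lists shows exactly which terms the fuzzy query has to bridge.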

The analyser you are using at index time seems to use edge n-grams, so I am not sure what you mean by this statement.

I have added the definitions of the analysers I am using in the mapping, please check them out.

What is the definition of the respondent_name_edgegram_filter filter? It does not look like a built-in filter and I cannot see it above.

My bad, here it is:

"respondent_name_edgegram_filter": {
  "type": "edge_ngram",
  "min_gram": 3,
  "max_gram": 50,
  "preserve_original": true
}
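With this filter definition, the index-time and search-time analysers can be approximated in a few lines of Python to see which terms actually end up in the index. This is a rough sketch, not Elasticsearch itself: the regex stands in for the standard tokenizer (which follows Unicode word-break rules), word_delimiter_graph is skipped because with default settings it does not split an all-letter, single-case token such as SURENDRA, and preserve_original is modelled on the assumption that the original token is only added when it is shorter than min_gram or longer than max_gram. For SURENDRA (8 characters) the resulting terms are the same either way.

```python
import re


def standard_tokens(text):
    # Rough stand-in for the standard tokenizer: keep word characters,
    # drop punctuation such as "@".
    return re.findall(r"\w+", text)


def edge_ngrams(token, min_gram=3, max_gram=50, preserve_original=True):
    # Prefixes of length min_gram..max_gram, as the edge_ngram filter emits them.
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    # Assumed behaviour: the original is only kept when it falls outside the range.
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams


def index_analyzer(text):
    out = []
    for pos, tok in enumerate(standard_tokens(text)):
        seen = set()
        for gram in edge_ngrams(tok.lower()):  # lowercase, then edge n-grams
            if gram not in seen:               # remove_duplicates: same token at same position
                seen.add(gram)
                out.append((pos, gram))
    return out


def search_analyzer(text):
    # Same chain without the edge n-gram step.
    return [(pos, tok.lower()) for pos, tok in enumerate(standard_tokens(text))]


terms = sorted({t for _, t in index_analyzer("SURENDRA @ SURENDRA ")}, key=len)
print("index terms:", terms)
# index terms: ['sur', 'sure', 'suren', 'surend', 'surendr', 'surendra']
print("search terms:", [t for _, t in search_analyzer("Surender")])
# search terms: ['surender']
```

If this approximation is right, the exact term surender is never indexed for this document, so the Surender query can only match through fuzzy expansion onto terms such as surendr or surendra.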

Hi, is there any update on this?
