Fuzziness not working when querying a larger index

SriramOnGrid · January 6, 2024, 7:44am

Hi team, I want to use fuzziness to find words with minor spelling mistakes.

The query I am using is:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "respondent_name.text":{
              "query": "Surender",
              "fuzziness": "2"
            }
          }
        },
        {
          "match": {
            "cnr":"WBCS020024922021"
          }
        }
      ]
    }
  }
}

The document I want to find is:

{
  .....

  "cnr": "WBCS020024922021",        
  "respondent_name": [
    "SURENDRA @ SURENDRA "
  ],
  ....

}

Mapping:

"cnr": {
  "type": "keyword"
},
"respondent_name": {
  "type": "keyword",
  "fields": {
    "text": {
      "type": "text",
      "analyzer": "respondent_name_analyzer",
      "search_analyzer": "respondent_name_search_analyzer"
    },
    "phonetic": {
      "type": "text",
      "analyzer": "name_phonetics_analyzer",
      "search_analyzer": "name_phonetics_analyzer"
    }
  },
  "copy_to": [
    "global_content"
  ]
}

Settings:

"respondent_name_analyzer": {
  "tokenizer": "standard",
  "filter": [
    "word_delimiter_graph",
    "lowercase",
    "respondent_name_edgegram_filter",
    "remove_duplicates"
  ]
},
"respondent_name_search_analyzer": {
  "tokenizer": "standard",
  "filter": [
    "word_delimiter_graph",
    "lowercase",
    "remove_duplicates"
  ]
}

When I run this query against an index with 200 million documents, the document is not returned, but the same query against an index with similar mappings and settings and far fewer documents (<1000) does return it.

Note: The document I want to find is present in both indexes. I verified this by querying on CNR alone, which returns the document in both.

More broadly: I want to get results for Surendra when querying Surender. I would rather not use edge_gram or ngram, as that gives more false positives.

One thing I noticed is that if I query "Surendar" instead of "Surender", the document is returned. I am totally confused about why it is not returned when I query with Surender.
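
For reference, the edit distances behind the two spellings can be checked by hand. The sketch below is plain Python, not Elasticsearch code. It assumes that a swap of two adjacent characters counts as one edit (which is what the default fuzzy_transpositions setting of the match query implies), and the function name osa_distance is made up for illustration.

```python
def osa_distance(a, b):
    """Edit distance allowing insert, delete, substitute and adjacent swap."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ar" <-> "ra"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("surender", "surendra"))  # 2
print(osa_distance("surendar", "surendra"))  # 1
```

Both distances are within the requested fuzziness of 2, so the edit distance alone does not explain why only one spelling finds the document in the larger index.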

Christian_Dahlqvist · January 6, 2024, 8:17am

You are using different analysers at index and search time, which is a powerful feature but can also cause unintended issues. Please provide the full mappings and definition of the analysers so someone can try to reproduce the issue.

I would recommend using the _analyze API to check how the SURENDRA @ SURENDRA string is analysed with the index-time analyser specified in the mappings, and then comparing that to how the Surender string is analysed by the search-time analyser.

The analyser you are using at index time seems to be using edge ngrams, so I am not sure what you mean when you say you do not want to use edge_gram or ngram.
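
As a rough offline illustration of that comparison, the sketch below approximates the two analysis chains in plain Python. It is an assumption-laden stand-in, not the real analysers: the regex only imitates the standard tokenizer for simple ASCII input, word_delimiter_graph is skipped on the assumption that it leaves these inputs unchanged, and the helper names (standard_tokens, edge_ngrams, index_terms, search_terms) are hypothetical. The edge n-gram parameters default to the min_gram 3, max_gram 50 and preserve_original values of respondent_name_edgegram_filter. The output of the real _analyze API is what counts.

```python
import re


def standard_tokens(text):
    # Crude stand-in for the standard tokenizer: runs of ASCII letters/digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def edge_ngrams(token, min_gram=3, max_gram=50, preserve_original=True):
    # Prefixes of the token from min_gram up to max_gram characters.
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    # Keep the original when it falls outside the gram length range.
    if preserve_original and token not in grams:
        grams.append(token)
    return grams


def index_terms(text):
    # Approximates respondent_name_analyzer: tokenize, lowercase, edge n-grams.
    # Returns the distinct terms in first-seen order.
    terms = []
    for tok in standard_tokens(text):
        for gram in edge_ngrams(tok.lower()):
            if gram not in terms:
                terms.append(gram)
    return terms


def search_terms(text):
    # Approximates respondent_name_search_analyzer: tokenize and lowercase only.
    return [tok.lower() for tok in standard_tokens(text)]


print(index_terms("SURENDRA @ SURENDRA "))
# ['sur', 'sure', 'suren', 'surend', 'surendr', 'surendra']
print(search_terms("Surender"))
# ['surender']
```

Under these assumptions the fuzzy match compares the single search term against every prefix term stored at index time, not just against the full name.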

SriramOnGrid · January 6, 2024, 8:34am

I have provided the definitions of the analysers used in the mapping above; please check them out.

Christian_Dahlqvist · January 6, 2024, 9:08am

What is the definition of the respondent_name_edgegram_filter filter? It does not look standard and I cannot see it above.

SriramOnGrid · January 6, 2024, 9:24am

my bad:

"respondent_name_edgegram_filter": {
  "type": "edge_ngram",
  "min_gram": 3,
  "max_gram": 50,
  "preserve_original": true
}

SriramOnGrid · January 10, 2024, 8:49am

Hi, is there any update on this?

system · February 7, 2024, 8:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
