I want to handle misspellings in my match query, so I added fuzziness as shown below. However, this completely changed the results compared with the same query without fuzziness.
I am using the following mapping and analyzers:
{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1457443337681",
      "analysis": {
        "filter": {
          "my_edge_ngram_analyzer": {
            "type": "edgeNGram",
            "min_gram": "2",
            "max_gram": "10"
          },
          "my_word_delimiter": {
            "catenate_all": "true",
            "type": "word_delimiter"
          }
        },
        "analyzer": {
          "my_analyzer": {
            "filter": [
              "standard",
              "lowercase",
              "my_word_delimiter",
              "my_edge_ngram_analyzer"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          }
        }
      },
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "version": {
        "created": "2020099"
      }
    }
  },
  ...
      "Name": {
        "search_analyzer": "standard",
        "analyzer": "my_analyzer",
        "type": "string"
      },
      "ShortDescription": {
        "search_analyzer": "standard",
        "analyzer": "my_analyzer",
        "type": "string"
      }
    }
  },
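To see what the index roughly contains for these fields, here is a simplified Python approximation of `my_analyzer` (whitespace tokenizer, lowercase, word_delimiter with catenate_all, edge n-grams of length 2 to 10). It is a sketch under stated assumptions, not Elasticsearch's implementation: the hypothetical `word_delimiter` helper only splits on non-alphanumeric characters and letter/digit transitions, and the `standard` token filter is treated as a no-op.

```python
import re


def word_delimiter(token):
    """Rough stand-in for word_delimiter with catenate_all=true (assumption).

    Splits on non-alphanumeric characters and on letter/digit
    transitions, then appends all parts joined together.
    """
    parts = re.findall(r"[a-z]+|[0-9]+", token)
    if len(parts) > 1:
        parts.append("".join(parts))
    return parts


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Prefixes of `token` whose length is between min_gram and max_gram."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text):
    """Approximates my_analyzer: whitespace, lowercase, word_delimiter, edge n-grams."""
    terms = []
    for raw in text.split():
        for part in word_delimiter(raw.lower()):
            terms.extend(edge_ngrams(part))
    return terms


print(analyze("LP3014"))
# ['lp', '30', '301', '3014', 'lp', 'lp3', 'lp30', 'lp301', 'lp3014']
print(analyze("HP CE411A / 305A Cyan"))
```

Under these assumptions a value such as "LP3014" is indexed with short prefixes like "30" and "301", so the index holds many short terms that a fuzzy query term could land on.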
Here is the query without fuzziness:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "type": "best_fields",
            "query": "hp 301",
            "fields": [
              "Name^7",
              "ShortDescription^6"
            ]
          }
        }
      ]
    }
  }
}
As expected, this query returns the most relevant results for "hp 301":
"_source": {
"id": 1,
"Name": "l HP CH561EE / 301 Black",
"ShortDescription": "301
"_source": {
"id": 2,
"Name": " HP E5Y87EE / 301 Set (2 x Black)",
"ShortDescription": "301
I expected the same results when using fuzziness. As I understand it, fuzziness should only correct misspellings, not change the results for a correctly spelled query.
If I set fuzziness to AUTO with prefix_length 0:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "type": "best_fields",
            "query": "hp 301",
            "fuzziness": "AUTO",
            "prefix_length": 0,
            "fields": [
              "Name^7",
              "ShortDescription^6"
            ]
          }
        }
      ]
    }
  }
}
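For reference, `"fuzziness": "AUTO"` derives the maximum edit distance from the length of each query term: terms of 0 to 2 characters must match exactly, 3 to 5 characters allow one edit, and longer terms allow two. A minimal sketch of that rule (the function name `auto_max_edits` is hypothetical), assuming the `standard` search_analyzer turns "hp 301" into the terms "hp" and "301":

```python
def auto_max_edits(term):
    """Edit distance allowed by "fuzziness": "AUTO" for one query term.

    0..2 characters: exact match, 3..5: one edit, more than 5: two edits.
    """
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


# Assumed output of the standard search_analyzer for "hp 301".
for term in ["hp", "301"]:
    print(term, auto_max_edits(term))
# prints "hp 0" then "301 1"
```

So with AUTO, "hp" still has to match exactly, while "301" may match any indexed term within one edit.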
This returns the results below, which are not relevant at all: the only query term they contain is "HP". How do they get the highest scores?
"_source": {
"id": 123,
"Name": "HP CE411A / 305A Cyan",
"ShortDescription": "305A",
"_source": {
"id": 1234,
"Name": "HP CC530A bis CC533A Set",
"ShortDescription": "304A",
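A possible explanation is that AUTO allows one edit for a three-character term such as "301", and the edge n-grams give it short terms to match. The sketch below checks which indexed prefixes of "305A" and "304A" lie within one edit of "301". It uses plain Levenshtein distance (Lucene also counts transpositions by default, which makes no difference for these strings), and the n-gram lists are written out by hand, assuming word_delimiter splits "305A" into "305" and "A" and catenates them back into "305a".

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or keep
        prev = cur
    return prev[-1]


# Hand-listed edge n-grams (min_gram 2) of the ShortDescription values (assumption).
indexed = {
    "305A": ["30", "305", "305a"],
    "304A": ["30", "304", "304a"],
}

# "301" has 3 characters, so AUTO allows one edit.
fuzzy_hits = {value: [g for g in grams if levenshtein("301", g) <= 1]
              for value, grams in indexed.items()}
print(fuzzy_hits)  # {'305A': ['30', '305'], '304A': ['30', '304']}
```

Both "30" (one deletion) and "305"/"304" (one substitution) fall within one edit of "301", which would let these documents match "301" even though they never contain it, on top of the exact "hp" match in the boosted Name field.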
Even more surprising: when I set fuzziness to 2 instead of AUTO, I get results that make no sense. Why would I get the second one, which contains neither "hp" nor "301"?
"_source": {
"id": 345,
"Name": "Utax 4401410015 Black",
"ShortDescription": "LP3014",
"_source": {
"id": 3400,
"Name": "Konica Minolta 8936-404 / EP302B Black",
"ShortDescription": "EP302B",
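With `"fuzziness": 2`, "hp" may also differ by up to two edits. Because min_gram is 2, every indexed token of at least two characters contributes a two-character prefix, and any two-character term is at most two substitutions away from "hp". The sketch below checks the two-character n-grams of the two unexpected hits; the lists are written by hand, assuming my_analyzer above produces them (for example "Utax" gives "ut" and "EP302B" gives "ep").

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hand-listed two-character edge n-grams of documents 345 and 3400 (assumption).
two_char_grams = {
    345: ["ut", "44", "bl", "lp", "30"],
    3400: ["ko", "mi", "89", "40", "ep", "30", "bl"],
}

# With two edits allowed, "hp" matches every one of them.
hp_hits = {doc: [g for g in grams if levenshtein("hp", g) <= 2]
           for doc, grams in two_char_grams.items()}
print(hp_hits == two_char_grams)  # True
print(levenshtein("301", "302"))  # 1, so the "302" prefix of "EP302B" also matches "301"
```

Under these assumptions, fuzziness 2 effectively lets "hp" match almost any document in the index.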
Furthermore, when I use "fuzziness": 2 with "prefix_length": 1 in the same query, I get different results again:
"_source": {
"id": 778,
"Name": "593-10122 / HG308 Yellow",
"ShortDescription": "HG308",
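`prefix_length` requires the first N characters of an indexed term to equal the query term before edits are counted. A minimal sketch of that check (the helper `fuzzy_match` is hypothetical and only approximates Lucene's behaviour), applied with fuzziness 2 and prefix_length 1 to a few hand-listed edge n-grams of "593-10122 / HG308 Yellow":

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_match(query_term, indexed_term, max_edits, prefix_length):
    """Hypothetical check: the first `prefix_length` characters must be equal,
    and the whole terms may differ by at most `max_edits` edits."""
    if indexed_term[:prefix_length] != query_term[:prefix_length]:
        return False
    return levenshtein(query_term, indexed_term) <= max_edits


# A few hand-listed edge n-grams of document 778 (assumption).
grams = ["59", "593", "10", "hg", "hg3", "30", "308", "ye"]

# "fuzziness": 2, "prefix_length": 1
hits = {q: [g for g in grams if fuzzy_match(q, g, max_edits=2, prefix_length=1)]
        for q in ["hp", "301"]}
print(hits)  # {'hp': ['hg', 'hg3'], '301': ['30', '308']}
```

The prefix removes matches such as "hp" against "30", but "hg" and "308" still start with the right character and are within two edits.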
Using "fuzziness": "AUTO" with "prefix_length": 1 also gives different results:
"_source": {
"id": 8990,
"Name": "C 13 S0 53021 / 3021",
"ShortDescription": "3021",
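The same kind of check with AUTO (no edits for "hp", one edit for "301") and prefix_length 1, applied to hand-listed edge n-grams of "C 13 S0 53021 / 3021", shows that several distinct indexed prefixes of "3021" each fall within one edit of "301". The helpers are hypothetical approximations, and the n-gram list assumes my_analyzer above produces these terms.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def auto_max_edits(term):
    """AUTO rule: 0..2 chars exact, 3..5 chars one edit, longer two edits."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


def fuzzy_match(query_term, indexed_term, max_edits, prefix_length):
    """Hypothetical check: equal prefix, then at most `max_edits` edits."""
    if indexed_term[:prefix_length] != query_term[:prefix_length]:
        return False
    return levenshtein(query_term, indexed_term) <= max_edits


# Hand-listed edge n-grams of document 8990 (assumption).
grams = ["13", "s0", "53", "530", "5302", "53021", "30", "302", "3021"]

hits = {q: [g for g in grams if fuzzy_match(q, g, auto_max_edits(q), prefix_length=1)]
        for q in ["hp", "301"]}
print(hits)  # {'hp': [], '301': ['30', '302', '3021']}
```

Each of these is a separate fuzzy expansion of "301" that the document matches, which may contribute to how highly it ranks; this is an interpretation of the sketch, not a verified account of Elasticsearch's scoring.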
Can somebody explain what I am doing wrong? Am I misunderstanding how fuzziness works?