Can't get n-grams and synonyms to work together

I'm trying to create an index for an e-commerce site and would like it to handle typos/misspellings as well as synonyms. These are the settings I am currently testing with:
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym_analyzer" : {
                        "tokenizer" : "standard",
                        "char_filter": [
                            "html_strip"
                        ],
                        "filter" : [
                            "lowercase",
                            "synonym_filter",
                            "italian_stop",
                            "asciifolding",
                            "custom_length",
                            "custom_ngram",
                            "custom_shingle"
                            
                        ] 
                    }
                },
                "filter" : {
                    "synonym_filter" : {
                        "type" : "synonym",
                         "synonyms_path" : "analysis/synonym.txt"
                    },
                    "custom_length": 
                    {
                        "type": "length",
                        "min": 2,
                        "max": 255
                    },
                    "custom_ngram": 
                    {
                        "type": "ngram",
                        "min_gram": 3,
                        "max_gram": 12,
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    },
                    "custom_shingle": 
                    {
                        "type":"shingle",
                        "max_shingle_size":4,
                        "min_shingle_size":2,
                        "output_unigrams":"true"
                    },
                    "italian_stop": {
                        "type":       "stop",
                        "stopwords":  "_italian_" 
                    }
                }
                
            }
        }
    },
    "mappings": {
        "product": {
            "properties": {
                "pname": { 
                    "type": "text",
                    "analyzer": "synonym_analyzer"
                },
                "shortdesc": { 
                    "type": "text",
                    "analyzer": "synonym_analyzer"
                },
                "desc": { 
                    "type": "text",
                    "analyzer": "synonym_analyzer"
                },
                "mname": { 
                    "type": "text",
                    "analyzer": "synonym_analyzer"
                }
            }
        }
    }
}

In this case my synonyms work fine, but I can't get typos or misspellings to work.
Let's say my indexed product name is "rilastil": if I search for "rilastol" I don't get any results.
However, if I remove the mappings from the index settings:
{
    "settings": {
        "index" : {
            "analysis" : {
                "analyzer" : {
                    "synonym_analyzer" : {
                        "tokenizer" : "standard",
                        "char_filter": [
                            "html_strip"
                        ],
                        "filter" : [
                            "lowercase",
                            "synonym_filter",
                            "italian_stop",
                            "asciifolding",
                            "custom_length",
                            "custom_ngram",
                            "custom_shingle"
                            
                        ] 
                    }
                },
                "filter" : {
                    "synonym_filter" : {
                        "type" : "synonym",
                        "synonyms_path" : "analysis/synonym.txt"
                    },
                    "custom_length": 
                    {
                        "type": "length",
                        "min": 2,
                        "max": 255
                    },
                    "custom_ngram": 
                    {
                        "type": "ngram",
                        "min_gram": 3,
                        "max_gram": 12,
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    },
                    "custom_shingle": 
                    {
                        "type":"shingle",
                        "max_shingle_size":4,
                        "min_shingle_size":2,
                        "output_unigrams":"true"
                    },
                    "italian_stop": {
                        "type":       "stop",
                        "stopwords":  "_italian_" 
                    }
                }
                
            }
        }
    }
}

then I can't get synonyms to work, but typo/misspelling matching works fine.

Can someone help me with this issue? Any help would be really appreciated.

Hi @iuzzino,

You need to use multi-fields to apply your filters on dedicated sub-fields.
Check the docs for more details:
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/multi-fields.html#_multi_fields_with_multiple_analyzers

Don't forget to update your query to search on the sub-fields, or you can use a wildcard, for example desc.* for the desc field.
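For example, something along these lines (just a sketch: ngram_analyzer is a placeholder for an analyzer defined in your settings that keeps only the ngram-related filters, while synonym_analyzer keeps the synonym filter; the mapping uses the 7.x typeless style of the linked doc and a placeholder index name):

# one field analyzed for synonyms, plus an ngram-analyzed sub-field
PUT my_index
{
  "mappings": {
    "properties": {
      "desc": {
        "type": "text",
        "analyzer": "synonym_analyzer",
        "fields": {
          "ngram": {
            "type": "text",
            "analyzer": "ngram_analyzer"
          }
        }
      }
    }
  }
}

# query the field and its sub-fields via a wildcard
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "rilastol",
      "fields": ["desc", "desc.*"]
    }
  }
}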

I agree with @gabriel_tessier, it's usually easier to reason about (and maintain) solutions where exact matching and fuzzy matching (e.g. your ngram solution) are implemented on different fields. That way it's easier to tune the trade-offs for each field individually and to balance the effect of both, e.g. via different field boosts.
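As a rough sketch of the query side of that idea (assuming a multi-field setup like the one above, where pname holds the synonym/exact-analyzed text and pname.ngram an ngram-analyzed sub-field), you can then weight the two independently:

# boost exact/synonym matches over ngram matches
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "query": "rilastol",
      "fields": [
        "pname^3",
        "pname.ngram"
      ],
      "type": "most_fields"
    }
  }
}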

That said, I tried your first example (the "Rilastil"/"Rilastol" case) on 7.4.0 and the typo/misspelling correction via ngrams was working for me. Just in case you want to try, here is the slightly modified example (because 7.4 follows a slightly different syntax):

DELETE test

PUT test
{
  "settings": {
    "index.max_ngram_diff": 9,
    "index": {
      "analysis": {
        "analyzer": {
          "synonym_analyzer": {
            "tokenizer": "standard",
            "char_filter": [
              "html_strip"
            ],
            "filter": [
              "lowercase",
              "synonym_filter",
              "italian_stop",
              "asciifolding",
              "custom_length",
              "custom_ngram",
              "custom_shingle"
            ]
          }
        },
        "filter": {
          "synonym_filter": {
            "type": "synonym",
            "synonyms": [
              "foo, bar => baz"
            ]
          },
          "custom_length": {
            "type": "length",
            "min": 2,
            "max": 255
          },
          "custom_ngram": {
            "type": "ngram",
            "min_gram": 3,
            "max_gram": 12,
            "token_chars": [
              "letter",
              "digit"
            ]
          },
          "custom_shingle": {
            "type": "shingle",
            "max_shingle_size": 4,
            "min_shingle_size": 2,
            "output_unigrams": "true"
          },
          "italian_stop": {
            "type": "stop",
            "stopwords": "_italian_"
          }
        }
      }
    }
  },
  "mappings": {
    
      "properties": {
        "pname": {
          "type": "text",
          "analyzer": "synonym_analyzer"
        },
        "shortdesc": {
          "type": "text",
          "analyzer": "synonym_analyzer"
        },
        "desc": {
          "type": "text",
          "analyzer": "synonym_analyzer"
        },
        "mname": {
          "type": "text",
          "analyzer": "synonym_analyzer"
        }
      }
    
  }
}

PUT /test/_doc/1
{
  "pname" : "Rilastol"
}

GET /test/_search
{
  "query": {
    "match": {
      "pname": "Rilastil"
    }
  }
}

I'm also wondering how your ngram typo correction can work when you remove the mappings in the second example; if no field uses the ngrams, I doubt this should work, but maybe there is something missing in your example.
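
As a side note, a quick way to check which tokens an analyzer actually produces is the _analyze API, e.g. against the test index above:

GET /test/_analyze
{
  "analyzer": "synonym_analyzer",
  "text": "rilastil"
}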

Thank you for your replies. I'm using v6.5.
In answer to Christoph: looking at my index, I can say that I added the mappings and synonyms only recently, while the filters in the analyzer have always been there. So before using mappings and synonyms I was able to find typos/misspellings; that's why I thought mappings were not necessary for the filters to take effect. I thought they were enabled by default on all fields. I added the mappings only when I tried to use synonyms, because I couldn't get them to work in any other way!
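
(Side note: if I understand the docs correctly, a custom analyzer only applies to fields whose mapping references it, unless it is registered as the index default analyzer. A minimal sketch of that, purely as an illustration with placeholder filters:)

# an analyzer named "default" becomes the index-wide default for text fields
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  }
}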

I must say that maybe in my case the typo/misspelling matching was working because I had added fuzziness to my query (but this doesn't answer why it's not working anymore when I enable synonyms). Anyway, this is my query, but I must admit that with all these settings on an e-commerce site I'm not really satisfied with the results...

"query": 
{
  "bool" : {
    "should" : [
      {
      "multi_match": {
  			"query": "||SEARCH_QUERY||",
  			"fields": 
  			[
  				"pname^9",
  				"ref^10",
  				"shortdesc^3",
  				"desc^1",
  				"cname^4",
  				"mname^8",
  				"attribute^2",
  				"feature^2",
  				"ean13^1"
  			],
  			"type": "most_fields",
  			"operator": "AND",
  			"minimum_should_match": "80%",
  			"fuzziness": "1",
  			"boost" : 0.1
  		}
     
      },
		{
		"match_phrase" : {
        "pname" : "||SEARCH_QUERY||"
    }
    }
    ]
  }
}

I'm using the match_phrase query to boost results that better match the product name, because of course fuzziness and ngrams produce a lot of results that might not be relevant.

I'll take a closer look at ES and post an update.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.