ElasticSearch synonym and word delimiter analyzer are not compatible

sivamurugan · June 10, 2015, 10:35pm

I have the mapping below. It applies a word delimiter analyzer at both index and search time, but only to the model field, and a synonym analyzer that is applied at search time to the search string.

Mapping

POST /stackoverflow
{
"settings":{
    "analysis":{
        "analyzer":{
            "keyword_analyzer":{
                "tokenizer":"keyword",
                "filter":[
                    "lowercase",
                    "asciifolding"
                ]
            },
            "synonym_analyzer":{
                "tokenizer":"standard",
                "filter":[
                    "lowercase",
                    "synonym"
                ],
                "expand":false,
                "ignore_case":true
            },
            "word_delimiter_analyzer":{
                "tokenizer":"whitespace",
                "filter":[
                    "lowercase",
                    "word_delimiter"
                ],
                "ignore_case":true
            }
        },
        "filter":{
            "synonym":{
                "type":"synonym",
                "synonyms_path":"synonyms.txt"
            },
            "word_delimiter":{
                "type":"word_delimiter",
                "generate_word_parts":true,
                "preserve_original":true
            }
        }
    }
},

"mappings":{
    "vehicles":{
        "dynamic":"false",
        "dynamic_templates":[
            {
                "no_index_template":{
                    "match":"*",
                    "mapping":{
                        "index":"no",
                        "include_in_all":false
                    }
                }
            }
        ],
        "_all":{
            "enabled":false
        },
        "properties":{
            "id":{
                "type":"long",
                "ignore_malformed":true
            },
            "model":{
                "type":"nested",
                "include_in_root":true,
                "properties":{
                    "label":{
                        "type":"string",
                        "analyzer": "word_delimiter_analyzer"
                    }
                }
            },
            "make":{
                "type":"string",
                "analyzer":"keyword_analyzer"
            }
        }
    }
}
}

Some sample data:

POST /stackoverflow/vehicles/6
{

    "make" : "chevrolet",
    "model" : {
       "label" : "Silverado 2500HD"
    }
}

The search query is below:

GET /stackoverflow/_search?explain
{
    "from":0,
    "size":10,
    "query":{
        "filtered":{
            "query":{
                "multi_match":{
                    "query":"HD2500",
                    "fields":[
                        "make",
                        "model.label"
                    ],
                    "type":"cross_fields",
                    "operator":"OR",
                    "analyzer":"synonym_analyzer"
                }
            }
        }
    }
}

The above search query does not work, but if I remove synonym_analyzer from the search query it works perfectly fine. I really don't understand how the synonym analyzer is tampering with the result.

In my synonyms.txt file I don't have any reference to HD2500. As I understand it, all the synonym analyzer does is split the search string into tokens, convert them to lowercase, and try to match a synonym, and then it passes the result on to the field-level analyzers. I am confused about where it is breaking.

Any help is highly appreciated

sivamurugan · June 11, 2015, 6:01pm

Any idea on this mystery?

sivamurugan · June 15, 2015, 2:56am

I figured it out: it cannot be done in Elasticsearch.

As soon as a synonym analyzer is applied at query level to the search string, no field-level analyzer (the word delimiter, in this case) is applied.
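
One way to check this is to compare the tokens each analyzer produces with the _analyze API. The requests below are a sketch that assumes the Elasticsearch 1.x form of _analyze, which takes analyzer and text as query-string parameters (newer versions expect a JSON body instead):

```json
GET /stackoverflow/_analyze?analyzer=synonym_analyzer&text=HD2500

GET /stackoverflow/_analyze?analyzer=word_delimiter_analyzer&text=HD2500

GET /stackoverflow/_analyze?analyzer=word_delimiter_analyzer&text=Silverado%202500HD
```

With the settings above, the first request should return the single token hd2500, because the standard tokenizer does not split between letters and digits. The word delimiter analyzer should split on the letter/digit boundary and return hd2500, hd and 2500 for the search string, and silverado, 2500hd, 2500 and hd for the document. That would explain the miss: hd2500 is not among the indexed tokens, while hd and 2500 are.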

Mark_Harwood · June 15, 2015, 4:40pm

Not sure I fully follow your conclusion, but if you want a search-time analyzer that includes both synonym and word-delimiter behaviour, it could be defined in your index settings as:

            "search_time_synonym_and_worddelim_analyzer":{
                "tokenizer":"standard",
                "filter":[
                    "lowercase",
                    "synonym",
                    "word_delimiter"
                ],
                "expand":false,
                "ignore_case":true
            },

And running this search:

GET /test/vehicles/_search?explain
{
    "from":0,
    "size":10,
    "query":{
        "filtered":{
            "query":{
                "multi_match":{
                    "query":"HD2500",
                    "fields":[
                        "make",
                        "model.label"
                    ],
                    "type":"cross_fields",
                    "operator":"OR",
                    "analyzer":"search_time_synonym_and_worddelim_analyzer"
                }
            }
        }
    }
}

I get a match.

sivamurugan · June 15, 2015, 6:14pm

In the proposed solution, the word delimiter is applied to all fields, whereas I want it applied only to the model field.

In my case I want the synonym analyzer applied to all fields and, on top of that, the word delimiter applied to the model field alone.

Conclusion:
Either set an analyzer on each field that requires one, or use a query-level analyzer, which is applied to all fields.
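
A per-field variant of the query could be sketched as follows; it has not been tested against this index. Instead of one multi_match with a single query-level analyzer, it uses one match clause per field, each with its own analyzer. It assumes the search_time_synonym_and_worddelim_analyzer from the reply above has been added to the index settings, and it gives up cross_fields scoring, so relevance may differ from the original query.

```json
GET /stackoverflow/_search
{
    "query":{
        "bool":{
            "should":[
                {
                    "match":{
                        "make":{
                            "query":"HD2500",
                            "analyzer":"synonym_analyzer"
                        }
                    }
                },
                {
                    "match":{
                        "model.label":{
                            "query":"HD2500",
                            "analyzer":"search_time_synonym_and_worddelim_analyzer"
                        }
                    }
                }
            ]
        }
    }
}
```

Here only the model.label clause runs the word delimiter filter, while both clauses still go through the synonym filter.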
