Elasticsearch synonym and word delimiter analyzers are not compatible


(Siva Shanmuga Subramanian Murugan) #1

I have the mapping below. To be precise, it applies a word delimiter analyzer at index and search time to the model field only, plus a synonym analyzer that performs search-time analysis on the search string.

Mapping

POST /stackoverflow
{
    "settings":{
        "analysis":{
            "analyzer":{
                "keyword_analyzer":{
                    "tokenizer":"keyword",
                    "filter":[
                        "lowercase",
                        "asciifolding"
                    ]
                },
                "synonym_analyzer":{
                    "tokenizer":"standard",
                    "filter":[
                        "lowercase",
                        "synonym"
                    ],
                    "expand":false,
                    "ignore_case":true
                },
                "word_delimiter_analyzer":{
                    "tokenizer":"whitespace",
                    "filter":[
                        "lowercase",
                        "word_delimiter"
                    ],
                    "ignore_case":true
                }
            },
            "filter":{
                "synonym":{
                    "type":"synonym",
                    "synonyms_path":"synonyms.txt"
                },
                "word_delimiter":{
                    "type":"word_delimiter",
                    "generate_word_parts":true,
                    "preserve_original":true
                }
            }
        }
    },
    "mappings":{
        "vehicles":{
            "dynamic":"false",
            "dynamic_templates":[
                {
                    "no_index_template":{
                        "match":"*",
                        "mapping":{
                            "index":"no",
                            "include_in_all":false
                        }
                    }
                }
            ],
            "_all":{
                "enabled":false
            },
            "properties":{
                "id":{
                    "type":"long",
                    "ignore_malformed":true
                },
                "model":{
                    "type":"nested",
                    "include_in_root":true,
                    "properties":{
                        "label":{
                            "type":"string",
                            "analyzer":"word_delimiter_analyzer"
                        }
                    }
                },
                "make":{
                    "type":"string",
                    "analyzer":"keyword_analyzer"
                }
            }
        }
    }
}

And some sample data:

POST /stackoverflow/vehicles/6
{
    "make" : "chevrolet",
    "model" : {
        "label" : "Silverado 2500HD"
    }
}

Here is the search query:

GET /stackoverflow/_search?explain
{
   "from":0,
   "size":10,
   "query":{
      "filtered":{
         "query":{
            "multi_match":{
               "query":"HD2500",
               "fields":[
                  "make","model.label"
               ],
               "type":"cross_fields","operator" : "OR",
               "analyzer" : "synonym_analyzer"
            }
         }
      }
   }
}

The search query above returns no match, but if I remove synonym_analyzer from the query, it works perfectly. I really don't understand how the synonym analyzer is interfering with the result.

My synonyms.txt file has no entry for HD2500. As I understand it, all the synonym analyzer does is tokenize the search string, convert it to lowercase, try to match a synonym, and then pass the result on to the field-level analyzers, so I am confused about where it breaks.

Any help is highly appreciated


(Siva Shanmuga Subramanian Murugan) #2

Any idea on this mystery?


(Siva Shanmuga Subramanian Murugan) #3

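I figured it out: it cannot be done in Elasticsearch.

The analyzer given in the query replaces each field's own search analyzer rather than running before it. So as soon as the synonym analyzer is applied to the search string at the query level, no field-level analyzer (here, the word delimiter analyzer) is applied.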
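One way to check this is to compare the tokens each analyzer produces with the `_analyze` API. This is a sketch assuming the older (1.x/2.x-era) form of `_analyze` that accepts `analyzer` and `text` as query-string parameters; newer versions expect a JSON body instead:

```
GET /stackoverflow/_analyze?analyzer=word_delimiter_analyzer&text=Silverado%202500HD
GET /stackoverflow/_analyze?analyzer=synonym_analyzer&text=HD2500
GET /stackoverflow/_analyze?analyzer=word_delimiter_analyzer&text=HD2500
```

With the filters in the mapping (lowercase, then word_delimiter with preserve_original plus its default number splitting), the first request should return silverado, 2500hd, 2500 and hd, which are the terms indexed for model.label. Assuming synonyms.txt has no entry for hd2500, the second should return the single token hd2500 (the standard tokenizer does not split letters from digits), which matches none of the indexed terms. The third should return hd2500, hd and 2500, two of which match.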


(Mark Harwood) #4

Not sure I fully follow your conclusion, but if you want a search-time analyzer that includes both synonym and word-delimiter behaviour, it could be defined in your index analysis settings as:

"search_time_synonym_and_worddelim_analyzer":{
    "tokenizer":"standard",
    "filter":[
        "lowercase",
        "synonym",
        "word_delimiter"
    ],
    "expand":false,
    "ignore_case":true
},

And running this search:

GET /test/vehicles/_search?explain
{
   "from":0,
   "size":10,
   "query":{
      "filtered":{
         "query":{
            "multi_match":{
               "query":"HD2500",
               "fields":[
                  "make","model.label"
               ],
               "type":"cross_fields","operator" : "OR",
               "analyzer" : "search_time_synonym_and_worddelim_analyzer"
            }
         }
      }
   }
}

I get a match.


(Siva Shanmuga Subramanian Murugan) #5

With the proposed solution, the word delimiter would be applied to all fields, whereas I want it applied only to the model field.

In my case, I want the synonym analyzer applied to all fields and, on top of that, the word delimiter applied to the model field alone.

Conclusion:
Either define the analyzer on every field that needs it, or use a query-level analyzer, which is then applied to all fields.
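A sketch of the first option, where each field carries its own search analyzer and the query sets none. It assumes a 2.x-era mapping in which `string` fields accept `search_analyzer` alongside `analyzer`; the index name `stackoverflow_v2` and the analyzer name `synonym_word_delimiter_analyzer` are hypothetical, and the dynamic template, `_all` and `id` settings from the original mapping are left out for brevity. The combined analyzer uses the whitespace tokenizer, like `word_delimiter_analyzer`, so the query string is split the same way at index and search time:

```
PUT /stackoverflow_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "asciifolding"]
        },
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "synonym"]
        },
        "word_delimiter_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "word_delimiter"]
        },
        "synonym_word_delimiter_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase", "synonym", "word_delimiter"]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        },
        "word_delimiter": {
          "type": "word_delimiter",
          "generate_word_parts": true,
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "vehicles": {
      "properties": {
        "make": {
          "type": "string",
          "analyzer": "keyword_analyzer",
          "search_analyzer": "synonym_analyzer"
        },
        "model": {
          "type": "nested",
          "include_in_root": true,
          "properties": {
            "label": {
              "type": "string",
              "analyzer": "word_delimiter_analyzer",
              "search_analyzer": "synonym_word_delimiter_analyzer"
            }
          }
        }
      }
    }
  }
}
```

With that mapping, the query can omit `analyzer`, so `make` is searched with `synonym_analyzer` and `model.label` with the combined analyzer. Placing `synonym` before `word_delimiter` means synonyms are matched against whole whitespace-separated tokens before they are split.

```
GET /stackoverflow_v2/_search
{
  "from": 0,
  "size": 10,
  "query": {
    "multi_match": {
      "query": "HD2500",
      "fields": ["make", "model.label"],
      "type": "cross_fields",
      "operator": "OR"
    }
  }
}
```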

