Query using shingle filter behaves differently on 5.1.2 and 5.2.2

I have a shingle filter:

"custom_shingle_filter": {
    "type": "shingle",
    "max_shingle_size": 2,
    "min_shingle_size": 2,
    "token_separator": "",
    "output_unigrams": true
}

Then I use this shingle filter in my search analyzer:

"custom_search_analyzer": {
    "type": "custom",
    "tokenizer": "whitespace",
    "filter": [
        "custom_shingle_filter"
    ]
}

When I run the validate API on my query:

GET /test_index/test_type/_validate/query?explain=true
{
  "query" : {
    "match" : {
      "test_field" : {
        "query" : "macbook pro",
        "operator" : "and",
        "analyzer" : "custom_search_analyzer"
      }
    }
  }
}

In 5.2.2, I get:
+Graph(+test_field:macbook +test_field:pro, test_field:macbookpro, hasBoolean=true, hasPhrase=false)
which means it will query (macbook AND pro) OR macbookpro.
But in 5.1.2, I get:
+Synonym(test_field:macbook test_field:macbookpro) +test_field:pro
which means it will query (macbook OR macbookpro) AND pro?
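
To make the difference concrete, here is a minimal Python sketch of the two boolean readings above. It is a toy model using plain set membership, not Elasticsearch code. The function names and sample documents are hypothetical, and it assumes each document is just the set of terms indexed for test_field.

```python
# Toy model of the two explain outputs; plain set logic, not Elasticsearch code.
# Each "document" is assumed to be the set of terms indexed for test_field.

def matches_5_2_2(terms):
    # +Graph(+macbook +pro, macbookpro) read as: (macbook AND pro) OR macbookpro
    return ("macbook" in terms and "pro" in terms) or "macbookpro" in terms


def matches_5_1_2(terms):
    # +Synonym(macbook macbookpro) +pro read as: (macbook OR macbookpro) AND pro
    return ("macbook" in terms or "macbookpro" in terms) and "pro" in terms


# Hypothetical sample documents.
docs = {
    "two words": {"macbook", "pro"},
    "one word": {"macbookpro"},
    "one word plus pro": {"macbookpro", "pro"},
    "macbook only": {"macbook"},
}

for name, terms in docs.items():
    print(name, matches_5_2_2(terms), matches_5_1_2(terms))
```

Under this reading, the only sample document where the two versions disagree is the one that contains just "macbookpro": the 5.2.2 form matches it, the 5.1.2 form does not because "pro" is still required.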

The result in 5.2.2 makes sense to me, but 5.1.2 behaves differently and (unfortunately) that is the version we run in production.
Any ideas?
Thanks.

This is very likely due to https://github.com/elastic/elasticsearch/pull/21517
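
If upgrading is not an option, one possible workaround is to spell out the intended logic, (macbook AND pro) OR macbookpro, as an explicit bool query instead of relying on how the shingled token stream is parsed. The Python sketch below only builds the request body. The helper name explicit_shingle_query is hypothetical, and using term queries assumes the indexed terms are exactly the whitespace tokens (no lowercasing or other filters), which may not hold for your mapping.

```python
import json


def explicit_shingle_query(field, first, second):
    """Build (first AND second) OR firstsecond as an explicit bool query body.

    Hypothetical helper: term queries are not analyzed, so this assumes the
    index holds exactly these tokens for the given field.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Both separate words must be present...
                    {"bool": {"must": [
                        {"term": {field: first}},
                        {"term": {field: second}},
                    ]}},
                    # ...or the joined shingle (empty token_separator) is present.
                    {"term": {field: first + second}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = explicit_shingle_query("test_field", "macbook", "pro")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent to the same _validate/query?explain=true endpoint to check how each version interprets it.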

Thanks, that's helpful!
