Hi,
I wanted to confirm a weird behaviour I'm observing when using a match_phrase query on English analyzed fields with stopwords.
- Assume my search string is `analytics and prediction`.
- Assume again that, when searching against an English analyzed field, the tokens generated are `analyt` `and` `predict`.
- Now, when doing a match_phrase search against that field, I would expect ONLY the following text phrases to match:
  - analytics and prediction
  - analytics prediction
  - analyze and prediction
  - analyze prediction
  - analysis predict
  - analysis and predict
  - etc.
... since `and` is a stopword, instances where there is nothing between analytics and prediction should also show up as a match, in addition to those where there is an `and`. But nothing else.
- However, the behaviour I'm seeing is different (also backed by `explain = true`). Instead, the tokens match_phrase uses are `analyse ? predict`, where `?` is a wildcard.
- So in essence it works like a match_phrase with a slop, matching ANY phrase that begins with words that stem to `analyze` and ends with words that stem to `predict`.
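
To make the observed behaviour concrete, here is a toy model in Python of what I seem to be seeing. It is not Elasticsearch code: the `phrase_match` helper and the hand-written token lists are made up for illustration, and the model simply assumes that the stopword is dropped from the query while its position is kept as a gap.

```python
def phrase_match(query_positions, doc_tokens):
    """Return True if every query term occurs in doc_tokens at the same
    relative offset it has in the query (a gap matches any token)."""
    first_term, first_pos = query_positions[0]
    for start, token in enumerate(doc_tokens):
        if token != first_term:
            continue
        if all(
            0 <= start + (pos - first_pos) < len(doc_tokens)
            and doc_tokens[start + (pos - first_pos)] == term
            for term, pos in query_positions
        ):
            return True
    return False


# Assumed query tokens for "analytics and prediction":
# the stopword is gone, but position 1 is left empty.
QUERY = [("analyt", 0), ("predict", 2)]

# Hand-written stand-ins for analyzed documents; None marks a removed stopword.
DOCS = {
    "analytics and prediction": ["analyt", None, "predict"],
    "analytics for prediction": ["analyt", None, "predict"],
    "analytics drives prediction": ["analyt", "drive", "predict"],
    "analytics prediction": ["analyt", "predict"],
}

results = {text: phrase_match(QUERY, tokens) for text, tokens in DOCS.items()}
for text, matched in results.items():
    print(f"{text!r}: {matched}")
```

In this model any single word between the two stems matches, which is the `analyse ? predict` pattern from the explain output. The adjacent form does not match in the model, but that is only a consequence of the assumption above, not something I have confirmed against the real index.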
I'm wondering why! It makes it almost impossible to get a strict phrase match whenever there is a stopword in the query string.
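
For reference, the request I'm describing looks roughly like the sketch below, written as a Python dict so it can be dumped to JSON. The field name `description` is a placeholder for any field mapped with the English analyzer, not my actual mapping.

```python
import json

# Hypothetical field name; substitute any field that uses the English analyzer.
FIELD = "description"

search_body = {
    "explain": True,  # ask for the per-hit scoring explanation
    "query": {
        "match_phrase": {
            FIELD: "analytics and prediction"
        }
    },
}

# JSON body to send to the index's _search endpoint.
print(json.dumps(search_body, indent=2))
```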