I wanted to confirm a weird behaviour I'm observing when using a match_phrase query on English-analyzed fields with stopwords.
- Assume my search string is
analytics and prediction.
- Assume again, when searching against an English analyzed field, the tokens generated are
- Now, when doing a match_phrase search against that field, I would expect ONLY the following text phrases to match:
analytics and prediction
analyze and prediction
analysis and predict
Since `and` is a stopword, instances where there is nothing between `analytics` and `prediction` should also show up as a match, in addition to those where there is an `and`. But nothing else.
- However, the behaviour I'm seeing is different (also backed by `explain = true`). Instead, the tokens match_phrase uses are `analyse ? predict`, where `?` is a wildcard.
- So in essence it works like a match_phrase with slop, matching ANY phrase that begins with words that stem to `analyze` and ends with words that stem to `predict`.

I'm wondering why! It makes it almost impossible to get a strict phrase match whenever there is a stopword in the query string.
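
To make the observed behaviour concrete, here is a minimal Python sketch of one plausible model. It assumes the stopword filter drops `and` but still lets it consume a token position, and that the phrase match compares relative positions. This is a toy, not Elasticsearch's actual code: `STOPWORDS`, `STEMS`, `analyze` and `phrase_matches` are hypothetical names, and the stem table is a hand-written stand-in for the real English stemmer.

```python
# Toy model only: NOT Elasticsearch's real analyzer or query implementation.
STOPWORDS = {"and", "for", "the", "of"}  # assumed small subset of English stopwords

# Hypothetical stem table standing in for the real English stemmer.
STEMS = {
    "analytics": "analyz",
    "analysis": "analyz",
    "analyze": "analyz",
    "prediction": "predict",
    "predict": "predict",
}


def analyze(text):
    """Return (term, position) pairs.

    Stopwords are dropped, but they still consume a position, so the
    surviving tokens keep a gap where the stopword used to be.
    """
    tokens = []
    for position, word in enumerate(text.lower().split()):
        word = word.strip(".,!?")
        if word in STOPWORDS:
            continue  # removed, yet `position` has already advanced
        tokens.append((STEMS.get(word, word), position))
    return tokens


def phrase_matches(query, document):
    """True if the query terms occur in the document at the same relative positions."""
    query_tokens = analyze(query)
    doc_tokens = analyze(document)
    if not query_tokens:
        return False
    doc_index = set(doc_tokens)
    first_term, first_pos = query_tokens[0]
    for term, pos in doc_tokens:
        if term != first_term:
            continue
        offset = pos - first_pos
        if all((t, p + offset) in doc_index for t, p in query_tokens):
            return True
    return False


query = "analytics and prediction"
print(analyze(query))
print(phrase_matches(query, "analysis and predict"))
print(phrase_matches(query, "analytics drives prediction"))
print(phrase_matches(query, "analytics prediction"))
```

Under these assumptions the query reduces to two terms that must sit exactly two positions apart, so any single word in the middle satisfies the phrase, while a document with the two words directly adjacent does not. That lines up with the `analyse ? predict` shape shown by explain, but whether the real analyzer behaves this way for a given mapping should be checked against its actual token output.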