Hi,
I wanted to confirm some odd behaviour I'm observing when using a `match_phrase` query on English-analyzed fields with stopwords.
- Assume my search string is `analytics and prediction`.
- Assume again that, when searching against an English-analyzed field, the tokens generated are `analyt` and `predict`.
- Now, when doing a `match_phrase` search against that field, I would expect ONLY the following phrases to match:
  - `analytics and prediction`
  - `analytics prediction`
  - `analyze and prediction`
  - `analyze prediction`
  - `analysis predict`
  - `analysis and predict`
  - etc.

  Since `and` is a stopword, text with nothing between "analytics" and "prediction" should also match, in addition to text with an `and` there. But nothing else.
- However, the behaviour I'm seeing is different (also backed by `explain=true`). Instead, the tokens `match_phrase` uses are `analyse ? predict`, where `?` is a wildcard.
- So in essence it works like a `match_phrase` with a slop, matching ANY phrase that begins with words that stem to `analyze` and ends with words that stem to `predict`.
I'm wondering why! It makes it almost impossible to get a strict phrase match whenever there is a stopword in the query string.
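A toy Python model may help make the `analyt ? predict` form concrete. It is a sketch under stated assumptions, not Elasticsearch or Lucene code: it assumes the stop filter drops `and` but keeps its position, so the phrase query expects exactly one position between the two stems, and any token can fill that slot. `STOPWORDS`, `STEMS`, `analyze` and `phrase_matches` are hypothetical names, and the stem table only covers the words used here.

```python
# Toy model (NOT Lucene's implementation): stopwords are removed but still
# consume a token position, and a phrase matches only when every query term
# appears at the same relative position in the document.

STOPWORDS = {"and", "the", "of"}  # hypothetical subset of English stopwords

# Hypothetical stem table standing in for the English stemmer.
STEMS = {"analytics": "analyt", "prediction": "predict", "predict": "predict"}


def analyze(text):
    """Return (position, term) pairs; stopwords emit no term but advance the position."""
    tokens = []
    for pos, word in enumerate(text.lower().split()):
        if word in STOPWORDS:
            continue  # dropped, but pos keeps counting -> leaves a gap
        tokens.append((pos, STEMS.get(word, word)))
    return tokens


def phrase_matches(query, doc):
    """True if the query's terms occur in the doc with the same position offsets."""
    q = analyze(query)
    if not q:
        return False
    first_pos = q[0][0]
    # Offsets relative to the first query term, e.g. [(0, 'analyt'), (2, 'predict')]
    pattern = [(pos - first_pos, term) for pos, term in q]
    doc_terms = {pos: term for pos, term in analyze(doc)}
    # Try every document position as the start of the phrase.
    return any(
        all(doc_terms.get(start + off) == term for off, term in pattern)
        for start in doc_terms
    )


for doc in ["analytics and prediction", "analytics prediction",
            "analytics for prediction", "analytics of prediction"]:
    print(f"{doc!r}: {phrase_matches('analytics and prediction', doc)}")
# 'analytics and prediction': True
# 'analytics prediction': False
# 'analytics for prediction': True
# 'analytics of prediction': True
```

Under these assumptions, the gap left by `and` behaves like the `?` in the explain output: any single token in that slot satisfies the phrase, while text with no token between the two stems does not.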