Behaviour of match phrase query on english analyzed fields with stopwords

Jaspreet_Singh · November 2, 2018, 4:57pm

Hi,

I wanted to confirm a weird behaviour I'm observing when using a match_phrase query on English analyzed fields with stopwords.

Assume my search string is analytics and prediction.
Assume again, when searching against an English analyzed field, the tokens generated are analyt and predict.
Now, when doing a match_phrase search against that field, I would expect ONLY the following text phrases to match:
- analytics and prediction
- analytics prediction
- analyze and prediction
- analyze prediction
- analysis predict
- analysis and predict
  etc.
  ... since and is a stopword, text with nothing between analytics and prediction should also match, in addition to text with an and in between. But nothing else.
However, the behaviour I'm seeing is different (also backed by explain = true). Instead, the tokens match_phrase uses are analyse ? predict, where ? is a wildcard.
So in essence it works like a match_phrase with a slop, matching ANY phrase that begins with a word that stems to analyze and ends with a word that stems to predict.

I'm wondering why! It makes it almost impossible to get a strict phrase match whenever there is a stopword in the query string.

Jaspreet_Singh · November 5, 2018, 9:29pm

Any thoughts anyone?

Jaspreet_Singh · November 7, 2018, 3:52pm

@dadoonet would appreciate any pointers

Jaspreet_Singh · November 21, 2018, 6:30pm

I figured it out. Really the gist is ...
For a document to be considered a match for any phrase say, “quick brown fox”, the following must be true:

- quick, brown, and fox must all appear in the field.
- The position of brown must be 1 greater than the position of quick.
- The position of fox must be 2 greater than the position of quick.
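Then the specifics are down to the tokens that are generated. In my case the stopword and is removed but still takes up a position, so the query asks for analyt followed by predict 2 positions later, and any single token in that gap will match.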
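
For anyone who finds this later, here is a rough Python sketch of that position rule. It is a toy model and not Elasticsearch code: the STOPWORDS set, the STEMS table, and the analyze and phrase_matches helpers are all made up for illustration, and the real english analyzer uses a proper stemmer and a longer stopword list. The only thing it tries to show is that a removed stopword still consumes a position.

```python
import re

# Hypothetical stand-ins for the english analyzer (assumptions, not the real lists).
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to"}
STEMS = {"analytics": "analyt", "prediction": "predict", "predictions": "predict"}


def analyze(text):
    """Return (token, position) pairs.

    Stopwords are dropped, but the position counter still advances past them,
    which leaves a gap in the positions of the surviving tokens.
    """
    tokens = []
    for position, word in enumerate(re.findall(r"\w+", text.lower())):
        if word in STOPWORDS:
            continue
        tokens.append((STEMS.get(word, word), position))
    return tokens


def phrase_matches(query, document):
    """Strict phrase match (no slop) based only on relative token positions."""
    query_tokens = analyze(query)
    if not query_tokens:
        return False
    doc_tokens = set(analyze(document))
    first_term, first_pos = query_tokens[0]
    for term, start in doc_tokens:
        if term != first_term:
            continue
        # Every query token must sit at the same offset from the first token.
        if all((t, start + (p - first_pos)) in doc_tokens for t, p in query_tokens):
            return True
    return False


query = "analytics and prediction"
for doc in (
    "analytics and prediction",
    "analytics drives prediction",
    "analytics prediction",
    "analytics drives better prediction",
):
    print(doc, "->", phrase_matches(query, doc))
```

In this model the query matches any document with exactly one token (stopword or not) between the two stems, and it does not match when the two words are adjacent or more than one token apart. That is narrower than a real slop, but it would explain the wildcard shown by explain.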
