I've run into an interesting conundrum. I don't think this is a bug,
but I'm also not sure how to get the behavior I want, so I was hoping
someone might have a brilliant idea.
Let's say I've got a type "books" with "title" and "author" fields.
Let's further say that I'm indexing "title" with a stopwords filter,
but "author" has no stopwords filter.
The following document is the only one in the index:
{"title":"The Great Gatsby", "author":"F. Scott Fitzgerald"}
Now let's say I want to perform the following search:
{"query": {"query_string": {"query": "the great gatsby", "fields": ["title", "author"], "minimum_should_match": 3}}}
This search won't return any results. I believe this is because
minimum_should_match is 3 and the stopwords filter on "title" drops "the".
So only two tokens can match, but since the original query string had
three tokens, it still requires a three-token match.
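To make that arithmetic concrete, here is a simplified Python model of the two-field case. It is a sketch under stated assumptions, not Elasticsearch's actual code: it assumes query_string turns each whitespace-separated term into one optional clause that is a disjunction across the listed fields, that a term's clause disappears only when every field's analyzer drops it, and that minimum_should_match is capped at the number of clauses. The analyzers and the STOPWORDS set are hypothetical toy stand-ins.

```python
# Simplified model of a multi-field query_string with minimum_should_match.
# Assumptions (illustrative, not Elasticsearch source):
#   - each whitespace-separated query term becomes one optional clause,
#     a disjunction over the listed fields
#   - a term's clause disappears only if every field's analyzer drops it
#   - minimum_should_match is capped at the number of clauses

STOPWORDS = {"the", "a", "an"}  # hypothetical stopword list for "title"
DOC = {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"}

def analyze(field, text):
    """Toy analyzers: lowercase and split; only "title" removes stopwords."""
    tokens = text.lower().replace(".", "").split()
    if field == "title":
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

def matches(doc, query, fields, minimum_should_match):
    """Return (clause_count, matched_clauses, is_hit) for one document."""
    clauses = []
    for term in query.split():
        # Keep a (field, token) pair for every field whose analyzer keeps the term.
        per_field = [(f, analyze(f, term)) for f in fields]
        per_field = [(f, toks[0]) for f, toks in per_field if toks]
        if per_field:
            clauses.append(per_field)
    if not clauses:
        return 0, 0, False
    required = min(minimum_should_match, len(clauses))
    matched = sum(
        any(tok in analyze(f, doc[f]) for f, tok in clause) for clause in clauses
    )
    return len(clauses), matched, matched >= required

print(matches(DOC, "the great gatsby", ["title", "author"], 3))  # (3, 2, False)
```

In this model "the" still produces a clause because the "author" analyzer keeps it, so three clauses are required while only two ("great", "gatsby") can match the document.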
However, the behavior is different if I only search the "title" field:
{"query": {"query_string": {"query": "the great gatsby", "fields": ["title"], "minimum_should_match": 3}}}
In this case, I do get a result back, presumably because the only
analyzer in play generates just two search tokens, so two becomes
the ceiling for minimum_should_match.
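The "ceiling" idea can be sketched in a few lines of Python. This assumes (it is not confirmed in this thread) that a positive integer minimum_should_match larger than the number of optional clauses is clamped down to that number; title_tokens and effective_msm are hypothetical helpers, and the stopword set is a toy stand-in for the real "title" filter.

```python
# Sketch of the "ceiling": assumes a positive integer minimum_should_match
# is clamped to the number of optional clauses the analyzed query produces.

STOPWORDS = {"the", "a", "an"}  # hypothetical stopword list for "title"

def title_tokens(text):
    """Toy stand-in for the "title" analyzer: lowercase, split, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]

def effective_msm(requested, clause_count):
    # Positive integer spec only: never require more clauses than exist.
    return min(requested, clause_count)

query_clauses = title_tokens("the great gatsby")
doc_tokens = set(title_tokens("The Great Gatsby"))

required = effective_msm(3, len(query_clauses))
matched = sum(t in doc_tokens for t in query_clauses)
print(query_clauses, required, matched >= required)  # ['great', 'gatsby'] 2 True
```

With only "title" searched, "the" never becomes a clause at all, so the requested 3 is clamped to 2 and both remaining tokens match.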
I'm not sure how the DisMax parser calculates that ceiling. Is it the
maximum number of terms generated by any component of the disjunction?
I don't suppose it would be possible for minimum_should_match to be
"local" to each component? This could easily be getting pretty deep
into Lucene internals.
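To illustrate what a "local" minimum_should_match would mean, here is a hypothetical Python comparison. global_match models one count over term-level clauses spanning all fields; local_match checks each field on its own, using only the tokens that field's analyzer produces, and accepts the document if any field satisfies the threshold. Both functions, the toy analyzers, and the clamping rule are assumptions for illustration, not an existing Elasticsearch option.

```python
# Hypothetical "global" vs "local" minimum_should_match over two fields.
# Toy analyzers and the clamping rule are assumptions, not Elasticsearch code.

STOPWORDS = {"the", "a", "an"}  # hypothetical stopword list for "title"

def analyze(field, text):
    tokens = text.lower().replace(".", "").split()
    if field == "title":
        return [t for t in tokens if t not in STOPWORDS]
    return tokens

def global_match(doc, query, fields, msm):
    """One clause per query term, each a disjunction across fields."""
    clauses = []
    for term in query.split():
        per_field = [(f, analyze(f, term)) for f in fields]
        per_field = [(f, toks[0]) for f, toks in per_field if toks]
        if per_field:
            clauses.append(per_field)
    if not clauses:
        return False
    hits = sum(any(tok in analyze(f, doc[f]) for f, tok in c) for c in clauses)
    return hits >= min(msm, len(clauses))

def local_match(doc, query, fields, msm):
    """Each field gets its own threshold, based on its own analyzed query."""
    for f in fields:
        q_tokens = analyze(f, query)
        if not q_tokens:
            continue
        doc_tokens = set(analyze(f, doc[f]))
        hits = sum(tok in doc_tokens for tok in q_tokens)
        if hits >= min(msm, len(q_tokens)):
            return True
    return False

doc = {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"}
print(global_match(doc, "the great gatsby", ["title", "author"], 3))  # False
print(local_match(doc, "the great gatsby", ["title", "author"], 3))   # True
```

Under these assumptions, the per-field threshold lets "title" succeed on its two surviving tokens, which is the behavior the question is after.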
Anyway, just wondering if anyone has any fantastic insight : )
Thanks!
Mat