Hey,
I found an interesting behaviour for an OR-based query that has minimum_should_match: 100% set. Naively, I thought this would make it behave the same as an AND query, but it does not.
Example:
DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "my_delimiter_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_delimiter_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "word_delimiter"]
        }
      }
    }
  }
}
PUT test/_doc/1
{
  "title": "chain 8mm yellow"
}
PUT test/_doc/2
{
  "title": "chain mm yellow"
}
POST test/_refresh
Now let's run an analyze request
GET test/_analyze
{
  "field": "title",
  "text": "chain 8mm yellow"
}
As expected, because of the word_delimiter filter, this returns:
{
  "tokens": [
    {
      "token": "chain"
    },
    {
      "token": "8"
    },
    {
      "token": "mm"
    },
    {
      "token": "yellow"
    }
  ]
}
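To make the token split easier to reason about outside of a cluster, here is a rough Python sketch of what the analyzer chain does to this particular input. The function name approx_analyze is made up for illustration, and the regex is only a crude stand-in for the standard tokenizer plus the lowercase and word_delimiter filters. It covers the letter/number transition seen here, not the real filter's full rule set.

```python
import re


def approx_analyze(text):
    """Crude approximation of: standard tokenizer -> lowercase -> word_delimiter.

    Hypothetical helper for illustration only; it is not Elasticsearch code.
    """
    tokens = []
    # Stand-in for the standard tokenizer: whitespace split is enough for this input.
    for word in text.split():
        # Stand-in for word_delimiter after lowercasing: emit each run of
        # letters or digits separately, which splits "8mm" into "8" and "mm".
        tokens.extend(re.findall(r"[a-z]+|[0-9]+", word.lower()))
    return tokens


print(approx_analyze("chain 8mm yellow"))  # ['chain', '8', 'mm', 'yellow']
print(approx_analyze("chain mm yellow"))   # ['chain', 'mm', 'yellow']
```

The point is only that the single query word 8mm turns into two tokens, while the other words stay as one token each.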
Now, let's do a simple AND-based query, which returns one document:
GET test/_search
{
  "query": {
    "simple_query_string": {
      "default_operator": "AND",
      "fields": [
        "title"
      ],
      "query": "chain 8mm yellow"
    }
  }
}
Now, let's do the magic (or the bug?):
# returns both hits - but why? It is minimum_should_match: 100%?!
GET test/_search
{
  "query": {
    "simple_query_string": {
      "default_operator": "OR",
      "fields": [
        "title"
      ],
      "minimum_should_match": "100%",
      "query": "chain 8mm yellow"
    }
  }
}
This returns both documents, and when looking at the validate output it's also clear why:
GET test/_validate/query?explain=true
{
  "query": {
    "simple_query_string": {
      "default_operator": "OR",
      "fields": [
        "title"
      ],
      "minimum_should_match": "100%",
      "query": "chain 8mm yellow"
    }
  }
}
Output is:
{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "test",
      "valid": true,
      "explanation": "(title:chain (title:8 title:mm) title:yellow)~3"
    }
  ]
}
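So, basically, this query changes from the expected chain OR 8mm OR yellow with 100% matching to chain OR (8 OR mm) OR yellow, without applying minimum_should_match: 100% to the inner OR query. The ~3 only counts the three top-level clauses, and the inner (8 OR mm) clause is already satisfied by mm alone, so document 2 matches as well.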
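To double-check my reading of the explain output, here is a small Python model of the two queries. It is only a sketch: the token sets are written out by hand from the analyze output above, the function names are made up, and matches_and assumes the AND query requires every analyzed term, which fits the single hit it returns but which I have not verified in the parsed query.

```python
# Token sets per document, written out by hand from the analyzer output.
DOCS = {
    1: {"chain", "8", "mm", "yellow"},  # "chain 8mm yellow"
    2: {"chain", "mm", "yellow"},       # "chain mm yellow"
}


def or_group(doc_tokens, terms, minimum_should_match=1):
    """A plain OR group: true once enough of its terms are present."""
    return sum(term in doc_tokens for term in terms) >= minimum_should_match


def matches_or_with_msm(doc_tokens):
    """Models (title:chain (title:8 title:mm) title:yellow)~3."""
    clauses = [
        "chain" in doc_tokens,
        or_group(doc_tokens, ["8", "mm"]),  # inner group: one term is enough
        "yellow" in doc_tokens,
    ]
    # The ~3 applies to the three top-level clauses only.
    return sum(clauses) >= 3


def matches_and(doc_tokens):
    """Assumed AND behaviour: every analyzed term must be present."""
    return all(term in doc_tokens for term in ["chain", "8", "mm", "yellow"])


for doc_id, tokens in DOCS.items():
    print(doc_id, matches_or_with_msm(tokens), matches_and(tokens))
# 1 True True
# 2 True False
```

In this model, passing minimum_should_match=2 to the inner or_group call makes document 2 drop out, which is the behaviour I naively expected from 100%.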
So, long story short: is this a bug or a feature? To me it feels like a bug at first sight, but I guess it makes sense in another setup?
This is on 7.17.18
Thanks for reading through here and have a nice weekend!
--Alex