Has the `default_operator` behavior of QueryString with `multi field` changed from ES 2.3 to ES 6.1?


(Aiflu) #1

Hello, I'm upgrading my project from ES 2.3 to ES 6.1, and I'm confused about how QueryString combines default_operator with fields.

My goal is to search the fields city.name and zipcode.raw with a single query such as New York 10001. With the same query, the results differ between 2.3 and 6.1, so I used the _validate API to debug.

Here's my debug request in ES 2.3:

GET '/myindex/_validate/query?explain&pretty' -d'
{
	"query": {
		"bool": {
			"must": [{
				"query_string": {
					"query": "New York 10001",
					"fields": ["city.name", "zipcode.raw"],
					"default_operator": "AND"
				}
			}]
		}
	}
}'

I got the following result (which is what I want, and which matches my understanding of the docs):

"explanation" : "filtered(+(+(city.name:new | zipcode.raw:New) +(city.name:york | zipcode.raw:York) +(city.name:10001 | zipcode.raw:10001)))->cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@e17eb713)"

Now I use the same query in 6.1:

GET '/myindex/_validate/query?explain=true&pretty' -H 'Content-Type: application/json' -d'
{
	"query": {
		"bool": {
			"must": [{
				"query_string": {
					"query": "New York 10001",
					"fields": ["city.name", "zipcode.raw"],
					"default_operator": "AND"
				}
			}]
		}
	}
}'

I got the following result (why is it AND among the terms within each field, and OR between the fields?):

"explanation" : "+(+((+city.name:new +city.name:york +city.name:10001) | zipcode.raw:New York 10001)) #DocValuesFieldExistsQuery [field=_primary_term]"

After several tests, I found that to get what I want, I have to write the query like this:

GET '/myindex/_validate/query?explain=true&pretty' -H 'Content-Type: application/json' -d'
{
	"query": {
		"bool": {
			"must": [{
				"query_string": {
					"query": "New AND York AND 10001",
					"fields": ["city.name", "zipcode.raw"],
					"default_operator": "OR"
				}
			}]
		}
	}
}'

With this query I get the same result as in 2.3:

"explanation" : "+(+(+(city.name:new | zipcode.raw:New) +(city.name:york | zipcode.raw:York) +(city.name:10001 | zipcode.raw:10001))) #DocValuesFieldExistsQuery [field=_primary_term]"

So here is my question: according to the QueryString docs, if there is no explicit operator, the query uses whatever I set in default_operator. Right?
And with multiple fields, the per-field queries are combined with an OR clause.
So why can't I omit the explicit operators and rely on default_operator = AND to get the result I want? It seems that, when I don't use explicit operators, default_operator also changes how the multiple fields are combined.

I don't think it's related to my mapping, but just in case, I'm also posting my mappings for 2.3 and 6.1.

My mapping in 2.3:

...
"zipcode": {
	"type": "integer",
	"index": "not_analyzed",
	"fields": {
		"raw": {
			"type": "string",
			"index": "not_analyzed"
		}
	}
},
"city": {
	"properties": {
		"name": {
			"type": "string",
			"fields": {
				"raw": {
					"type": "string",
					"index": "not_analyzed"
				}
			}
		},
		"slug": {
			"type": "string",
			"index": "not_analyzed",
			"include_in_all": false
		}
	}
}
...

My mapping in 6.1:

...
"analysis": {
    "analyzer": {
        "default": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": [ "asciifolding", "lowercase", "geowords" ]
        }
    },
    "filter": {
        "geowords": {
            "type" : "word_delimiter_graph",
            "split_on_case_change" : false,
            "preserve_original" : false
        }
    }
}
...
"zipcode": {
    "type": "integer",
    "index": true,
    "fields" : { "raw" : { "type" : "keyword", "index" : true } }
},
"city": {
    "properties" : { 
        "name" : { "type": "text", "fields" : { "raw" : { "type" : "keyword", "index" : true } } },
        "slug" : { "type": "keyword", "index": true }
    }
}
...

Can someone help? Thanks.


How does the `default_operator` of QueryString really work with `multi field` in ES 6.1? Does it change the `OR clause` between the fields?
(Jimferenczi) #2

In 6.x, query_string no longer splits on whitespace. So if you have new york 10001, each field analyzes the full string and builds a query based on the default_operator. As you already discovered, you can get the same behavior as before by adding an explicit boolean operator between the terms.
The reason we don't split on whitespace anymore is to make sure that each field analyzes all terms at once. For instance, if you have a multi-word synonym rule like new york, ny, the new behavior makes sure that a query like new york will match the synonym rule.
You can check the docs here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-dsl-query-string-query
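To make the difference concrete, here is a toy Python model of the two behaviors described above. It is only a sketch: it is not Elasticsearch's parser, it skips analysis entirely (so terms are not lowercased), and the function names and the `analyzed_fields` parameter are made up for illustration. It reproduces only the overall shape of the explanations quoted earlier in the thread.

```python
# Toy model only: this is NOT Elasticsearch's parser and it skips analysis
# (lowercasing, tokenization rules), so it mirrors only the overall shape
# of the _validate explanations quoted in this thread.

def explain_2x(query, fields):
    """2.x-style: split on whitespace first, then expand each term over the fields."""
    clauses = []
    for term in query.split():
        per_field = " | ".join(f"{field}:{term}" for field in fields)
        clauses.append(f"+({per_field})")
    return " ".join(clauses)


def explain_6x(query, fields, analyzed_fields):
    """6.x-style: hand the whole text to each field, then OR the fields together."""
    per_field = []
    for field in fields:
        if field in analyzed_fields:
            # A text field tokenizes the input; default_operator=AND joins the tokens.
            tokens = " ".join(f"+{field}:{token}" for token in query.split())
            per_field.append(f"({tokens})")
        else:
            # A keyword field keeps the whole input as a single term.
            per_field.append(f"{field}:{query}")
    return " | ".join(per_field)


fields = ["city.name", "zipcode.raw"]
print(explain_2x("New York 10001", fields))
print(explain_6x("New York 10001", fields, analyzed_fields={"city.name"}))
```

In the second form, zipcode.raw is asked to match the single term "New York 10001", which no zipcode will ever equal, so only documents whose city.name contains all three tokens can match.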


(Aiflu) #3

Thanks a lot. Your answer solved my confusion.

So "splits text around operators" really means that the query_string uses operators to split the terms. No more implicit split with whitespace.


(Jimferenczi) #4

Yes, this is how the parsing works in 6.x.


(Zsolt) #5

@jimczi Is there any way to configure query_string so it behaves as before?

In my case the users just typed "brown 180" in the search box and that returned all people with brown eyes AND 180 in height (default_operator:AND and analyzer:keyword). But this returns no results now and we can't ask our users to use "brown AND 180"...


(Aiflu) #6

Maybe you can replace the whitespace with " AND " when you receive the user's input, before sending it to ES. This is how I solved my problem.
Hope it can help you.
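
A minimal sketch of that replacement in Python, assuming the input is plain words with no quotes or operators already in it (the function name is hypothetical):

```python
def join_terms_with_and(user_input):
    """Insert an explicit AND between whitespace-separated terms."""
    # str.split() with no argument splits on any run of whitespace
    # and drops leading/trailing whitespace.
    return " AND ".join(user_input.split())


print(join_terms_with_and("New York 10001"))
```

Note that this naive version would also rewrite input that already contains operators or quoted phrases, so it only suits search boxes where users type bare words.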


(Jimferenczi) #7

Yes, that's one solution. Alternatively, you can create a text field, copy the values of the other fields into it, and search on that field. That said, query_string may not be the right choice under the hood if you don't want to explain to your users how boolean operators work.
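
As a rough sketch of the combined-field idea, the Python dicts below show a mapping fragment that uses the `copy_to` mapping parameter and a query against the combined field. The field name `search_all` is an assumption made up for this example, and the fragment reuses the field names from the mapping posted earlier in the thread; adapt both to your own index.

```python
import json

# Mapping fragment: both source fields copy their values into one text field.
# "search_all" is a hypothetical name chosen for this sketch.
properties = {
    "zipcode": {"type": "integer", "copy_to": "search_all"},
    "city": {
        "properties": {
            "name": {"type": "text", "copy_to": "search_all"},
        }
    },
    "search_all": {"type": "text"},
}

# Query body: a single analyzed field receives the whole input, so
# default_operator=AND applies across all of its tokens.
query_body = {
    "query": {
        "query_string": {
            "query": "New York 10001",
            "default_field": "search_all",
            "default_operator": "AND",
        }
    }
}

print(json.dumps({"properties": properties}, indent=2))
print(json.dumps(query_body, indent=2))
```

Because only one field is searched, there is no OR between fields anymore: every token has to appear somewhere in the copied values.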


(Zsolt) #8

The reason I use query_string is that we have some advanced users who run advanced queries (they know how to specify fields, operators, ranges, etc.), while the majority of users just type in words and are happy with that. Support can also help users with an advanced query text if they want a certain result set.

I'm not sure what other than query_string I can use that covers my scenarios. And because of my advanced users, I can't just blindly replace spaces with AND either.
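
One possible middle ground, sketched here in Python purely as an assumption and not something proposed in this thread, is to insert AND only when the input looks like plain words and to pass anything that looks like query_string syntax through untouched. The detection below is a heuristic, not Elasticsearch's grammar, and the names are hypothetical.

```python
import re

# Heuristic, not Elasticsearch's grammar: treat the input as "advanced" if it
# contains an uppercase boolean keyword or any character that query_string
# syntax reserves (field separators, ranges, quotes, wildcards, ...).
_ADVANCED = re.compile(r'\b(?:AND|OR|NOT)\b|[+\-=&|><!(){}\[\]^"~*?:\\/]')


def prepare_query(user_input):
    """Leave advanced queries alone; AND together plain whitespace-separated words."""
    if _ADVANCED.search(user_input):
        return user_input
    return " AND ".join(user_input.split())


print(prepare_query("brown 180"))
print(prepare_query("eyes:brown AND height:[170 TO 190]"))
```

The trade-off is that plain input containing a reserved character, such as a hyphenated word, is treated as advanced and sent unchanged.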

