Behavior of multi field query_string query doesn't seem to match documentation

Hey guys,

While testing searches across multiple fields, I stumbled upon a behavior of
the multi-field query_string query that seems to contradict what the official
documentation states. Below is what the documentation says on this matter:

"The idea of running the query_string query against multiple fields is by
internally creating several queries for the same query string, each with
default_field that match the fields provided."
http://www.elasticsearch.org/guide/reference/query-dsl/query-string-query.html

So, given the information above, I expected that the following multi field
query:

curl -XGET 'localhost:9200/testgroupquery/_search?pretty' -d '{
  "query": {
    "query_string": {
      "fields": [
        "title",
        "address"
      ],
      "auto_generate_phrase_queries": true,
      "boost": 1.0,
      "default_operator": "and",
      "use_dis_max": true,
      "query": "hotel praha"
    }
  }
}'

Would translate into a dis_max query like this:

curl -XGET 'localhost:9200/testgroupquery/_search?pretty' -d '{
  "query": {
    "dis_max": {
      "tie_breaker": 0.0,
      "queries": [
        {
          "query_string": {
            "default_field": "title",
            "auto_generate_phrase_queries": true,
            "boost": 1.0,
            "default_operator": "and",
            "query": "hotel praha"
          }
        },
        {
          "query_string": {
            "default_field": "address",
            "auto_generate_phrase_queries": true,
            "boost": 1.0,
            "default_operator": "and",
            "query": "hotel praha"
          }
        }
      ]
    }
  }
}'

But apparently it doesn't. I first noticed this because the two queries
return different result sets. The multi-field query_string query matches all
documents containing both "hotel" and "praha" in the title and/or address
fields, including cross-field matches (e.g. "hotel" in title and "praha" in
address). The dis_max query above returns only documents that contain both
terms in the same field (either title or address).
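
To make the difference concrete, here is a minimal Python sketch of the two matching rules. It is only a toy model, not Elasticsearch code: the whitespace tokenizer, the function names and the three sample documents are assumptions made up for illustration.

```python
# Toy model of the two matching behaviours described above.
# Assumption: a field is "analyzed" by lowercasing and splitting on whitespace.

def tokens(text):
    """Return the set of lowercase whitespace-separated tokens in text."""
    return set(text.lower().split())

def per_term_match(doc, fields, terms):
    """Every term must appear in at least one of the fields
    (the behaviour observed for the multi-field query_string query)."""
    return all(
        any(term in tokens(doc.get(field, "")) for field in fields)
        for term in terms
    )

def per_field_match(doc, fields, terms):
    """At least one field must contain every term
    (the behaviour observed for the dis_max of per-field queries)."""
    return any(
        all(term in tokens(doc.get(field, "")) for term in terms)
        for field in fields
    )

# Hypothetical sample documents, not taken from the real index.
docs = {
    "same_field": {"title": "Hotel Praha", "address": "Main Street 1"},
    "cross_field": {"title": "Grand Hotel", "address": "Praha 1"},
    "one_term": {"title": "Grand Hotel", "address": "Brno"},
}
fields = ["title", "address"]
terms = ["hotel", "praha"]

per_term_hits = sorted(
    name for name, doc in docs.items() if per_term_match(doc, fields, terms)
)
per_field_hits = sorted(
    name for name, doc in docs.items() if per_field_match(doc, fields, terms)
)

print(per_term_hits)   # documents matched by the per-term rule
print(per_field_hits)  # documents matched by the per-field rule
```

In this model the "cross_field" document is matched only by the per-term rule, which mirrors the difference between the two result sets.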

Looking at the final query for these two distinct requests using the
Validate API, I can see that the output is quite different indeed:

multi field query_string query

curl -XGET 'localhost:9200/testgroupquery/_validate/query?pretty&explain=true' -d '{
  "query_string": {
    "fields": [
      "title",
      "address"
    ],
    "auto_generate_phrase_queries": true,
    "boost": 1.0,
    "default_operator": "and",
    "use_dis_max": true,
    "query": "hotel praha"
  }
}'

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [ {
    "index" : "testgroupquery",
    "valid" : true,
    "explanation" : "+(title:hotel | address:hotel) +(title:praha | address:praha)"
  } ]
}

dis_max query

curl -XGET 'localhost:9200/testgroupquery/_validate/query?pretty&explain=true' -d '{
  "dis_max": {
    "tie_breaker": 0.0,
    "queries": [
      {
        "query_string": {
          "default_field": "title",
          "auto_generate_phrase_queries": true,
          "boost": 1.0,
          "default_operator": "and",
          "query": "hotel praha"
        }
      },
      {
        "query_string": {
          "default_field": "address",
          "auto_generate_phrase_queries": true,
          "boost": 1.0,
          "default_operator": "and",
          "query": "hotel praha"
        }
      }
    ]
  }
}'

{
  "valid" : true,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "explanations" : [ {
    "index" : "testgroupquery",
    "valid" : true,
    "explanation" : "((+title:hotel +title:praha) | (+address:hotel +address:praha))"
  } ]
}
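
The two explanation strings differ only in which grouping comes first: the first groups by term and then offers the fields as alternatives, while the second groups by field and then requires every term. The following Python sketch rebuilds both strings from the same fields and terms. The helper names are hypothetical, and the code only imitates the text format shown by the Validate API; it does not call Elasticsearch or Lucene.

```python
def per_term_explanation(fields, terms):
    """One required group per term; inside it, any field may match."""
    groups = []
    for term in terms:
        alternatives = " | ".join(f"{field}:{term}" for field in fields)
        groups.append(f"+({alternatives})")
    return " ".join(groups)

def per_field_explanation(fields, terms):
    """One alternative per field; inside it, every term is required."""
    groups = []
    for field in fields:
        required = " ".join(f"+{field}:{term}" for term in terms)
        groups.append(f"({required})")
    return "(" + " | ".join(groups) + ")"

fields = ["title", "address"]
terms = ["hotel", "praha"]

multi_field_form = per_term_explanation(fields, terms)
dis_max_form = per_field_explanation(fields, terms)

print(multi_field_form)
print(dis_max_form)
```

With the fields and terms from my requests, the first string has the same shape as the multi-field query_string explanation and the second has the shape of the dis_max explanation.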

The reason I'm pointing out this difference is that I've been trying to
configure my application to work precisely the way the multi-field
query_string query works: users search for multiple terms, and those terms
may match across multiple fields. So that's exactly the behavior I want, but
it seems to contradict the documentation, so I want to make sure this
cross-field matching is expected behavior and not an anomaly that will be
corrected at some point.

In case it helps, the attached document contains all the requests to create
the index and populate it with sample data, along with the query requests
and responses for each of the tests I did.

Thank you in advance for your help.

Best,
Leo

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hi Leo

> The reason I'm pointing out this difference is that I've been trying
> to configure my application to work in precisely the way the multi
> field query_string query works, allowing users to search for multiple
> terms allowing matches across multiple fields. So that's exactly the
> behavior I want, but it seems to contradict what is in the
> documentation, so I want to make sure this matching across multiple
> fields is an expected behavior and not some anomaly that will be
> corrected at some point.

It is expected behaviour and is unlikely to change.

The multi_match query on the other hand DOES work the way you thought
query_string works, in other words:

{
  "multi_match": {
    "query": "foo bar",
    "operator": "and",
    "fields": ["one", "two"]
  }
}

does translate into: "one:(+foo +bar) two:(+foo +bar)"

clint


Thank you very much for your confirmation of this behavior, Clint!

Now one question still remains: is the documentation incorrect then, since
it gives a different idea about the final query that will be generated?

Thanks,
Leo

On Monday, February 4, 2013 1:51:06 PM UTC-5, Clinton Gormley wrote:

> It is expected behaviour and is unlikely to change.

On Mon, 2013-02-04 at 12:16 -0800, Leonardo Souza wrote:

> Thank you very much for your confirmation of this behavior, Clint!
>
> Now one question still remains: is the documentation incorrect then,
> since it gives a different idea about the final query that will be
> generated?

I'd say that the docs are, at best, ambiguous...

clint
