Why doesn't query_string honor the usual precedence rules?

I had a look into this using the validate query API.

Here's the command to debug aa OR bb AND cc:

GET githubcommits/_validate/query?q=aa+OR+bb+AND+cc&rewrite=true&df=myfield

The result is:

  "explanation" : "myfield:aa +myfield:bb +myfield:cc"

Lucene's Boolean query distinguishes mandatory must clauses from should clauses, which are just nice-to-haves. In the above query, aa is relegated to a wholly optional should clause that only gives extra scoring points to documents that contain both of the mandatory must clauses, bb and cc.
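To make the flagging concrete, here is a toy Python model of how I understand the classic parser to treat AND and OR when the default operator is OR: an AND marks both the clause before it and the clause after it as must, while OR leaves clauses as should. The function name flag_clauses is made up for this sketch, and it ignores brackets, +/- prefixes, NOT and everything else the real parser handles.

```python
def flag_clauses(query, field="myfield"):
    """Toy model of must/should flagging, assuming a default operator of OR."""
    clauses = []        # [occur, term] pairs, in query order
    conjunction = None  # operator seen just before the current term
    for token in query.split():
        if token in ("AND", "OR"):
            conjunction = token
            continue
        if conjunction == "AND":
            # AND promotes the previous clause to must, and the new one too
            if clauses:
                clauses[-1][0] = "must"
            occur = "must"
        else:
            # OR (or no operator) leaves the clause optional
            occur = "should"
        clauses.append([occur, token])
        conjunction = None
    return " ".join(
        ("+" if occur == "must" else "") + f"{field}:{term}"
        for occur, term in clauses
    )


print(flag_clauses("aa OR bb AND cc"))
# prints: myfield:aa +myfield:bb +myfield:cc
```

Note that nothing in this left-to-right pass ever groups bb and cc together, which is why aa ends up as a sibling of two must clauses rather than as an alternative to them.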
If you want pure OR clauses in Lucene, you need a Boolean query with should clauses but no must clauses. Something like this:

bool
    should
        aa
        bool
            must
                bb
                cc

Note the use of a nested bool query above to get the required logic.
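As a rough sketch, the same structure could be written as a query DSL body along these lines, built here as a Python dict so it can be dumped to JSON. The use of match clauses is my assumption (term or any other leaf query would slot in the same way), and myfield is just the field from the validate example.

```python
import json

# Outer bool has only should clauses, so at least one of them must match.
nested_query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"myfield": "aa"}},
                {
                    # Inner bool groups bb AND cc as a single alternative
                    "bool": {
                        "must": [
                            {"match": {"myfield": "bb"}},
                            {"match": {"myfield": "cc"}},
                        ]
                    }
                },
            ]
        }
    }
}

# JSON body that could be sent to a search request
print(json.dumps(nested_query, indent=2))
```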
Introducing brackets in the query_string syntax, as in aa OR (bb AND cc), forces the creation of these nested Boolean clauses and makes the logic behave in a more predictable way.
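To see what difference that makes to which documents match, here is a small Python sketch that treats each document as a set of terms and ignores scoring entirely. The two function names and the sample documents are invented for illustration.

```python
def matches_unbracketed(doc_terms):
    # aa +bb +cc: bb and cc are required, aa only influences scoring
    return "bb" in doc_terms and "cc" in doc_terms


def matches_bracketed(doc_terms):
    # aa OR (bb AND cc): either alternative is enough
    return "aa" in doc_terms or ("bb" in doc_terms and "cc" in doc_terms)


# Hypothetical documents, each reduced to the terms it contains
docs = {
    "doc1": {"aa"},
    "doc2": {"bb", "cc"},
    "doc3": {"aa", "bb", "cc"},
    "doc4": {"bb"},
}

for name, terms in docs.items():
    print(name, matches_unbracketed(terms), matches_bracketed(terms))
```

The only document the two forms disagree on is doc1, which contains aa alone: the unbracketed form rejects it, while the bracketed form accepts it.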

Weird, but I'd hesitate to call it a bug - more a quirk of Lucene.
For readability's sake alone I would advocate using brackets to make the logic clear.
