I encountered some strange behavior of the query_string query:
I have an index with one type "objects" that contains one document with a default-analyzed string field called "content" that is empty. If I search with
Elasticsearch, as expected, finds nothing. If I change the query string to "-Term", it finds the empty document, as I also expected. However, when I tried to find out whether "-" takes precedence over OR, as is usually the case, I found that "-Term OR Termtwo" returned nothing. If I search with "(-Term) OR Termtwo", the empty document is found again. So I asked myself whether, quite unusually, "OR" takes precedence over "-". But when I tried "-(Term OR Termtwo)", the empty document was again returned. Only "-Term OR Termtwo", the query I tried first, returned nothing.
I cannot make sense of this. Is this possibly a bug? I use Elasticsearch 1.5.2.
If a clause contains only a negated term, Elasticsearch implicitly
combines a match-all query with the negated term (content:*
content:-Term), since Lucene does not handle purely negative clauses. The
match-all trick has always existed in Lucene; Elasticsearch just applies it
in the background.
I suspect the issue is with what Elasticsearch considers a purely negative
clause in its parser, though I have never checked. I would suggest always
adding the match all explicitly (*:* or fieldname:*) when using only
negation, to avoid any ambiguity.
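As a rough illustration of that suggestion, the sketch below builds query_string request bodies with the match all written out explicitly. The helper names are hypothetical, the field name "content" is taken from the question, and nothing is sent to a cluster; it only shows the shape of the query strings.

```python
import json


def query_string_body(query, default_field="content"):
    """Build a search body for a query_string query (hypothetical helper)."""
    return {
        "query": {
            "query_string": {
                "default_field": default_field,
                "query": query,
            }
        }
    }


def with_explicit_match_all(negated_clause):
    """Wrap a purely negative clause so the match all is spelled out."""
    return "(*:* {})".format(negated_clause)


# The query as originally written, relying on implicit handling of "-Term".
implicit = query_string_body("-Term OR Termtwo")

# The same intent with the match all made explicit for the negated part.
explicit = query_string_body(with_explicit_match_all("-Term") + " OR Termtwo")

print(json.dumps(explicit, indent=2))
```

Whether the explicit form changes the result on a given Elasticsearch version would still need to be checked against a real index.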
Thanks for your reply. However, this is not the situation I'm concerned about, as "-Term" gives me exactly the result I expect. The workaround you cited is also logically equivalent, so it would not explain the differences in results. What really confuses me is the case where the clause does not contain only a negated term. For example, if "-Term" yields one document (of one in total), I would expect to get the same result no matter what I "OR" with "-Term". But "-Term OR Termtwo" yields an empty result, which doesn't make sense. So I wondered which of the two operators ("-", i.e. negation, and "OR") takes precedence over the other, although I couldn't immediately see how this could make a difference. To my surprise, both possibilities, "(-Term) OR Termtwo" and (the more unusual) "-(Term OR Termtwo)", yielded the same single document. To me this looks like a bug, because the result is logically wrong.
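One possible reading of these results, offered as an assumption rather than a confirmed diagnosis: the parser may flatten "-Term OR Termtwo" into a single boolean query with one prohibited clause and one optional clause, instead of treating "-Term" as its own sub-query. Such a query is not purely negative, so no match all would be added. The toy model below is not Elasticsearch or Lucene code; the function and variable names are made up, and terms are lowercased to mimic the default analyzer.

```python
def matches(doc_terms, must=(), should=(), must_not=()):
    """Toy model of a flat boolean query over a set of indexed terms."""
    if any(term in doc_terms for term in must_not):
        return False  # a prohibited term is present
    if not all(term in doc_terms for term in must):
        return False  # a required term is missing
    if must:
        return True
    if should:
        # With no required clauses, at least one optional clause must match.
        return any(term in doc_terms for term in should)
    # Purely negative query: assume a match all is added implicitly.
    return True


empty_doc = set()  # the document whose "content" field is empty

# "-Term": purely negative, so the implicit match all applies.
only_negated = matches(empty_doc, must_not=["term"])

# "-Term OR Termtwo", assumed to be flattened into one boolean query.
flat = matches(empty_doc, should=["termtwo"], must_not=["term"])

# "(-Term) OR Termtwo": the parenthesised part is its own purely negative query.
grouped = matches(empty_doc, must_not=["term"]) or matches(empty_doc, should=["termtwo"])

# "-(Term OR Termtwo)": the whole disjunction is prohibited.
negated_group = not matches(empty_doc, should=["term", "termtwo"])

print(only_negated, flat, grouped, negated_group)
```

Under these assumptions only the flattened form fails to match the empty document, which is the same pattern as the four queries described above.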