Hi,
I've been really surprised at how ES correctly handles negative
queries when passed in through a query_string, in most cases.
There are tons of discussions out there about Lucene and negative
operations that explain that set logic is being used and a pure
negation doesn't yield anything.
I don't believe direct Lucene (and Solr) handle these cases correctly
and was curious what ES is doing differently. I can see from the explain
output that it is adding an implicit match-all, but I thought that the
query_string would get passed straight down to Lucene, untouched.
Here are a couple of example queries that highlight behavior that
diverges from Lucene:
indexid:"test" OR (-indexid:"test")
(-(indexid:"test1") AND -(indexid:"test2")) AND -(indexid:"test3")
Here is one case that doesn't work quite as I'd expect, based on above
behavior, but the behavior is consistent with Lucene:
indexid:"test" OR -indexid:"test"
And as a side note, the reason why I would be doing such funky-looking
negations is that we are dynamically translating search strings from a
legacy commercial search system.
ES extends the Lucene query parser to add its own logic, such as automatic
numeric range queries, support for type-based fields (a query on typeX.field1
will be wrapped in a filter with typeX), and also "fixing" negative queries.
The way I try to do that is: whenever a boolean query is built, if all of
its clauses are prohibited, it is replaced with another boolean query that
also has a must clause of match all docs. Make sense?
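To make the rewrite described above concrete, here is a minimal Python sketch. It is only a model, not the actual ES or Lucene code: the class names (Term, MatchAll, BoolQuery) and the functions fix_negative_query and matches are hypothetical, and matches is a simplified take on boolean-clause matching (no scoring, exact field equality only).

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Clause occurrence flags, named after Lucene's MUST / SHOULD / MUST_NOT.
MUST, SHOULD, MUST_NOT = "MUST", "SHOULD", "MUST_NOT"


@dataclass(frozen=True)
class Term:
    name: str
    value: str


@dataclass(frozen=True)
class MatchAll:
    pass


@dataclass(frozen=True)
class BoolQuery:
    # Each clause is an (occur, query) pair.
    clauses: Tuple[Tuple[str, "Query"], ...]


Query = Union[Term, MatchAll, BoolQuery]


def fix_negative_query(query: BoolQuery) -> BoolQuery:
    """If every clause is prohibited, add a MUST match-all clause."""
    if query.clauses and all(occur == MUST_NOT for occur, _ in query.clauses):
        return BoolQuery(query.clauses + ((MUST, MatchAll()),))
    return query


def matches(query: Query, doc: dict) -> bool:
    """Simplified matching: prohibited clauses only subtract from positives."""
    if isinstance(query, MatchAll):
        return True
    if isinstance(query, Term):
        return doc.get(query.name) == query.value
    must = [q for occur, q in query.clauses if occur == MUST]
    should = [q for occur, q in query.clauses if occur == SHOULD]
    must_not = [q for occur, q in query.clauses if occur == MUST_NOT]
    if any(matches(q, doc) for q in must_not):
        return False
    if must:
        return all(matches(q, doc) for q in must)
    # With no MUST clauses, at least one SHOULD clause has to match,
    # so a query made only of MUST_NOT clauses matches nothing.
    return any(matches(q, doc) for q in should)
```

Under this model, a query whose only clause is -indexid:"test" matches nothing until the match-all clause is added, after which it matches every document whose indexid is not "test".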
Awesome. Makes perfect sense, as this is the same logic I needed to
implement myself when evaluating Solr. Saves me from doing some query
gymnastics, for sure.
May want to consider adding logic in the case of non-nested negatives
in OR clauses.
For example
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test" AND *:*)
Right now, with ES when doing query generation from a syntax tree with
this, I am adding grouping around negative OR clauses to accommodate.
So:
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test")
Which I believe ES then interprets as:
indexid:"test" OR (-indexid:"test" AND *:*)
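For anyone generating query strings the same way, the grouping workaround can be sketched roughly like this in Python. This is not my actual generator: the node classes (TermNode, NotNode, OrNode) and the render function are hypothetical names made up for illustration, and only terms, negation, and OR are covered.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TermNode:
    name: str
    value: str


@dataclass
class NotNode:
    child: "Node"


@dataclass
class OrNode:
    children: List["Node"]


Node = Union[TermNode, NotNode, OrNode]


def render(node: Node) -> str:
    """Render a syntax tree as a query string, grouping negated OR operands."""
    if isinstance(node, TermNode):
        return f'{node.name}:"{node.value}"'
    if isinstance(node, NotNode):
        inner = render(node.child)
        if isinstance(node.child, OrNode):
            # Keep the negation applied to the whole OR expression.
            inner = f"({inner})"
        return f"-{inner}"
    if isinstance(node, OrNode):
        parts = []
        for child in node.children:
            text = render(child)
            if isinstance(child, NotNode):
                # Wrap the negative operand so it is parsed as its own
                # boolean query with only prohibited clauses.
                text = f"({text})"
            parts.append(text)
        return " OR ".join(parts)
    raise TypeError(f"unsupported node: {node!r}")
```

The parentheses matter because, per the explanation earlier in the thread, the match-all clause is only added when all clauses of a boolean query are prohibited. Without the grouping, the negative term sits next to the positive one in the same boolean query, so that condition does not hold.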