ES's handling of negative queries

Hi,
I've been really surprised at how well ES handles negative queries
passed in through a query_string: in most cases it gets them right.

There are plenty of discussions out there about Lucene and negative
operations, explaining that set logic is used and a pure negation
therefore doesn't match anything.

I don't believe plain Lucene (or Solr) handles these cases correctly,
and I was curious what ES is doing differently. I can see from the
explain output that it is adding an implicit match-all, but I thought
the query_string would get passed straight down to Lucene, untouched.

Here are a couple of example queries that highlight behavior that
diverges from Lucene:
indexid:"test" OR (-indexid:"test")
(-(indexid:"test1") AND -(indexid:"test2")) AND -(indexid:"test3")

Here is one case that doesn't work quite as I'd expect, based on the
behavior above, although it is consistent with Lucene:
indexid:"test" OR -indexid:"test"

And as a side note, the reason I would be doing such funky-looking
negations is that we are dynamically translating search strings from a
legacy commercial search system.

Thanks,
Paul

ES extends the Lucene query parser to add its own logic, such as automatic
numeric range queries, support for type-based fields (a query on typeX.field1
will be wrapped in a filter on typeX), and also "fixing" negative queries.
The way I do that: whenever a boolean query is built, if all of its clauses
are prohibited, it is replaced with another boolean query that has an added
must clause matching all docs. Make sense?
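
To make the rule concrete, here is a minimal sketch in Python. It is a toy model under simplified set semantics, not the actual ES or Lucene code: the names `Term`, `MatchAll`, `Bool`, `fix_negative` and `evaluate` are made up for illustration, and the clause layouts assumed for each query string are assumptions about how the parser groups them.

```python
from dataclasses import dataclass

MUST, SHOULD, MUST_NOT = "MUST", "SHOULD", "MUST_NOT"


@dataclass
class Term:
    name: str   # field name, e.g. "indexid"
    value: str  # exact value to match


@dataclass
class MatchAll:
    pass


@dataclass
class Bool:
    clauses: list  # list of (occur, query) pairs


def fix_negative(query):
    """Add a MUST match-all clause to every boolean query whose clauses
    are all prohibited (hypothetical helper, applied recursively)."""
    if not isinstance(query, Bool):
        return query
    clauses = [(occur, fix_negative(q)) for occur, q in query.clauses]
    if clauses and all(occur == MUST_NOT for occur, _ in clauses):
        clauses = clauses + [(MUST, MatchAll())]
    return Bool(clauses)


def evaluate(query, docs):
    """Return the ids of matching docs under simplified set semantics."""
    if isinstance(query, MatchAll):
        return set(docs)
    if isinstance(query, Term):
        return {i for i, d in docs.items() if d.get(query.name) == query.value}
    must = [evaluate(q, docs) for o, q in query.clauses if o == MUST]
    should = [evaluate(q, docs) for o, q in query.clauses if o == SHOULD]
    must_not = [evaluate(q, docs) for o, q in query.clauses if o == MUST_NOT]
    if must:
        hits = set.intersection(*must)
    elif should:
        hits = set.union(*should)
    else:
        hits = set()  # nothing positive to subtract from
    for excluded in must_not:
        hits -= excluded
    return hits


docs = {1: {"indexid": "test"}, 2: {"indexid": "other"}}
term = Term("indexid", "test")

# -indexid:"test" (assumed to parse as one prohibited clause)
pure_negation = Bool([(MUST_NOT, term)])
# indexid:"test" OR (-indexid:"test") (assumed: nested boolean query)
grouped = Bool([(SHOULD, term), (SHOULD, pure_negation)])
# indexid:"test" OR -indexid:"test" (assumed: flat SHOULD + MUST_NOT)
flat = Bool([(SHOULD, term), (MUST_NOT, term)])

print(evaluate(pure_negation, docs), evaluate(fix_negative(pure_negation), docs))
print(evaluate(grouped, docs), evaluate(fix_negative(grouped), docs))
print(evaluate(flat, docs), evaluate(fix_negative(flat), docs))
```

In this model the flat form is left alone by the rule, because not all of its clauses are prohibited, which would explain why it still behaves the way plain Lucene does.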

-shay.banon

In reply to Paul's message of Fri, Aug 13, 2010, 12:06 AM.

Awesome. Makes perfect sense, as this is the same logic I needed to
implement myself when evaluating Solr. Saves me from doing some query
gymnastics, for sure.

You may want to consider adding the same logic for non-nested negatives
in OR clauses.

For example:
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test" AND *:*)

Right now, when generating ES queries from a syntax tree, I work around
this by adding grouping around negative OR clauses.

So:
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test")

Which I believe ES then interprets as:
indexid:"test" OR (-indexid:"test" AND *:*)
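
For anyone doing similar translation, the grouping workaround described above could look roughly like this. It is only a sketch: the `Term`, `Not`, `Or` node types and the `render` function are hypothetical names for illustration, not part of ES or of any real translation layer.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Term:
    name: str
    value: str


@dataclass
class Not:
    operand: "Node"


@dataclass
class Or:
    left: "Node"
    right: "Node"


Node = Union[Term, Not, Or]


def render(node, inside_or=False):
    """Render a syntax tree as a query string, wrapping any negated
    operand of an OR in parentheses so it forms its own group."""
    if isinstance(node, Term):
        return f'{node.name}:"{node.value}"'
    if isinstance(node, Not):
        inner = render(node.operand)
        if isinstance(node.operand, Or):
            inner = f"({inner})"  # keep a compound operand together
        negated = "-" + inner
        return f"({negated})" if inside_or else negated
    if isinstance(node, Or):
        return f"{render(node.left, True)} OR {render(node.right, True)}"
    raise TypeError(f"unsupported node: {node!r}")


tree = Or(Term("indexid", "test"), Not(Term("indexid", "test")))
print(render(tree))
```

A negation that is not an operand of an OR is rendered without the extra parentheses, so only the case discussed here is affected.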

I wouldn't be heartbroken without this, though :)

Thanks again for the awesome work!

In reply to Shay Banon's message of Aug 12, 4:35 pm.

Yeah, interesting! It requires a bit more work in interpreting the queries
generated. Can you open an issue for this, just so I won't lose track of
it?

-shay.banon

In reply to Paul's message of Fri, Aug 13, 2010, 6:36 AM.