ES's handling of negative queries


(ppearcy) #1

Hi,
I've been really surprised at how ES correctly handles negative
queries when passed in through a query_string, in most cases.

There are tons of discussions out there about Lucene and negative
operators explaining that set logic is being used, so a pure
negation doesn't yield anything.
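To make the set logic concrete, here is a minimal Python sketch. It is a toy model, not Lucene's actual code, and the document IDs are hypothetical. Prohibited (MUST_NOT) clauses only subtract from the documents selected by the positive clauses, so a query made only of negations has nothing to subtract from.

```python
# Hypothetical index: all doc IDs, and the ones that have indexid:"test".
ALL_DOCS = {1, 2, 3, 4}
TEST_DOCS = {1, 2}


def boolean_match(must=(), should=(), must_not=()):
    """Toy Lucene-style boolean matching over sets of doc IDs.

    All MUST clauses have to match; with no MUST clauses, at least one
    SHOULD clause has to match. MUST_NOT clauses only remove documents.
    """
    if must:
        matched = set.intersection(*map(set, must))
    else:
        matched = set().union(*should)
    for excluded in must_not:
        matched -= excluded
    return matched


# -indexid:"test" alone: no positive clause, so there is nothing to subtract from.
print(boolean_match(must_not=[TEST_DOCS]))                   # set()
# With an explicit match-all MUST clause, the negation behaves as expected.
print(boolean_match(must=[ALL_DOCS], must_not=[TEST_DOCS]))  # {3, 4}
```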

I don't believe Lucene (or Solr) used directly handles these cases correctly,
and was curious what ES is doing differently. I can see from the explain
output that it is adding an implicit match-all, but I thought that the
query_string would get passed straight down to Lucene untouched.

Here are a couple of example queries whose behavior
diverges from Lucene:
indexid:"test" OR (-indexid:"test")
(-(indexid:"test1") AND -(indexid:"test2")) AND -(indexid:"test3")

Here is one case that doesn't work quite as I'd expect based on the
behavior above, although the behavior is consistent with Lucene:
indexid:"test" OR -indexid:"test"

And as a side note, the reason I would be doing such funky-looking
negations is that we are dynamically translating search strings from a
legacy commercial search system.

Thanks,
Paul


(Shay Banon) #2

ES extends the Lucene query parser to add its own logic, such as automatic
numeric range queries, support for type-based fields (a query on typeX.field1
is wrapped in a filter on typeX), and also "fixing" negative queries. The way
I do that is: whenever a boolean query is built, if all of its clauses are
prohibited, it is replaced with another boolean query that adds a must clause
matching all docs. Make sense?
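The rule above can be sketched in a few lines of Python. This is a toy model, not Elasticsearch's parser code: the tuple-based query trees, `fix_negative`, `evaluate`, and the sample `postings` are all hypothetical, and the trees for Paul's three example queries are hand-built approximations of how the classic Lucene query parser groups them. It suggests why the parenthesized forms change behavior while the flat `OR -...` form does not: the flat query becomes one boolean query that still has a positive (SHOULD) clause, so the rule never fires.

```python
# Toy query trees (hypothetical representation, not Elasticsearch classes):
#   ("term", value) | ("match_all",) | ("bool", [(occur, subquery), ...])
MUST, SHOULD, MUST_NOT = "must", "should", "must_not"

# Hypothetical index: doc IDs and which docs carry each indexid value.
ALL_DOCS = {1, 2, 3, 4, 5, 6}
postings = {"test": {1, 2}, "test1": {3}, "test2": {4}, "test3": {5}}


def term(value):
    return ("term", value)


def fix_negative(query):
    """Recursively add a MUST match_all to every bool whose clauses are all MUST_NOT."""
    if query[0] != "bool":
        return query
    clauses = [(occur, fix_negative(sub)) for occur, sub in query[1]]
    if clauses and all(occur == MUST_NOT for occur, _ in clauses):
        clauses.append((MUST, ("match_all",)))
    return ("bool", clauses)


def evaluate(query):
    """Return the set of doc IDs matched by a query tree."""
    kind = query[0]
    if kind == "match_all":
        return set(ALL_DOCS)
    if kind == "term":
        return set(postings.get(query[1], set()))
    must = [evaluate(sub) for occur, sub in query[1] if occur == MUST]
    should = [evaluate(sub) for occur, sub in query[1] if occur == SHOULD]
    must_not = [evaluate(sub) for occur, sub in query[1] if occur == MUST_NOT]
    # MUST clauses all have to match; otherwise at least one SHOULD clause.
    result = set.intersection(*must) if must else set().union(*should)
    for excluded in must_not:
        result -= excluded
    return result


# indexid:"test" OR (-indexid:"test")
grouped_or = ("bool", [(SHOULD, term("test")),
                       (SHOULD, ("bool", [(MUST_NOT, term("test"))]))])
# (-(indexid:"test1") AND -(indexid:"test2")) AND -(indexid:"test3")
all_negative = ("bool", [
    (MUST, ("bool", [(MUST_NOT, term("test1")), (MUST_NOT, term("test2"))])),
    (MUST_NOT, term("test3")),
])
# indexid:"test" OR -indexid:"test"  (flat: one SHOULD and one MUST_NOT clause)
flat_or = ("bool", [(SHOULD, term("test")), (MUST_NOT, term("test"))])

for q in (grouped_or, all_negative, flat_or):
    print(sorted(evaluate(q)), "->", sorted(evaluate(fix_negative(q))))
```

Running it prints `[1, 2] -> [1, 2, 3, 4, 5, 6]`, then `[] -> [1, 2, 6]`, then `[] -> []`: the rewrite rescues the two grouped queries, while the flat OR still matches nothing.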

-shay.banon

(In reply to Paul, ppearcy@gmail.com, Fri, Aug 13, 2010 at 12:06 AM)


(ppearcy) #3

Awesome. Makes perfect sense, as this is the same logic I needed to
implement myself when evaluating Solr. Saves me from doing some query
gymnastics, for sure.

You may want to consider adding logic for the case of non-nested negatives
in OR clauses.

For example:
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test" AND *:*)

Right now, when generating ES queries from a syntax tree, I am adding
grouping around negative OR clauses to accommodate this.

So:
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test")

Which I believe ES then interprets as:
indexid:"test" OR (-indexid:"test" AND *:*)
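The workaround can be sketched as a small query-string generator. The `Term`/`Not`/`Or` node classes and the `render` function are hypothetical (the actual generator isn't shown in this thread); the sketch only illustrates the grouping rule: a negation that is a direct operand of OR gets its own parentheses, or optionally the explicit `(-x AND *:*)` form proposed above, which should not rely on ES adding the match-all.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    field: str
    value: str


@dataclass
class Not:
    operand: "Node"


@dataclass
class Or:
    operands: List["Node"]


Node = Union[Term, Not, Or]


def render(node: Node, in_or: bool = False, explicit_match_all: bool = False) -> str:
    """Render a hypothetical syntax tree as a query_string.

    A negation that is a direct operand of OR is wrapped in its own group.
    With explicit_match_all=True the group also spells out the match-all
    clause, i.e. (-x AND *:*), instead of relying on ES to add it.
    """
    if isinstance(node, Term):
        return f'{node.field}:"{node.value}"'
    if isinstance(node, Not):
        inner = render(node.operand, explicit_match_all=explicit_match_all)
        if not isinstance(node.operand, Term):
            inner = f"({inner})"  # keep compound operands grouped under the '-'
        text = f"-{inner}"
        if not in_or:
            return text
        return f"({text} AND *:*)" if explicit_match_all else f"({text})"
    if isinstance(node, Or):
        return " OR ".join(
            render(child, in_or=True, explicit_match_all=explicit_match_all)
            for child in node.operands
        )
    raise TypeError(f"unsupported node: {node!r}")


query = Or([Term("indexid", "test"), Not(Term("indexid", "test"))])
print(render(query))                           # indexid:"test" OR (-indexid:"test")
print(render(query, explicit_match_all=True))  # indexid:"test" OR (-indexid:"test" AND *:*)
```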

I wouldn't be heartbroken without this, though :)

Thanks again for the awesome work!

(In reply to Shay Banon, shay.ba...@elasticsearch.com, Aug 12, 4:35 pm)


(Shay Banon) #4

Yeah, interesting! It requires a bit more work interpreting the generated
queries. Can you open an issue for this, just so I won't lose track of
it?

-shay.banon

(In reply to Paul, ppearcy@gmail.com, Fri, Aug 13, 2010 at 6:36 AM)