ES's handling of negative queries

ppearcy · August 12, 2010, 9:06pm

Hi,
I've been really surprised at how ES correctly handles negative
queries when passed in through a query_string, in most cases.

There are tons of discussions out there about Lucene and negative
operations, explaining that set logic is being used and that a pure
negation doesn't yield anything.
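
To illustrate the set-logic point, here is a minimal sketch in Python. It is a toy model, not Lucene code: the document IDs, the `POSTINGS` table, the `"*"` match-all entry, and the `evaluate` helper are all assumptions made up for the example. Prohibited clauses can only subtract from what the positive clauses matched, so with no positive clause there is nothing to subtract from.

```python
# Toy model (hypothetical data): each term maps to the set of doc IDs containing it.
ALL_DOCS = {1, 2, 3, 4}
POSTINGS = {
    "test": {1, 2},
    "*": ALL_DOCS,  # stand-in for a match-all clause
}


def evaluate(positive=(), prohibited=()):
    """Union the positive clauses, then subtract the prohibited ones."""
    matched = set()
    for term in positive:
        matched |= POSTINGS.get(term, set())
    for term in prohibited:
        matched -= POSTINGS.get(term, set())
    return matched


# Pure negation: nothing was matched, so there is nothing to subtract from.
pure_negative = evaluate(prohibited=["test"])

# Same negation with a match-all clause added: every doc except those with "test".
with_match_all = evaluate(positive=["*"], prohibited=["test"])
```

In this model `pure_negative` is empty, while `with_match_all` holds the documents that do not contain the term.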

I don't believe direct Lucene (and Solr) handle these cases correctly,
and I was curious what ES is doing differently. I can see from the explain
output that it is adding an implicit match-all, but I thought that the
query_string would get passed straight down to Lucene, untouched.

Here are a couple of example queries that highlight behavior that
diverges from Lucene:
indexid:"test" OR (-indexid:"test")
(-(indexid:"test1") AND -(indexid:"test2")) AND -(indexid:"test3")

Here is one case that doesn't work quite as I'd expect, based on above
behavior, but the behavior is consistent with Lucene:
indexid:"test" OR -indexid:"test"

And as a side note, the reason why I would be doing such funky looking
negations is that we are dynamically translating search strings from a
legacy commercial search system.

Thanks,
Paul

kimchy · August 12, 2010, 10:35pm

ES extends the Lucene query parser to add its own logic, such as automatic
numeric range queries, support for type based fields (a query on typeX.field1
will result in wrapping it in a filter with typeX), and also "fixing"
negative queries. The way I try to do that is: whenever a boolean query
is built, if all of its clauses are prohibited, then it is replaced with
another boolean query that adds a must clause of match all docs. Make sense?
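
The rule described above could be sketched roughly as follows. This is a Python illustration, not the actual ES parser code: the `Term`, `MatchAll`, and `BoolQuery` classes and the `fix_pure_negative` function are hypothetical names, and the clause layouts used for the two example queries are an assumption about how the parser builds them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

MUST, SHOULD, MUST_NOT = "must", "should", "must_not"


@dataclass
class Term:
    name: str
    value: str


@dataclass
class MatchAll:
    pass


@dataclass
class BoolQuery:
    # Each clause is an (occur, sub-query) pair.
    clauses: List[Tuple[str, object]] = field(default_factory=list)


def fix_pure_negative(query):
    """Return a copy where every all-prohibited boolean query gains a must match-all."""
    if not isinstance(query, BoolQuery):
        return query
    fixed = [(occur, fix_pure_negative(sub)) for occur, sub in query.clauses]
    if fixed and all(occur == MUST_NOT for occur, _ in fixed):
        fixed = [(MUST, MatchAll())] + fixed
    return BoolQuery(fixed)


test = Term("indexid", "test")

# Assumed shape of: indexid:"test" OR (-indexid:"test")
grouped = BoolQuery([(SHOULD, test), (SHOULD, BoolQuery([(MUST_NOT, test)]))])

# Assumed shape of: indexid:"test" OR -indexid:"test"
flat = BoolQuery([(SHOULD, test), (MUST_NOT, test)])

fixed_grouped = fix_pure_negative(grouped)
fixed_flat = fix_pure_negative(flat)
```

Under these assumptions, the nested group in `grouped` is purely negative and gets the extra clause, while `flat` is left unchanged because its top-level boolean query also carries a non-prohibited clause.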

-shay.banon

ppearcy · August 13, 2010, 3:36am

Awesome. Makes perfect sense, as this is the same logic I needed to
implement myself when evaluating Solr. Saves me from doing some query
gymnastics, for sure.

May want to consider adding logic in the case of non-nested negatives
in OR clauses.

For example
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test" AND *:*)

Right now, when generating ES queries from a syntax tree, I am adding
grouping around negative OR clauses to accommodate this.

So:
indexid:"test" OR -indexid:"test"
->
indexid:"test" OR (-indexid:"test")

Which I believe ES then interprets as:
indexid:"test" OR (-indexid:"test" AND *:*)
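
A rough sketch of that grouping workaround, in Python. The `Term`, `Not`, and `Or` node classes and the `render` function are hypothetical stand-ins for whatever syntax tree the translator really uses; the only point is that a negated operand of an OR gets wrapped in parentheses before the query string is emitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Term:
    name: str
    value: str


@dataclass
class Not:
    operand: object


@dataclass
class Or:
    operands: List[object]


def render(node):
    """Render a syntax tree as a query string, grouping negated OR operands."""
    if isinstance(node, Term):
        return f'{node.name}:"{node.value}"'
    if isinstance(node, Not):
        inner = render(node.operand)
        # A bare term can take the prefix directly; anything else is grouped first.
        return f"-{inner}" if isinstance(node.operand, Term) else f"-({inner})"
    if isinstance(node, Or):
        parts = []
        for child in node.operands:
            text = render(child)
            # Wrap negations (and nested ORs) so each becomes its own sub-query.
            if isinstance(child, (Not, Or)):
                text = f"({text})"
            parts.append(text)
        return " OR ".join(parts)
    raise TypeError(f"unsupported node: {node!r}")


tree = Or([Term("indexid", "test"), Not(Term("indexid", "test"))])
query_string = render(tree)
```

Here `query_string` comes out as the grouped form shown above, so the negated clause sits in its own purely negative group.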

I wouldn't be heartbroken without this, though.

Thanks again for the awesome work!

kimchy · August 13, 2010, 9:47am

Yeah, interesting! It requires a bit more work interpreting the generated
queries. Can you open an issue for this, just so I won't lose track of
it?

-shay.banon
