Problems with span_not query

I'm having some problems with queries containing span_not. I've simplified the
query down to a test example, but it returns additional documents that
I don't think should be returned.

In short I want to find the documents that contain 'foo' but not 'bar' from:
foo
foo bar
bar foo
foo foo bar
foo bar foo
bar foo foo

The query below returns two docs ('foo' and 'bar foo foo') rather than the
one I was expecting:
{
  "query": {
    "span_not": {
      "include": {
        "span_term": {
          "field1": "foo"
        }
      },
      "exclude": {
        "span_near": {
          "in_order": false,
          "clauses": [
            {
              "span_term": {
                "field1": "bar"
              }
            },
            {
              "span_term": {
                "field1": "foo"
              }
            }
          ],
          "slop": 1000
        }
      }
    }
  }
}

Why does 'bar foo foo' match the query and, given that it does, why don't
any of the others match, given that in_order is false?

Tested on elasticsearch 1.0.1 and 1.1.1 on Ubuntu 12.04.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/711fe54f-436c-453e-8a9d-59ff73c57c67%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

My guess is that Lucene is matching on the second or third "foo" since
"bar" does not appear in its span. That said, that means the 4th document
should have matched as well. I haven't actually run the query, but will try
later.
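
To make that guess concrete, here is a minimal Python sketch of how the query
could play out on the six test documents. It is a toy model, not Lucene code:
the helper names (term_spans, unordered_near_spans, span_not) are made up for
illustration, and it assumes that the unordered span_near only advances
whichever clause currently starts earliest, so it does not enumerate every
possible bar/foo pairing. Under that assumption it reproduces the reported
result.

```python
def term_spans(tokens, term):
    """Return (start, end) spans, end exclusive, for each occurrence of term."""
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok == term]


def unordered_near_spans(a, b, slop):
    """Toy model of a two-clause unordered span_near.

    ASSUMPTION: after each candidate window, only the clause whose current
    span starts earliest is advanced, so some pairings are never produced.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = min(a[i][0], b[j][0])
        end = max(a[i][1], b[j][1])
        clause_len = (a[i][1] - a[i][0]) + (b[j][1] - b[j][0])
        # slop = positions in the window not covered by the clauses themselves
        if (end - start) - clause_len <= slop:
            out.append((start, end))
        if a[i][0] <= b[j][0]:
            i += 1
        else:
            j += 1
    return out


def span_not(include, exclude):
    """Keep include spans that overlap no exclude span."""
    return [
        s for s in include
        if not any(s[0] < e[1] and e[0] < s[1] for e in exclude)
    ]


docs = ["foo", "foo bar", "bar foo", "foo foo bar", "foo bar foo", "bar foo foo"]

matches = []
for doc in docs:
    tokens = doc.split()
    foo = term_spans(tokens, "foo")
    bar = term_spans(tokens, "bar")
    exclude = unordered_near_spans(bar, foo, slop=1000)
    if span_not(foo, exclude):
        matches.append(doc)

print(matches)  # ['foo', 'bar foo foo']
```

In this model 'bar foo foo' only ever produces the exclude window covering
"bar foo", so the last "foo" falls outside it and survives. In 'foo foo bar'
and 'foo bar foo' the foo clause is the one that gets advanced, so every "foo"
ends up inside some exclude window.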

Lucene now supports additional parameters for the span not query to extend
the exclude portion "outside" of the matched span:
http://lucene.apache.org/core/4_8_0/core/org/apache/lucene/search/spans/SpanNotQuery.html
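
As a rough illustration of what those parameters do (a toy Python model with
made-up helper names, not the Lucene API): the include span is widened by
`pre` positions before it and `post` positions after it before checking for
overlap with the exclude spans. Assuming that reading of the linked Javadoc,
excluding a plain span_term "bar" with large pre/post values would leave only
the 'foo' document.

```python
def term_spans(tokens, term):
    """Return (start, end) spans, end exclusive, for each occurrence of term."""
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok == term]


def span_not(include, exclude, pre=0, post=0):
    """Keep include spans whose widened window overlaps no exclude span.

    ASSUMPTION: pre/post simply extend the include span to
    [start - pre, end + post) for the purpose of the overlap check.
    """
    kept = []
    for start, end in include:
        lo, hi = start - pre, end + post
        if not any(ex_start < hi and lo < ex_end for ex_start, ex_end in exclude):
            kept.append((start, end))
    return kept


docs = ["foo", "foo bar", "bar foo", "foo foo bar", "foo bar foo", "bar foo foo"]


def matching_docs(pre, post):
    """Documents with at least one surviving 'foo' span."""
    result = []
    for doc in docs:
        tokens = doc.split()
        foo = term_spans(tokens, "foo")
        bar = term_spans(tokens, "bar")
        if span_not(foo, bar, pre=pre, post=post):
            result.append(doc)
    return result


without_window = matching_docs(pre=0, post=0)
with_window = matching_docs(pre=1000, post=1000)

print(without_window)  # every document: no "foo" span overlaps a "bar" span
print(with_window)     # ['foo']
```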

I submitted a pull request to incorporate these new features a while ago,
but the Elasticsearch team missed the change (they have a ton of open pull
requests): Expose `dist`/`pre`/`post` options for SpanNotQuery by brusic · Pull Request #4452 · elastic/elasticsearch · GitHub

If you can convince them to merge my change, things should work for you. :)

Cheers,

Ivan

On Tue, May 6, 2014 at 7:44 AM, Matthew Brown matthew@arachnys.com wrote:


gist:59b9b5ad6f68a5d12d0a · GitHub

