Exists filter does not respect must_not bool filter


(Ayush Sangani) #1

Hi Everyone,

Goal: I want to find all the documents that do not have the giving.assignee
field.

I am executing the query below on ES version 1.3.2, which combines an exists
filter with a bool filter.

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": [
            {
              "exists": {
                "field": "giving.assignee"
              }
            }
          ]
        }
      }
    }
  },
  "size": 2000
}

This query also returns documents where the giving.assignee field exists
and has a value.
We have around 2 million documents, and it returns close to all 2 million
of them.

I have also tried using the missing filter, but no luck.

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [
            {
              "missing": {
                "field": "giving.assignee"
              }
            }
          ]
        }
      }
    }
  },
  "size": 20000
}

It gives the same result as the query above.
If someone can point out what I am doing wrong here, or if further
information is needed, please let me know.
Looking forward to your help.

Thanks,
Ayush Sangani
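Both attempts ask for the same set of documents. A `must_not` clause wrapped around `exists` keeps everything `exists` does not match, and `missing` (with its default options) matches exactly that complement. The toy Python model below, with hypothetical documents and plain sets standing in for Elasticsearch's filters, makes the equivalence explicit: if both queries return the same wrong result, the thing to question is what `exists` itself matches, not the `bool` wrapper.

```python
# Toy model (hypothetical data): each document is a dict of field -> value.
docs = {
    1: {"giving.assignee": "alice"},
    2: {"giving.assignee": "bob"},
    3: {"other": "x"},
    4: {},
}

def exists(field, docs):
    """Ids of documents that have a non-null value for `field`."""
    return {i for i, d in docs.items() if d.get(field) is not None}

def missing(field, docs):
    """Ids of documents with no value for `field` (default missing semantics)."""
    return {i for i, d in docs.items() if d.get(field) is None}

def bool_must_not(clause_ids, docs):
    """A bool filter with one must_not clause over all docs: the complement."""
    return set(docs) - clause_ids

not_exists = bool_must_not(exists("giving.assignee", docs), docs)
only_missing = missing("giving.assignee", docs)

print(sorted(not_exists))    # [3, 4]
print(sorted(only_missing))  # [3, 4]
```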

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/89f2193a-5f11-447f-901c-29790318ddbf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Ivan Brusic) #2

Is giving.assignee a sub-object or a nested document? Can you provide your
mapping? Use the get mapping API for exact results:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-get-mapping.html

Perhaps enabling explain would provide some hints.

--
Ivan



(Jörg Prante) #3

The behavior of exists/missing has changed slightly, but unfortunately
this is not well documented yet.

Can you please try

{
  "query": {
    "filtered": {
      "filter": {
        "not": {
          "filter": {
            "range": {
              "giving.assignee": {}
            }
          }
        }
      }
    }
  }
}

instead and see if it works better for your case?

See also https://github.com/elasticsearch/elasticsearch/issues/7348

Jörg
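A `range` filter with no bounds works as a value-based existence test: it walks the terms indexed for the field, every term qualifies when there are no bounds, so it matches every document holding at least one term in `giving.assignee`, and the surrounding `not` keeps the rest. The Python sketch below models that on a toy inverted index; the data is hypothetical, and plain string comparison stands in for Lucene's term ordering (an assumption that holds for ASCII terms). It also shows why `range` is not limited to numeric fields: string terms are ordered too.

```python
# Toy inverted index for one field: term -> set of doc ids (hypothetical data).
index = {
    "alice": {1, 5},
    "bob": {2},
    "carol": {5},
}
all_docs = {1, 2, 3, 4, 5}

def range_filter(index, gte=None, lte=None):
    """Docs holding at least one term in [gte, lte]; None means unbounded."""
    hits = set()
    for term, doc_ids in index.items():
        if (gte is None or term >= gte) and (lte is None or term <= lte):
            hits |= doc_ids
    return hits

def not_filter(ids, all_docs):
    """The `not` filter: every document the inner filter did not match."""
    return all_docs - ids

has_value = range_filter(index)  # no bounds: any indexed term matches
print(sorted(has_value))                        # [1, 2, 5]
print(sorted(not_filter(has_value, all_docs)))  # [3, 4]
```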



(Ayush Sangani) #4

Hi Ivan,

Thanks for the reply.
Below is the mapping for the giving field.

{
  "giving": {
    "properties": {
      "assignee": {
        "type": "string",
        "fields": {
          "assignee": {
            "type": "string",
            "index": "analyzed",
            "store": "yes",
            "include_in_all": false
          },
          "untouched": {
            "type": "string",
            "index": "not_analyzed",
            "store": "yes"
          }
        }
      }
    }
  }
}

Thanks,
Ayush



(Ayush Sangani) #5

Hi Jörg,

"giving.assignee" is a string field. I also tried your suggestion, but it
didn't work.

{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "not": {
          "filter": {
            "exists": {
              "field": "giving.assignee"
            }
          }
        }
      }
    }
  },
  "size": 20000
}

I wonder if this is a bug in ES 1.3.2.
Please let me know if there is any other way to fix this.

Thanks,
Ayush Sangani



(Ayush Sangani) #6

Hi Jörg,

I was under the assumption that the range filter could only be used on
numeric fields, but this works. Thanks for the help!

Could anyone explain why the must_not bool filter doesn't respect the
exists filter?

Thanks,
Ayush Sangani



(Jörg Prante) #7

Yes, the range filter operates on all fields, not only numeric ones.

The missing/exists operations were changed slightly in recent versions.
For high-cardinality fields, operations on the field content were very
expensive, so an optimization was introduced: each doc carries a list of
its field names in a hidden field, and missing/exists refer to this new
hidden field, which is extremely fast.

The downside is that operations that depend on field values (like your
boolean must_not) can no longer be mixed with the new exists/missing field
name filter.

Jörg
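The explanation above amounts to two different questions: `exists`/`missing` read a hidden per-document list of field names (in 1.3 this is the `_field_names` metadata field), while `range` reads the indexed values. The two agree only while every document that has a value also has the field's name in that list. The Python sketch below is a toy model, not Elasticsearch code; its data is hypothetical, and document 3 (a value but no field-names entry) is an assumed case, chosen to show one way `exists` could undercount and `must_not exists` could return documents that do have a value, as Ayush reports in this thread. The thread does not establish that this is the actual cause.

```python
# Toy model of two ways to ask "does this document have giving.assignee?"
# All data is hypothetical. "terms" stands for the indexed values of the field;
# "field_names" stands for the hidden per-document list of field names.
# Document 3 is an ASSUMED case: it has a value but no field-names entry.
FIELD = "giving.assignee"

docs = {
    1: {"terms": ["alice"], "field_names": [FIELD]},
    2: {"terms": ["bob"], "field_names": [FIELD]},
    3: {"terms": ["carol"], "field_names": []},
    4: {"terms": [], "field_names": []},
}

def exists_via_field_names(docs, field):
    """exists as described above: only the hidden field-names list is read."""
    return {i for i, d in docs.items() if field in d["field_names"]}

def exists_via_terms(docs):
    """Value-based check, like an unbounded range: any indexed term counts."""
    return {i for i, d in docs.items() if d["terms"]}

print(sorted(exists_via_field_names(docs, FIELD)))              # [1, 2]
print(sorted(exists_via_terms(docs)))                           # [1, 2, 3]
# must_not exists: doc 3 comes back even though it has a value.
print(sorted(set(docs) - exists_via_field_names(docs, FIELD)))  # [3, 4]
```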



(Ayush Sangani) #8

I appreciate your explanation, and as you suggested, the range filter gives
correct results.
I am still confused about how the exists filter should be used.

As I understand it, the implementation of the exists filter changed in v1.3
to improve speed, but why does it deviate from its expected behavior?

The exists filter may well be fast and optimized, but it is cutting off more
than half of the results.

For example:

I want to find the number of documents where the "giving.assignee" field
exists.
Note: "giving.assignee" is an analyzed string field.

{
  "query": {
    "filtered": {
      "filter": {
        "exists": {
          "field": "giving.assignee"
        }
      }
    }
  },
  "size": 2000
}

The query above returns only 25607 documents, whereas it should return
110827 documents.

If I run the same query with a range filter instead, it gives the expected
result, i.e. 110827 documents.
Query:

{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "giving.assignee": {}
        }
      }
    }
  },
  "size": 2000
}

Can someone please explain the difference?
It would also help to know when the exists filter should be used.

Thanks,
Ayush Sangani
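Because `size` only limits how many hits are returned, the two totals are easiest to compare as counts. The Python sketch below only builds the two bodies from this post as JSON. It assumes they would be sent to the count API (`GET /<index>/_count`), where `<index>` is a placeholder for the real index name; the `hits.total` value of the corresponding search responses reports the same totals.

```python
import json

FIELD = "giving.assignee"

# exists-based count: the exists query from the post, without "size".
exists_body = {"query": {"filtered": {"filter": {"exists": {"field": FIELD}}}}}

# range-based count: an unbounded range on the same field.
range_body = {"query": {"filtered": {"filter": {"range": {FIELD: {}}}}}}

for name, body in (("exists", exists_body), ("range", range_body)):
    # Each body could be sent as-is to GET /<index>/_count ("<index>" is a placeholder).
    print(name, json.dumps(body))
```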



(Mauricio Repetto) #9

Hi everyone!

Another noob here with the same doubts you had, Ayush. I was testing something simple with the exists filter, but it doesn't seem to work at all. In the example below, the exists filter is completely ignored and I get back the only document I have; only when I remove the { "query": { "match_all": {} } } part does it work (returning 0 results). How does this relate to the change in missing/exists behavior mentioned above?

POST /test/tweet
{
  "message": "some arrays in this tweet...",
  "tags": [
    "elasticsearch",
    "wow"
  ],
  "lists": [
    {
      "name": "prog_list",
      "description": "programming list"
    },
    {
      "name": "cool_list",
      "description": "cool stuff list"
    }
  ],
  "numbers": [1, 2, 3, 4]
}

GET /test/tweet/_search?explain #&format=yaml
{
  "_source": ["lists.description", "ttt"],
  "query": {
    "filtered": {
      "filter": [
        {
          "exists": {
            "field": "ttt"
          }
        },
        {
          "query": {
            "match_all": {}
          }
        }
      ]
    }
  }
}

Thanks!
Mauricio Repetto
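In the posted search, `filter` is an array that mixes the `exists` filter with a `query` filter wrapping `match_all`. The usual ES 1.x `filtered` shape puts `match_all` under `query` and a single filter object under `filter` (several filters would go inside a `bool` filter). The Python sketch below only builds that body for the same `_source` fields and `ttt` field; how 1.x treats the array form is not established here, so this is a shape to try rather than a diagnosis. Since the indexed tweet has no `ttt` field, it would be expected to return no hits.

```python
import json

# Conventional ES 1.x "filtered" shape: the query under "query" and
# one filter object (not an array) under "filter".
body = {
    "_source": ["lists.description", "ttt"],
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": {"exists": {"field": "ttt"}},
        }
    },
}

# Body to send to GET /test/tweet/_search
print(json.dumps(body, indent=2))
```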

