I have noticed that filtering with a filtered query gives different cardinality results than applying an identical filter through a filters aggregation on a match_all query, and I can't work out any logical reason for this. We stumbled across it while debugging differences between the numbers in Kibana and those from our own analytics systems.
For example, the following query:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [{
            "bool": {
              "should": {
                "term": {
                  "mountpoint.suffix": "android"
                }
              }
            }
          }]
        }
      }
    }
  },
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "clientip"
      }
    }
  }
}
returns this:
{
  "took": 213,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 616887,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "unique": {
      "value": 81460
    }
  }
}
whereas if you apply the same filter through a filters aggregation instead of the query, like this:
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "android_only": {
      "filters": {
        "filters": {
          "android_only_filter": {
            "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "bool": {
                  "must": [{
                    "bool": {
                      "should": {
                        "term": {
                          "mountpoint.suffix": "android"
                        }
                      }
                    }
                  }]
                }
              }
            }
          }
        }
      },
      "aggs": {
        "unique": {
          "cardinality": {
            "field": "clientip"
          }
        }
      }
    }
  }
}
you get a result that looks like this:
{
  "took": 285,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 2979647,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "android_only": {
      "buckets": {
        "android_only_filter": {
          "doc_count": 616887,
          "unique": {
            "value": 84000
          }
        }
      }
    }
  }
}
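Putting the two responses side by side, a small Python sketch pulls out the document count and the unique count from each. The helper name compare is my own, for illustration only, and the responses are trimmed to just the fields being compared:

```python
# The two responses above, trimmed to the fields compared here.
response_filtered_query = {
    "hits": {"total": 616887},
    "aggregations": {"unique": {"value": 81460}},
}
response_filters_agg = {
    "hits": {"total": 2979647},
    "aggregations": {
        "android_only": {
            "buckets": {
                "android_only_filter": {
                    "doc_count": 616887,
                    "unique": {"value": 84000},
                }
            }
        }
    },
}


def compare(resp_query, resp_agg):
    """Return (docs_1, docs_2, unique_1, unique_2) for the two requests."""
    # Request 1: the filtered query's hit count is the filtered document count.
    docs_1 = resp_query["hits"]["total"]
    unique_1 = resp_query["aggregations"]["unique"]["value"]
    # Request 2: the filtered document count is the bucket's doc_count,
    # not hits.total (which counts every document matched by match_all).
    bucket = resp_agg["aggregations"]["android_only"]["buckets"]["android_only_filter"]
    docs_2 = bucket["doc_count"]
    unique_2 = bucket["unique"]["value"]
    return docs_1, docs_2, unique_1, unique_2


docs_1, docs_2, unique_1, unique_2 = compare(response_filtered_query, response_filters_agg)
print(f"documents: {docs_1} vs {docs_2}")
print(f"unique clientip: {unique_1} vs {unique_2} "
      f"({(unique_2 - unique_1) / unique_1:.1%} apart)")
```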
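I am new to ES, so I may have missed something here, but I would expect both requests to count the unique client IPs in the same set of documents, namely those matching "mountpoint.suffix: android". The bucket's doc_count in the second response (616887) is identical to hits.total in the first, so both requests do operate on the same documents, and yet the unique counts differ (81460 vs. 84000). I cannot explain the difference.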
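To rule out a subtle difference between the two filters, both request bodies can be generated from one shared filter object. This is a minimal Python sketch: the function and constant names are my own, for illustration only, and it just builds the request bodies shown above without contacting Elasticsearch:

```python
import copy

# The filter used in both requests, exactly as in the bodies above.
ANDROID_FILTER = {
    "bool": {
        "must": [{
            "bool": {
                "should": {
                    "term": {"mountpoint.suffix": "android"}
                }
            }
        }]
    }
}

# Shared cardinality sub-aggregation on the client IP field.
UNIQUE_AGG = {"unique": {"cardinality": {"field": "clientip"}}}


def filtered_query_body(flt):
    """Request 1: restrict documents in the query, aggregate at top level."""
    return {
        "size": 0,
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": copy.deepcopy(flt),
            }
        },
        "aggs": copy.deepcopy(UNIQUE_AGG),
    }


def filters_agg_body(flt):
    """Request 2: match all documents, restrict inside a filters aggregation."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "android_only": {
                "filters": {
                    "filters": {
                        "android_only_filter": {
                            "filtered": {
                                "query": {"match_all": {}},
                                "filter": copy.deepcopy(flt),
                            }
                        }
                    }
                },
                "aggs": copy.deepcopy(UNIQUE_AGG),
            }
        },
    }
```

Because both bodies embed deep copies of the same ANDROID_FILTER and UNIQUE_AGG, any remaining difference in the results cannot come from a typo in one of the filters.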
Thanks in advance for your input.