Filters aggregation gives a different cardinality than an equivalent filtered query

I have noticed that filtering with a filtered query gives a different cardinality result from applying an identical filter in a filters aggregation on a match_all query, and I can't work out any logical reason for this. We stumbled across it while debugging differences between the numbers in Kibana and those in our own analytics systems.

For example the following query:

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": [{
            "bool": {
              "should": {
                "term": {
                  "mountpoint.suffix": "android"
                }
              }
            }
          }]
        }
      }
    }
  },
  "aggs": {
    "unique": {
      "cardinality": {
        "field": "clientip"
      }
    }
  }
}

returns this:

{
    "took": 213,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 616887,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "unique": {
            "value": 81460
        }
    }
}

whereas if you apply the same filter in a filters aggregation rather than in the query, like this:

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "android_only": {
      "filters": {
        "filters": {
          "android_only_filter": {
            "filtered": {
              "query": {
                "match_all": {}
              },
              "filter": {
                "bool": {
                  "must": [{
                    "bool": {
                      "should": {
                        "term": {
                          "mountpoint.suffix": "android"
                        }
                      }
                    }
                  }]
                }
              }
            }
          }
        }
      },
      "aggs": {
        "unique": {
          "cardinality": {
            "field": "clientip"
          }
        }
      }
    }
  }
}

you get a result that looks like this:

{
    "took": 285,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 2979647,
        "max_score": 0,
        "hits": []
    },
    "aggregations": {
        "android_only": {
            "buckets": {
                "android_only_filter": {
                    "doc_count": 616887,
                    "unique": {
                        "value": 84000
                    }
                }
            }
        }
    }
}

I am new to ES so I may have missed something here, but I would expect both requests to count unique client IPs over the same set of documents, namely those matching "mountpoint.suffix": "android". The document counts agree (616887 in both responses), so I cannot explain why one cardinality is 81460 and the other is 84000.
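In case it helps anyone reproduce this, here is a rough Python sketch of how the two request bodies could be built from one shared filter definition, so that a typo in one of the filters can be ruled out. The function names are hypothetical, made up for this post. The optional `precision_threshold` setting on the cardinality aggregation is not in my original queries; I am assuming it may be relevant because, as far as I understand, cardinality counts are approximate, and the value 40000 is only an illustrative choice. The last helper just puts a number on the gap between the two results above.

```python
import json

# The filter from the post, defined once so both requests use the same clause.
ANDROID_FILTER = {
    "bool": {
        "must": [
            {"bool": {"should": {"term": {"mountpoint.suffix": "android"}}}}
        ]
    }
}


def cardinality_agg(precision_threshold=None):
    """Build the cardinality aggregation on clientip.

    precision_threshold is optional and is NOT part of the original queries;
    setting it is an assumption made here for experimenting.
    """
    agg = {"cardinality": {"field": "clientip"}}
    if precision_threshold is not None:
        agg["cardinality"]["precision_threshold"] = precision_threshold
    return {"unique": agg}


def filtered_query_body(precision_threshold=None):
    """First request: filter in the query, cardinality at the top level."""
    return {
        "size": 0,
        "query": {
            "filtered": {"query": {"match_all": {}}, "filter": ANDROID_FILTER}
        },
        "aggs": cardinality_agg(precision_threshold),
    }


def filters_agg_body(precision_threshold=None):
    """Second request: match_all query, same filter in a filters aggregation."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "android_only": {
                "filters": {
                    "filters": {
                        "android_only_filter": {
                            "filtered": {
                                "query": {"match_all": {}},
                                "filter": ANDROID_FILTER,
                            }
                        }
                    }
                },
                "aggs": cardinality_agg(precision_threshold),
            }
        },
    }


def relative_gap(a, b):
    """Relative difference between two counts, measured against the smaller."""
    return abs(a - b) / min(a, b)


if __name__ == "__main__":
    # Print one request body as JSON; 40000 is an illustrative value only.
    print(json.dumps(filters_agg_body(precision_threshold=40000), indent=2))
    # Values copied from the two responses in this post.
    print(round(relative_gap(81460, 84000) * 100, 2), "percent")
```

With the numbers from the two responses, the gap works out to roughly 3%, which is small relative to the totals but large enough to show up in our reports.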

Thanks in advance for your input
