Cardinality aggregation returns a higher count than the number of matching documents

I am running:

{
  "size": 0,
  "track_total_hits": true,
  "query": {
    "bool": {
      "must": [
        { "match": { "prgx_email_join": { "query": "attachmentChild" } } }
      ]
    }
  },
  "aggs": {
    "attachment_count": {
      "cardinality": {
        "script": {
          "source": "doc['_id'].value.substring(48)"
        }
      }
    }
  }
}

And the results are:

{
  "took": 20433,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 34924205,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "attachment_count": {
      "value": 35113251
    }
  }
}

How does the attachment_count exceed the hits.total.value?

Hi Lee,
From the docs:

A single-value metrics aggregation that calculates an approximate count of distinct values.

The key word here is approximate: exact distinct counts are not possible at this scale while staying fast.
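To see why an approximate distinct count can overshoot, here is a minimal HyperLogLog sketch. Elasticsearch's cardinality aggregation is based on HyperLogLog++; this is the plain textbook HyperLogLog with a linear-counting correction for small counts, not Elasticsearch's code. The `HyperLogLog` class, the SHA-1-based `hash64` helper and `p=10` are illustrative choices. The estimate is computed from register statistics rather than from the documents themselves, so nothing caps it at the true count; with `p=10` the typical relative error is about 1.04/sqrt(1024), roughly 3%.

```python
import hashlib
import math


def hash64(value: str) -> int:
    # Stable 64-bit hash taken from SHA-1 (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")


class HyperLogLog:
    """Minimal textbook HyperLogLog (illustrative, not Elasticsearch's HyperLogLog++)."""

    def __init__(self, p: int = 10):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value: str) -> None:
        h = hash64(value)
        idx = h >> (64 - self.p)                   # top p bits choose a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # 1-based position of the leftmost 1-bit within the remaining 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Registers only keep a maximum, so re-adding a value never changes them.
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)         # small-range correction (linear counting)
        return raw


true_count = 100_000
hll = HyperLogLog(p=10)
for i in range(true_count):
    hll.add(f"doc-{i}")
est = hll.estimate()
print(f"true={true_count} estimate={est:.0f} error={(est - true_count) / true_count:+.2%}")
```

In Elasticsearch the accuracy trade-off is tuned with the cardinality aggregation's `precision_threshold` option (default 3000, maximum 40000): counts below the threshold are expected to be close to exact, while counts in the tens of millions, as in this query, remain estimates that can land above or below the true value.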

Good to know. I would have thought that even an approximation would be capped at the number of matching documents.

When you have multi-valued fields (or scripts that can emit multiple values per document), there is no such upper limit: the number of distinct values can legitimately exceed the number of documents.
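A small exact-count sketch makes the distinction concrete. The documents below are hypothetical, and `distinct_values` is an illustrative helper, not an Elasticsearch API: each inner list stands for the values one document contributes. Multi-valued input can yield more distinct values than documents; single-valued input never can. The last lines redo the arithmetic on the response above.

```python
from typing import Iterable, List


def distinct_values(docs: Iterable[List[str]]) -> int:
    """Exact distinct count over every value each document emits."""
    seen = set()
    for values in docs:
        seen.update(values)
    return len(seen)


# Hypothetical documents: each list is what a field (or script) yields for one doc.
multi_valued = [["a", "b"], ["c", "d", "e"], ["a", "f"]]
single_valued = [["x"], ["y"], ["x"]]

print(len(multi_valued), distinct_values(multi_valued))    # 3 docs, 6 distinct values
print(len(single_valued), distinct_values(single_valued))  # 3 docs, 2 distinct values

# Figures taken from the response in the question.
hits_total = 34_924_205
reported = 35_113_251
excess = reported - hits_total
print(excess, f"{excess / hits_total:.2%}")  # 189046 0.54%
```

The script in the question, `doc['_id'].value.substring(48)`, returns one string per document, so the exact distinct count cannot exceed 34,924,205. The reported value is 189,046 (about 0.54%) higher, which most likely reflects estimation error in the cardinality aggregation rather than multi-valued input.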
