@karussell Sorry for the delay, meant to reply to this sooner!
So I haven't looked at the script too closely, but a concern with this kind of cardinality aggregation is memory. E.g. collecting the counts in a simple map will be 100% accurate, but also has a very high memory burden because each shard will have to maintain a map of terms and then serialize that map to the coordinator.
As a toy example, consider 20 shards each with 10M unique terms. If all those terms are unique across shards (which isn't unusual for fields like IP addresses or user IDs), that generates 200M unique terms which the coordinator needs to merge. Ignoring the runtime cost of merging, if each term is ~10 bytes, that's 2 GB of aggregation responses the coordinator has to hold in memory while reducing.
If a couple of those requests run in parallel, it's very easy to reach the point where the node runs out of memory.
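To make the arithmetic concrete, here's a quick back-of-the-envelope sketch (plain Python, not Elasticsearch code) using the hypothetical numbers above:

```python
# Toy illustration of the coordinator-side cost of an exact, map-based
# cardinality agg. All numbers are the hypothetical ones from this comment.

num_shards = 20
terms_per_shard = 10_000_000   # 10M unique terms per shard
bytes_per_term = 10            # rough average serialized size of one term

# Worst case: no overlap between shards (common for IPs, user IDs, ...)
total_terms = num_shards * terms_per_shard   # terms the coordinator merges
total_bytes = total_terms * bytes_per_term   # bytes held while reducing

print(f"{total_terms:,} terms ~= {total_bytes / 1e9:.1f} GB on the coordinator")
# → 200,000,000 terms ~= 2.0 GB on the coordinator
```

And that's per request, before accounting for the transient per-shard maps and serialization overhead.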
That's why the Elasticsearch cardinality aggregator uses a HyperLogLog sketch to approximate cardinality rather than calculate the true cardinality. In exchange for 1-5% error (depending on the configured precision and the actual cardinality), you can estimate cardinality in a few hundred kilobytes.
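For intuition, here's a minimal HyperLogLog in Python. This is an illustration only, not the actual Elasticsearch implementation (which uses HyperLogLog++ with additional bias corrections); it just shows how a fixed-size register array can estimate cardinality regardless of input size:

```python
# Minimal HyperLogLog sketch: ~16 KB of registers no matter how many
# values are added. Illustrative only, not the Elasticsearch internals.
import hashlib
import math

P = 14         # precision: 2^14 = 16384 one-byte registers (~16 KB total)
M = 1 << P

def _hash64(value: str) -> int:
    # Derive a 64-bit hash from SHA-1 (any well-mixed 64-bit hash works)
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

class HyperLogLog:
    def __init__(self):
        self.registers = bytearray(M)   # fixed size, independent of input

    def add(self, value: str) -> None:
        h = _hash64(value)
        idx = h >> (64 - P)                   # first P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)      # remaining 64-P bits
        # rank = position of the leftmost 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / M)      # bias correction for large M
        raw = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        if raw <= 2.5 * M:                    # small-range: linear counting
            zeros = self.registers.count(0)
            if zeros:
                return M * math.log(M / zeros)
        return raw

hll = HyperLogLog()
for i in range(100_000):
    hll.add(f"user-{i}")
est = hll.estimate()
print(f"true=100000 estimated={est:.0f}")  # typically within a percent or two
```

The key trade-off is visible here: the registers stay at 16 KB whether you add a thousand values or a billion, which is exactly what a scripted-metric map-based approach can't give you.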
So that's the disclaimer, and why one should be careful with scripted-metric aggs in general. We do a lot to make sure aggs have efficient runtime costs in both time and space, but scripted-metric lets you do anything you want, so it's easy to accidentally write a foot-gun.