Cardinality Aggregation gives wrong number?

Everything Elasticsearch does is designed for scale. We don't want to build data analysis functions that blow up when users provide us with a lot of data.

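The world of big data is, by necessity, built on fuzzier constructs [1]. We use the same algorithms as the other big data platforms, for exactly the same reasons. As I outlined here, it's a necessary trade-off in the age of big data: an exact distinct count needs memory that grows with the number of distinct values, while an approximate count can run in a small, fixed amount of memory. You can't expect to beat physics.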
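To make the trade-off concrete, here is a minimal sketch of a plain HyperLogLog estimator in Python. As far as I know, the cardinality aggregation is documented as being based on HyperLogLog++; the code below is the simpler textbook relative, not Elasticsearch's implementation. The function name `hll_estimate`, the parameter `p`, and the choice of SHA-1 as the hash are assumptions made for illustration only.

```python
import hashlib
import math


def hll_estimate(items, p=12):
    """Estimate the number of distinct items with a basic HyperLogLog.

    Uses 2**p registers (p >= 7), so memory stays fixed no matter how
    many items are fed in. Illustrative sketch only.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Derive a 64-bit hash from the first 8 bytes of a SHA-1 digest.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                    # top p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[idx] = max(registers[idx], rank)

    alpha = 0.7213 / (1 + 1.079 / m)           # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting when many
    # registers are still empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)
    return raw
```

With `p=12` the estimator keeps 4096 small registers whether it sees a thousand values or a billion, and the expected relative error is roughly 1.04 / sqrt(4096), about 1.6%. That is the deal on offer: bounded memory in exchange for a count that is close rather than exact.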

Arguably we could offer a function that guarantees cardinality accuracy at small scale, but small scale is not our mission.
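That said, to my knowledge the cardinality aggregation does accept a `precision_threshold` option that trades memory for accuracy: below the threshold, counts are expected to be close to accurate, though this is not a guarantee of exactness. A minimal sketch of such a request body, built as a Python dict; the aggregation name `distinct_users` and the field `user_id` are hypothetical, so check the documentation for your version before relying on the option.

```python
import json

# Hypothetical aggregation and field names, for illustration only.
request_body = {
    "size": 0,  # return only the aggregation result, no hits
    "aggs": {
        "distinct_users": {
            "cardinality": {
                "field": "user_id",
                # Higher values use more memory per aggregation and give
                # better accuracy for low-cardinality sets.
                "precision_threshold": 1000,
            }
        }
    },
}

print(json.dumps(request_body, indent=2))
```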

[1] Introduction to Probabilistic Data Structures - DZone
