Docs in my index have a unique id field, which I want to count using a cardinality agg. I found that if my result set is 5792 docs or smaller, the cardinality agg is accurate, but if I go to 5793 or larger, the cardinality agg returns a number that is one higher than it should be. In other words, with a hit count of 5793, the cardinality agg shows 5794. I've moved the window all around, thinking maybe I had one doc containing two ids, but I've determined I don't. It seems the agg is just off by one. I've set the precision threshold to 6000 or higher, without any change. Why does this particular doc count cause a problem for the agg, and is this a bug I should report?
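For reference, a request of the kind described might look like the sketch below. The original request isn't shown, so the field name `id`, the aggregation name `unique_ids` and the `match_all` query are assumptions; `cardinality`, `precision_threshold` and `track_total_hits` are standard Elasticsearch request options. The helper `cardinality_minus_hits` is hypothetical and just compares the agg value with the hit total in a parsed response.

```python
import json

# Placeholder names (assumptions): the field "id" and the aggregation name
# "unique_ids" stand in for whatever the real mapping and request use.
query_body = {
    "size": 0,  # skip returning hits; only the hit total and the agg are needed
    "track_total_hits": True,  # report the exact hit count, not a lower bound
    "query": {"match_all": {}},  # replace with the query that selects the result set
    "aggs": {
        "unique_ids": {
            "cardinality": {
                "field": "id",
                "precision_threshold": 6000,
            }
        }
    },
}


def cardinality_minus_hits(response: dict) -> int:
    """Return how far the cardinality agg differs from the total hit count.

    Assumes the Elasticsearch 7+ response shape, where hits.total is an object.
    """
    hits = response["hits"]["total"]["value"]
    distinct = response["aggregations"]["unique_ids"]["value"]
    return distinct - hits


print(json.dumps(query_body, indent=2))
```

With the numbers from the question (5793 hits, agg value 5794), the helper would report a difference of 1.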
> I've set the precision threshold to 6000 or higher, without any change.
The precision threshold doesn't mark the boundary between accurate and inaccurate counts; it only decides whether counting technique 1 or counting technique 2 is used. Counting technique 1 is less susceptible to inaccuracies but is still not guaranteed to be fully accurate. That said, it is based on collecting hashes of values, and hashes can occasionally collide, so I'd have expected an under-count rather than an over-count. One possible explanation is that the same value may be held in different field types across indices, in which case the string `"1234"` and the integer `1234` are treated as different values (`"1234" != 1234`) when results from the different indices are merged.
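To make the two failure modes above concrete, here is a toy model of the hash-collecting technique in plain Python. It is not Elasticsearch's implementation: blake2b stands in for the MurmurHash3 hashing Elasticsearch uses, and the helper names `value_hash`, `hash_set` and `merged_count` are hypothetical. Shortening the hashes shows how collisions merge distinct values (an under-count), while storing the same id as a string in one index and as an integer in another shows how one value can be counted twice.

```python
import hashlib
import struct
from typing import Iterable, Set, Union


def value_hash(value: Union[str, int], bits: int = 64) -> int:
    """Hash one field value to an unsigned integer of `bits` bits.

    Stand-in for Elasticsearch's hashing (which uses MurmurHash3); blake2b is
    used only because it ships with Python. A string is hashed from its UTF-8
    bytes and an integer from its 8-byte signed encoding, so "1234" and 1234
    produce different hashes.
    """
    if isinstance(value, str):
        data = value.encode("utf-8")
    else:
        data = struct.pack("<q", value)
    digest = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")
    return digest >> (64 - bits)  # keep only the top `bits` bits


def hash_set(values: Iterable[Union[str, int]], bits: int = 64) -> Set[int]:
    """Toy version of the hash-collecting technique: keep the set of value hashes."""
    return {value_hash(v, bits) for v in values}


def merged_count(*per_index_sets: Set[int]) -> int:
    """Union the per-index hash sets and count the distinct hashes."""
    merged: Set[int] = set()
    for hashes in per_index_sets:
        merged |= hashes
    return len(merged)


# Same id held as a string in one index and an integer in another: counted twice.
print(merged_count(hash_set(["1234"]), hash_set([1234])))
# Same id with the same type in both indices: counted once.
print(merged_count(hash_set(["1234"]), hash_set(["1234"])))
# Tiny 4-bit hashes force collisions: 100 distinct ids yield at most 16 hashes.
print(merged_count(hash_set([str(i) for i in range(100)], bits=4)))
```

The `bits` parameter exists only to make collisions easy to provoke; with full 64-bit hashes, collisions among a few thousand values are extremely unlikely.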