Elasticsearch aggregation not matching with unique count metrics

Hello All,

I have a weird issue with a large dataset of authentications against NAS devices. When I filter on a specific region with nas:xxxx* and look at the authentications against that region, I get two different results for the same filter and timeframe.

If I simply ask for unique users in region x I get 2478 unique users back, but if I then ask for those usernames with a sub aggregation I only get 2409 users back.

Why is that?

Which number is correct, and why do I get fewer results when I ask for the detailed usernames than when I ask for a high-level unique user count?

The number of documents (a simple count metric) stays the same throughout, so I don't think it's caused by missing shards. All shards holding data for this date range responded fine and there are no error counters in the responses.

Can anyone help me understand this, please?

The cardinality aggregation returns an approximate count (it is based on the HyperLogLog++ algorithm), which is probably why you are seeing a discrepancy.
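To make the comparison concrete, here is a rough sketch of what the two requests probably look like as search bodies. The field name `username`, the use of a `query_string` query for the nas:xxxx* filter, and the bucket limit of 10000 are assumptions for illustration, not taken from the original setup, and the time range filter is left out. The first body asks for an approximate distinct count; the second lists the actual values, so the number of buckets it returns is an actual count of distinct usernames as long as `size` is larger than the number of distinct values.

```python
import json


def build_requests(user_field="username", nas_filter="nas:xxxx*", max_buckets=10000):
    """Return (cardinality_body, terms_body) for the same filter.

    All default values are hypothetical placeholders.
    """
    query = {"query_string": {"query": nas_filter}}

    # Approximate distinct count of the user field.
    cardinality_body = {
        "size": 0,
        "query": query,
        "aggs": {"unique_users": {"cardinality": {"field": user_field}}},
    }

    # One bucket per distinct username, up to max_buckets buckets.
    terms_body = {
        "size": 0,
        "query": query,
        "aggs": {"users": {"terms": {"field": user_field, "size": max_buckets}}},
    }
    return cardinality_body, terms_body


cardinality_body, terms_body = build_requests()
print(json.dumps(cardinality_body, indent=2))
print(json.dumps(terms_body, indent=2))
```

Counting the buckets in the response to the second body and comparing that with the value returned by the first is one way to see how far off the approximation is for a given filter.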

Hi Christian, thank you very much for getting back to me.

That's interesting to know, I wasn't aware that that was the case.

Do you know if there is any way of getting exact counts back for larger datasets, other than searching for the data and then doing the counts / cardinality aggregations myself in a script?

Unfortunately for me the datasets are 9M authentications a month with that particular filter, and I end up with 20k buckets, which does tally with what I expect and with what I get back if I ask for the list of usernames.

Is that number correct?

I haven't made the script that will extract all of the authentications and then do my own aggregations yet though.

Is this 'approximation' only happening with cardinality aggregations, i.e. the metrics in your visualisations, or is it across the board for all aggregations?

It only applies to some aggregations, but for the cardinality aggregation you can tune the accuracy through the precision_threshold parameter. Depending on the cardinality of your field this may or may not give accurate results. Set it above your expected cardinality (the maximum supported value is 40000) and see how it affects the results.
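As a minimal sketch of that suggestion, the helper below builds a cardinality aggregation body with `precision_threshold` set explicitly. The field name `username`, the aggregation name `unique_users` and the helper itself are hypothetical; the cap of 40000 reflects the documented maximum, above which higher values have the same effect as 40000.

```python
import json

# Documented upper bound for precision_threshold; larger values behave like 40000.
MAX_PRECISION_THRESHOLD = 40000


def cardinality_with_threshold(field, threshold):
    """Build a search body with a cardinality aggregation on `field`.

    `threshold` should be chosen above the number of distinct values you
    expect; it is capped at the documented maximum.
    """
    capped = min(threshold, MAX_PRECISION_THRESHOLD)
    return {
        "size": 0,
        "aggs": {
            "unique_users": {
                "cardinality": {"field": field, "precision_threshold": capped}
            }
        },
    }


# Roughly 20k distinct usernames are expected, so ask for headroom above that.
body = cardinality_with_threshold("username", 25000)
print(json.dumps(body, indent=2))
```

A higher threshold trades memory for accuracy, so it is worth re-running the same filter with a few values and checking whether the count converges on the number of username buckets.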
