POC elastic search - correctness & exactitude of stats

Mark_Harwood · November 8, 2018, 2:01pm

It depends.
If you are doing this analysis on low-to-middle cardinality fields (those with relatively few unique values e.g. "suppliers") then numbers will be accurate - and we will tell you that they are accurate.

If you are doing this analysis on high-cardinality fields with millions of unique values (e.g. IP addresses) then there is some potential for inaccuracy - which we measure and report.
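As a rough illustration of where that reporting shows up, here is a sketch of a terms aggregation request body built in Python. The field name `client_ip` and the aggregation name `top_ips` are made up for the example; `size`, `shard_size` and `show_term_doc_count_error` are options of the terms aggregation, and the response carries `doc_count_error_upper_bound` and `sum_other_doc_count` values. The post does not say which setting it has in mind, so treat this as one plausible reading rather than the recommended configuration.

```python
import json

# Hypothetical field and aggregation names; substitute your own.
request_body = {
    "size": 0,  # return only aggregation results, no search hits
    "aggs": {
        "top_ips": {
            "terms": {
                "field": "client_ip",
                "size": 10,          # buckets wanted in the final result
                "shard_size": 100,   # buckets each shard returns (the "N" in this post)
                "show_term_doc_count_error": True,  # per-bucket error bound
            }
        }
    },
}

# Print the JSON that would be sent as the body of a _search request.
print(json.dumps(request_body, indent=2))
```

Raising `shard_size` trades speed and memory for accuracy, which is the trade-off this post describes.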

An example - finding the top 10 IP addresses with the highest SUM of bytes transferred can be accurate. Each data server returns its top N high-activity IP addresses (where N is greater than 10 but far less than millions, for efficiency's sake). The per-server results are then summed, so we may end up with stats for 100 IP addresses, from which we take the final top 10. We can tell you if this figure is guaranteed to be accurate.
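The merge described above can be sketched as a small simulation. This is a toy model, not Elasticsearch's actual code: the shard contents are invented, and the error bound is a simple one that assumes byte totals are never negative (a key a server left out cannot have more bytes there than the smallest value that server did return).

```python
from collections import Counter

def shard_top_n(shard, n):
    """Return the n keys with the largest byte totals on one data server."""
    return dict(Counter(shard).most_common(n))

def merged_top_k(shards, n, k):
    """Sum the per-server top-n lists and keep the overall top k.

    Also returns an upper bound on how many bytes could be missing for
    any single key, assuming non-negative values.
    """
    merged = Counter()
    max_missing = 0
    for shard in shards:
        partial = shard_top_n(shard, n)
        merged.update(partial)  # adds the partial sums key by key
        if len(shard) > n:      # this server truncated its list
            max_missing += min(partial.values())
    return merged.most_common(k), max_missing

# Invented data: bytes transferred per IP address on two data servers.
shards = [
    {"10.0.0.1": 900, "10.0.0.2": 500, "10.0.0.3": 40, "10.0.0.4": 10},
    {"10.0.0.1": 800, "10.0.0.2": 30, "10.0.0.3": 600, "10.0.0.4": 20},
]

top, max_missing = merged_top_k(shards, n=2, k=1)
print(top, max_missing)
```

Here `10.0.0.1` is returned by both servers, so its merged sum of 1700 is exact, and no other address can overtake it even after adding the worst-case missing bytes.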

However - the reverse of this scenario (the 10 lowest-activity IP addresses) is likely to be inaccurate. Each data server would return the N IP addresses with the least activity, and the final result might be wildly inaccurate - an IP address may have recorded a lot of activity on one data server, so that server did not include it in its lowest-N choices. That missing data would have a big impact on the final results (and again, we tell you that).
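A toy simulation makes the failure concrete. As before, this is only a sketch with invented data, not how Elasticsearch is implemented: each server reports its n least active addresses, and the partial sums are merged.

```python
def merged_bottom_k(shards, n, k):
    """Merge each server's n lowest-activity keys and keep the lowest k."""
    merged = {}
    for shard in shards:
        lowest = sorted(shard.items(), key=lambda kv: kv[1])[:n]
        for ip, total in lowest:
            merged[ip] = merged.get(ip, 0) + total
    return sorted(merged.items(), key=lambda kv: kv[1])[:k]

def exact_bottom_k(shards, k):
    """Ground truth: sum every key across all servers, then keep the lowest k."""
    totals = {}
    for shard in shards:
        for ip, total in shard.items():
            totals[ip] = totals.get(ip, 0) + total
    return sorted(totals.items(), key=lambda kv: kv[1])[:k]

# Invented data: bytes transferred per IP address on two data servers.
shards = [
    {"10.0.0.1": 5, "10.0.0.2": 9000, "10.0.0.3": 700},
    {"10.0.0.1": 800, "10.0.0.2": 3, "10.0.0.3": 600},
]

approx = merged_bottom_k(shards, n=1, k=1)
exact = exact_bottom_k(shards, k=1)
print(approx, exact)
```

The merged answer names `10.0.0.2` with 3 bytes, yet that address is the busiest overall (9003 bytes); the true least active address is `10.0.0.1` with 805 bytes. The 9000 bytes on the first server were never reported because that server did not consider the address low-activity.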

Usually people are looking for "the biggest N" of something so the results are more trustworthy.

Speed, accuracy and size are a "pick 2 of 3" trade-off that people have to make, and that applies to all distributed systems.
