Accuracy in ES search

JPelastic · September 4, 2018, 8:42am

We have a stock of about 2 million documents containing invoice information in an Elastic db.

We want to understand:

If I do a count of documents with certain fixed properties (like terms)
Or a sum over all those documents

Is the result we get back from Elastic exact? Or should I use the rule of thumb that anything I get out of ES has a statistical margin of error?

Another one I worry about is much more complex:

Looking for all companies who sent only one invoice in a specific month. The list of companies should be exact… From what I got from the course, there is always some margin of error…

I’m in doubt and I can’t afford to be

Yashasvi_Raj_Pant · September 4, 2018, 9:49am

It actually depends on which type of aggregation you are using and how you use it. For instance, with the terms aggregation you will get the exact result if you apply "size":0 (which means all keys are included). You will also get the exact value from the sum aggregation. Certain aggregations, like cardinality, are based on approximation though.

For the second question, you can get the exact list of companies by listing them with a terms aggregation and setting "size":0.
For the monthly breakdown, you can use a date histogram aggregation on the date field, with a terms aggregation on the company as the sub-aggregation.
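
A minimal sketch of the request body this describes, written as a plain Python dict builder. The field names and the bucket limit are hypothetical placeholders, not taken from the original post. It assumes a 6.x-era cluster, where the date histogram takes "interval" (from 7.2 onward this became "calendar_interval"). The top-level "size": 0 only suppresses search hits and is unrelated to the "size":0 setting on the terms aggregation discussed here.

```python
def monthly_company_breakdown(date_field, company_field, max_companies):
    """Build a search body with one bucket per month and one sub-bucket per company."""
    return {
        "size": 0,  # top-level size: return no hits, only aggregation results
        "aggs": {
            "per_month": {
                # Assumption: 6.x syntax; newer versions use "calendar_interval".
                "date_histogram": {"field": date_field, "interval": "month"},
                "aggs": {
                    "per_company": {
                        # Explicit upper bound on the number of company buckets.
                        "terms": {"field": company_field, "size": max_companies}
                    }
                },
            }
        },
    }


# Hypothetical field names, for illustration only.
body = monthly_company_breakdown("invoice_date", "company_id", 10000)
```

The terms buckets are only complete if max_companies is at least the number of distinct companies in any single month.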

Mark_Harwood · September 4, 2018, 10:07pm

Size = 0 won’t work for very high cardinality terms. You may need to look at the ‘composite’ agg or partitioning with the terms agg.
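
As a rough illustration of the partitioning option: the terms aggregation accepts an "include" object with "partition" and "num_partitions", so the term space can be split across several requests. The sketch below only builds the request bodies. The field name, partition count and per-partition size are hypothetical and would need to be chosen from the real cardinality of the field.

```python
def partitioned_terms_bodies(field, num_partitions, size_per_partition):
    """Yield one search body per partition of the field's terms."""
    for partition in range(num_partitions):
        yield {
            "size": 0,  # no hits, aggregation results only
            "aggs": {
                "companies": {
                    "terms": {
                        "field": field,
                        # Each term falls into exactly one partition.
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        # Must be at least the number of terms in a partition.
                        "size": size_per_partition,
                    }
                }
            },
        }


# Hypothetical values, for illustration only.
bodies = list(partitioned_terms_bodies("company_id", 4, 5000))
```

Each body is sent as its own search request and the buckets from all partitions are merged on the client side.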

Yashasvi_Raj_Pant · September 5, 2018, 4:04am

Also, I found that "size":0 on terms is no longer supported from ES 5.0 onward. You have to specify the maximum size explicitly, if you have an idea of what it is.

JPelastic · September 5, 2018, 5:55am

Would using the other techniques give me 100% certainty that I got all the documents? Or would there still be a margin of error?

Thank you both for your answers!

Yashasvi_Raj_Pant · September 5, 2018, 3:27pm

The composite aggregation, or partitioning with the terms aggregation, can give you 100% accurate values if you use them correctly. Refer to their documentation for implementation details. Also note that the composite aggregation is a new feature in Elasticsearch 6 and is still in beta.
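
A hedged sketch of how paging through a composite aggregation could answer the original question (companies with exactly one invoice in a given month). The function name, field names and dates are hypothetical. `search` is assumed to be any callable that takes a request body and returns the parsed response, so the paging logic can be checked without a cluster. Each page resumes from the key of the last bucket of the previous page via "after".

```python
def companies_with_single_invoice(search, company_field, date_field,
                                  start, end, page_size=1000):
    """Page through every company bucket and keep those with doc_count == 1."""
    after = None
    result = []
    while True:
        composite = {
            "size": page_size,
            "sources": [{"company": {"terms": {"field": company_field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last key already seen
        body = {
            "size": 0,  # no hits, aggregation results only
            "query": {"range": {date_field: {"gte": start, "lt": end}}},
            "aggs": {"by_company": {"composite": composite}},
        }
        buckets = search(body)["aggregations"]["by_company"]["buckets"]
        if not buckets:
            return result  # an empty page means every bucket has been visited
        result.extend(b["key"]["company"] for b in buckets if b["doc_count"] == 1)
        after = buckets[-1]["key"]
```

With the official Python client, `search` could be something like `lambda body: es.search(index="invoices", body=body)`, where the index name is again an assumption. Because the loop only stops on an empty page, no company bucket is skipped, whatever the page size.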

If you are using an older version of Elasticsearch (earlier than 5.0), you can use "size":0, provided your term count is less than Integer.MAX_VALUE.

