Sum aggregation is incorrect compare to scroll all docs to sum

hackerwin7 · December 25, 2019, 9:24am

Hi,
I discovered that a simple sum aggregation is different to that I write a code to scroll all documents and sum the field value. the value formula is always

${sum-aggregation-result} + 1 == ${scroll-sum}

Elasticsearch Version: 6.3.1
index shard num: 1
index replica : 1
index Mapping is like:

"filed": {
 "type": "long"
}

use sum aggregation

GET .../index/_search?size=0&pretty
{
    "aggs" : {
        "sum_field_price" : { 
            "sum" : { "field" : "field_price" } 
        }
    }
}
'

returns the result is 1.7458313843517748E16
that is

17458313843517748

use scroll

I use script to scroll all documents and sum the field value in script and returns

17458313843517749

and we also have this index data in hive, we sum the data in hive the result is also 17458313843517749

so why the result of sum aggregation on elasticsearch is incorrect?

hackerwin7 · December 26, 2019, 10:16am

cc @colings86

hackerwin7 · December 26, 2019, 10:29am

Maybe it is related to kahan sum algorithm which elasticsearch use in Sum Aggregation

hackerwin7 · December 26, 2019, 11:52am

After viewing the source, the SumAggregator use (double) sum += (double) value, is it the reason cause the long value field precision problem?
the compensation value in SumAggregator.getLeafCollector().collect() maybe lead to the precision diff 1.0 between 1.7458313843517748E16 and 17458313843517749

hackerwin7 · December 26, 2019, 12:06pm

So, make kahan summation in sum aggregation in elasticsearch is not 100% accurate itself, even if the field type is long ?

hackerwin7 · December 26, 2019, 12:40pm

1.7458313836007748E16 + 1.0 = 1.7458313836007748E16 in double

hackerwin7 · December 26, 2019, 12:46pm

So, in kahan summation, if the corrected is very small compare to huge double sum
then the newSum = sum + corrected will lose the precision

hackerwin7 · December 26, 2019, 1:08pm

Another question is why sum aggregation use double for all field mapping numeric type? if field type is long, we can use long type in sum aggregation, and we will not lose the precision.

system · January 23, 2020, 1:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Incorrect sum while aggregating in elasticsearch for one particular index Elasticsearch	6	3855	July 5, 2017
Fixing wrong mapping of float as long Elasticsearch	2	870	May 15, 2018
Confusion on Aggregation Sum Datatype Elasticsearch	2	417	October 7, 2019
Wrong sum aggregation result elastic 5 Elasticsearch	1	526	March 28, 2017
I am getting incorrect sum using aggregation from kibana visualization Elasticsearch elastic-stack-monitoring	2	730	May 27, 2020

Sum aggregation is incorrect compare to scroll all docs to sum

use sum aggregation

use scroll

Related topics