# Sum aggregation is incorrect compare to scroll all docs to sum

Hi,
I discovered that a simple sum aggregation is different to that I write a code to scroll all documents and sum the field value. the value formula is always

``````\${sum-aggregation-result} + 1 == \${scroll-sum}
``````

Elasticsearch Version: 6.3.1
index shard num: 1
index replica : 1
index Mapping is like:

``````"filed": {
"type": "long"
}
``````

## use sum aggregation

``````GET .../index/_search?size=0&pretty
{
"aggs" : {
"sum_field_price" : {
"sum" : { "field" : "field_price" }
}
}
}
'
``````

returns the result is `1.7458313843517748E16`
that is

``````17458313843517748
``````

## use scroll

I use script to scroll all documents and sum the field value in script and returns

``````17458313843517749
``````

and we also have this index data in hive, we sum the data in hive the result is also `17458313843517749`

so why the result of sum aggregation on elasticsearch is incorrect?

Maybe it is related to kahan sum algorithm which elasticsearch use in Sum Aggregation

After viewing the source, the SumAggregator use (double) sum += (double) value, is it the reason cause the long value field precision problem?
the `compensation` value in SumAggregator.getLeafCollector().collect() maybe lead to the precision diff 1.0 between `1.7458313843517748E16` and `17458313843517749 `

So, make kahan summation in sum aggregation in elasticsearch is not 100% accurate itself, even if the field type is long ?

1.7458313836007748E16 + 1.0 = 1.7458313836007748E16 in double

So, in kahan summation, if the `corrected` is very small compare to huge double `sum`
then the `newSum = sum + corrected` will lose the precision

Another question is why sum aggregation use double for all field mapping numeric type? if field type is long, we can use long type in sum aggregation, and we will not lose the precision.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.