Elasticsearch stats aggregation corrupt float number precision

JavaDeveloper · May 16, 2020, 12:02pm

Hi everyone.
I'm using the stats aggregation (max, min, avg) over some documents with float fields.
These float values are saved with two decimal places in the documents (for example 69.43). But the stats aggregation output changes the values so that they show many more decimal places.

I'm using the Java API, where the org.elasticsearch.search.aggregations.support.ValueType class only offers a double type and has no option to specify precision.

How can I control this within Elasticsearch? I don't want to round the values manually afterwards.

polyfractal · May 18, 2020, 5:11pm

May I ask what you mean by "corrupt"?

When you index a document with a float, even if you only provide two decimal places in the JSON, the internal float may or may not have the same precision. Floats are stored using the standard IEEE-754 floating point representation, and not all decimal values can be stored exactly. For example, if you try to store the number 12.12 in a 32-bit IEEE-754 float, internally it is stored as 12.11999988555908203125 because that is the closest representable combination of mantissa and exponent.

This calculator is a convenient way to see how floats are stored: https://www.h-schmidt.net/FloatConverter/IEEE754.html

So that's part of the issue. The other side of the coin is that Elasticsearch aggregations always operate with Doubles, so all values (longs, bytes, shorts, floats, etc) are upconverted to a double. There are technical reasons and historical reasons for this decision, but that's how it is today.
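A minimal Java sketch of the two effects described above, using only the JDK (the class and method names are made up for illustration). It shows that the float literal 12.12f already holds the nearby binary value, and that widening it to a double keeps that value exactly, which is why the extra digits become visible once everything is treated as a double.

```java
import java.math.BigDecimal;

public class FloatWidening {

    // Returns the exact decimal expansion of the value a float really holds.
    // Widening float -> double is lossless, and new BigDecimal(double)
    // converts the binary value exactly, without rounding.
    static String exactDecimal(float f) {
        double widened = f;
        return new BigDecimal(widened).toPlainString();
    }

    public static void main(String[] args) {
        float stored = 12.12f;

        // Float.toString prints the shortest text that round-trips as a float,
        // so the value still looks like two decimal places here.
        System.out.println(stored);

        // Printed as a double, many more digits are needed to identify the
        // same value, so the "extra" decimals show up.
        System.out.println((double) stored);

        // The exact stored value: 12.11999988555908203125
        System.out.println(exactDecimal(stored));
    }
}
```

Nothing is being corrupted in this sketch: the value is identical at every step, only the number of digits needed to print it changes.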

If you only want to see two decimal places of precision, you can use the format parameter on the aggregation:

"format": "00.00"

Which should give you only two digits after the decimal point (I believe, typed that from memory).
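To see what such a pattern does, here is a small sketch that assumes the format string follows java.text.DecimalFormat pattern syntax. It runs the pattern locally with the JDK rather than through Elasticsearch, and the class and helper names are hypothetical. Note that each 0 before the decimal point is a required digit, so "00.00" pads small values with a leading zero, while "0.00" does not.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class FormatPatternDemo {

    // Applies a DecimalFormat pattern with locale-independent symbols,
    // so the decimal separator is always '.'.
    static String apply(String pattern, double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.ROOT);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    public static void main(String[] args) {
        double widened = 12.12f; // the float value from the example, as a double

        System.out.println(apply("00.00", widened)); // 12.12
        System.out.println(apply("00.00", 5.5));     // 05.50 (two integer digits required)
        System.out.println(apply("0.00", 5.5));      // 5.50
    }
}
```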

JavaDeveloper · May 19, 2020, 8:17am

Thank you for your response.
But neither of these two approaches, format("00.00") and format("%.2f"), had any effect on the result:

.aggregation(AggregationBuilders.stats("myField").field("my_fileld").format("00.00"))
.aggregation(AggregationBuilders.stats("myField2").field("my_fileld2").format("%.2f"))

Therefore, I wrote my own code to post-process the results and used the BigDecimal class to round them to the precision I need.

polyfractal · May 21, 2020, 3:18pm

Ah, right. Sorry about that, format is only applied to the JSON output. When working in Java, we always provide the actual raw value.

Internally format just uses Java's DecimalFormat so you could do the same, or use something like BigDecimal.
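For the BigDecimal route, a client-side helper might look like the sketch below. This is not the original poster's code: the class name, the method name, and the choice of HALF_UP rounding are assumptions made for illustration, and the raw values are simulated with widened floats instead of being read from a real aggregation response.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class StatsRounding {

    // Rounds a raw double (for example a min, max or avg value) to two
    // decimal places. BigDecimal.valueOf goes through Double.toString, so it
    // starts from the short decimal text of the double, not its full binary
    // expansion.
    static BigDecimal roundTo2(double raw) {
        if (!Double.isFinite(raw)) {
            // BigDecimal cannot represent NaN or infinities.
            throw new IllegalArgumentException("non-finite value: " + raw);
        }
        return BigDecimal.valueOf(raw).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Simulated raw values: floats widened to double, as described in the thread.
        double rawMax = 69.43f;
        double rawMin = 12.12f;

        System.out.println(roundTo2(rawMax)); // 69.43
        System.out.println(roundTo2(rawMin)); // 12.12
    }
}
```

Keeping the result as a BigDecimal rather than a String leaves it usable for further arithmetic; DecimalFormat is the simpler choice when only display text is needed.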

system · June 18, 2020, 3:18pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
