Rounding error in Lens sum aggregation (8.12.1)

I have the same three values recurring every 10 seconds, as shown in Discover:

(Here was a picture, I couldn't post, because I'm a new user.)

The sum should add up to 1.642. This also shows in Lens in an area visualization:

(Here was a picture, I couldn't post, because I'm a new user.)

You see the simple sum aggregation over one (redacted) field. Inspecting the data also shows the right value of 1.642 (no rounding or formatting is defined).

(Here was a picture, I couldn't post, because I'm a new user.)

But looking at the response to the request, I see a rounding error:


The request that is sent is:

POST /_redacted_*/_async_search?batched_reduce_size=64&ccs_minimize_roundtrips=true&wait_for_completion_timeout=200ms&keep_on_completion=true&keep_alive=60000ms&ignore_unavailable=true&preference=1710140294656
{
  "aggs": {
    "0": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "10s",
        "time_zone": "Europe/Berlin",
        "min_doc_count": 1
      },
      "aggs": {
        "1": {
          "sum": {
            "field": "prometheus.metrics._redacted_"
          }
        }
      }
    }
  },
  "size": 0,

This error in the data becomes obvious when trying to get the derivative:

(Here was a picture, I couldn't post, because I'm a new user.)

Here is the screenshot from Discover of the three values.

I'm unsure if I should post the other screenshots in replies one at a time, because I don't want to break any forum rules.
If someone can give me the appropriate rights, I can edit my initial post.

Hi @Jim_Panzee — welcome to the community.

I'm sorry you had trouble uploading screenshots; I'm not familiar with the forum rules for new users.

Let me make sure I understand your question, though.

It sounds like the correct value of 1.642 is showing in Lens everywhere, but then you see the trailing 000...001 in the response in the inspector. Is that right?

Hi Andrew. Thanks for your fast response.

This is correct. But if I change the formula by adding a difference aggregation at the front, the wrong values also show up in Lens:


As you can see, since the wrong value alternates with the correct value, this tiny deviation shows up in the diagram at a magnitude of e-16.

@Jim_Panzee

In Discover, open up the document and look at the actual JSON and the fields to see if they all show the exact values...

Discover sometimes applies some niceties to the values in the table...

I ran into something like this a while back; let me see if I can find it... that turned out to actually be a JSON issue... floating-point representation in JSON...

I will see if I can find the issue I saw before... it may not be the same, but it was similar.

Thank you for your help, Stephen.

I checked both. The JSON in Discover shows the correct values, e.g.:

"prometheus.metrics.redacted": [
0.084
],

I also looked at our mapping for this field and it is double.

"prometheus.metrics.redacted": {
"type": "double"
},

Mind that a double type still has rounding error, which can show up as that 00...0001 when performing math operations on fractional values.
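
As a minimal illustration of that kind of error, here is a standalone Java sketch (nothing Elasticsearch-specific, just plain double arithmetic):

public class DoubleRoundingDemo {
    public static void main(String[] args) {
        // Most decimal fractions have no exact binary representation as a double,
        // so even a single addition can leave a tiny tail in the last digits.
        System.out.println(0.1 + 0.2);        // prints 0.30000000000000004
        System.out.println(0.1 + 0.2 == 0.3); // prints false
    }
}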

If you click Open in Console within the Inspector request panel and execute the query, does it return this rounding issue? If so, then the source of the problem is in the mappings and Elasticsearch.

One possible workaround I can suggest, if you already know that your values always have at most 3 digits of precision, is to configure a 3-digit number format within the Lens visualization.

Hi Marco, and thank you for your response. When I execute the aggregation in the console, I get the same occasional error. In my case the number only has four digits and I do a simple sum aggregation. I think this should not run into any double rounding errors, since I don't calculate fractions.

I already thought of the workaround with a round aggregation, but wanted to report the bug anyway. When you say it is a problem on the Elasticsearch side, do I have to change the tags on this post?

Result of the aggregation execution in the console:

"aggregations": {
      "0": {
        "buckets": [
          {
            "1": {
              "value": 1.642
            },
            "key_as_string": "2024-03-08T11:30:50.000+01:00",
            "key": 1709893850000,
            "doc_count": 1150
          },
          {
            "1": {
              "value": 1.6420000000000001
            },
            "key_as_string": "2024-03-08T11:31:00.000+01:00",
            "key": 1709893860000,
            "doc_count": 1146
          },
          {
            "1": {
              "value": 1.6420000000000001
            },
            "key_as_string": "2024-03-08T11:31:10.000+01:00",
            "key": 1709893870000,
            "doc_count": 1146
          },
          {
            "1": {
              "value": 1.6420000000000001
            },
            "key_as_string": "2024-03-08T11:31:20.000+01:00",
            "key": 1709893880000,
            "doc_count": 1146
          },
          {
            "1": {
              "value": 1.642
            },
            "key_as_string": "2024-03-08T11:31:30.000+01:00",
            "key": 1709893890000,
            "doc_count": 1147
          },

It is not a bug; it's just how a double number is rounded in Java when the value is fractional.

OK, then I don't understand some basics.
If I run a sum aggregation like:

sum(1.01, 0.084, 0.548)

How can the result sometimes become 1.6420000000000001 and sometimes 1.642?

Btw., this did not happen before we upgraded to 8.12.

This depends on many things (e.g. the Java version used by ES, which got upgraded; changes to other internals; etc.), but in general, summing fractional numbers is always subject to this kind of rounding issue.

A Java double is similar to how numbers are handled in JS. If you try 1.01 + 0.084 + 0.548 in the console, it returns 1.6420000000000001. The link above explains a bit how double-precision numbers work in JS, but the representation is pretty similar in Java too.
As said, Java uses a similar but not identical representation, so there are still subtle differences between the two.
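
To make this concrete, here is a minimal standalone Java sketch. The 0.1 / 0.2 / 0.3 case is the classic textbook example; the last two lines simply use the values from this thread, where the presence of the ...0001 tail depends on the order in which the sum happens to be accumulated:

public class SumOrderDemo {
    public static void main(String[] args) {
        // Floating-point addition is not associative: summing the same values
        // in a different order can round differently in the last bit.
        System.out.println((0.1 + 0.2) + 0.3); // prints 0.6000000000000001
        System.out.println(0.1 + (0.2 + 0.3)); // prints 0.6

        // The three values from this thread, accumulated in two different orders.
        // Depending on the order, the ...0001 tail may or may not appear.
        System.out.println(1.01 + 0.084 + 0.548);
        System.out.println(0.548 + 0.084 + 1.01);
    }
}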

Using a float in the mapping will lead to less precision (about 7 significant digits vs 15 for double), but it does not guarantee that the problem won't appear at the 7th digit.
Using an integer in the mapping is the approach often used in scenarios where precision is required. In this case values are scaled when stored (i.e. * 1000 if 3 decimal digits are required) and scaled back when displayed. An example use case is money (amounts stored as cents, * 100, then divided by 100 to visualize).
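
For illustration, here is a minimal standalone Java sketch of that scaling idea (hypothetical helper and example values, not an Elasticsearch feature): multiply by 1000 before storing, sum the integers exactly, and divide only for display.

public class ScaledIntegerDemo {
    // Hypothetical helper: scale a value with up to 3 decimal digits to an
    // exact integer count of thousandths, so sums stay exact.
    static long toThousandths(double value) {
        return Math.round(value * 1000.0);
    }

    public static void main(String[] args) {
        double[] values = {1.01, 0.084, 0.548};

        long scaledSum = 0;
        for (double v : values) {
            scaledSum += toThousandths(v); // exact integer addition, no rounding error
        }

        // Scale back only when presenting the result.
        System.out.println(scaledSum / 1000.0); // prints 1.642
    }
}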


Thank you for your explanation. As you can imagine, this is not very satisfying, as we now have to implement the "round aggregation" workaround in the failing diagrams. I had hoped to get the old behavior back.

You can try asking in the Elasticsearch section; maybe they can provide an alternative approach, as there are some special handles there that I may not know about.
