Getting max value of a long field without losing precision

I want to get the maximum used ID from a field that stores incremental long IDs, but the max aggregation returns its result as a double.

This has been brought up before...

The suggestion to cast back to long won't be reliable because I can't easily tell if I have lost precision.

If the API would instead return the max as a long in the first place, that would be ideal.

If you absolutely have to support floating point values as well as longs... maybe BigDecimal is a better option?

Yeah, this is unfortunately a hard limitation on the agg framework right now. All aggs convert the values to a double before operating on them. I'm not sure if that'll change, at least easily... it's a pretty fundamental part of how the agg framework operates.

Before anything else, I think it should be noted that losing precision usually isn't a problem for IDs and the like. 64-bit doubles have 52-bit mantissas, so the largest integer you can store exactly is 2^53. See this SO answer for a breakdown of why. That gives you values from 0 up to roughly 9 quadrillion that can be stored without loss of precision. For most folks that's fine, but if you expect to have a max > 2^53 then it's a concern.
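
If you want to see that boundary concretely, here's a quick standalone Java check (nothing Elasticsearch-specific, just plain doubles): every integer up to 2^53 survives a round-trip through double, and 2^53 + 1 is the first one that doesn't.

    public class DoublePrecisionBoundary {
        public static void main(String[] args) {
            long limit = 1L << 53; // 9007199254740992

            // Every integer with magnitude <= 2^53 survives a double round-trip.
            System.out.println((long) (double) limit == limit);          // true

            // 2^53 + 1 is the first integer that doesn't: it rounds back down to 2^53.
            System.out.println((long) (double) (limit + 1) == limit + 1); // false
            System.out.println((long) (double) (limit + 1));              // 9007199254740992
        }
    }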

As a workaround, you could do a search with size: 1 sorted on the field descending, which will give you the document with the highest value, and you can extract it from the _source. Not a great solution, since it only really applies to min/max and not the other aggs, but it may help.
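
Roughly, that request would look something like the sketch below, here using the plain Java 11 HttpClient; the index name (my-index) and field name (my_id) are placeholders, so adjust them to your own mapping.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class MaxIdBySort {
        public static void main(String[] args) throws Exception {
            // Hypothetical index/field names -- substitute your own.
            String body = "{ \"size\": 1, \"sort\": [ { \"my_id\": \"desc\" } ], \"_source\": [\"my_id\"] }";

            HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:9200/my-index/_search"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The top hit's _source.my_id is the value exactly as it was indexed,
            // since it never goes through the double-based agg code path.
            // Parse it out of the JSON response with whatever JSON library you use.
            System.out.println(response.body());
        }
    }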

We've stayed away from BigDecimal so far because it's just really, really slow. The overhead of using it is just a no-go for performant aggs. If/when we want support for Longs (or other non-floats) I think we'd probably just integrate it as a framework feature... defaulting to non-float when applicable (min/max/sum is fine, avg never, etc)


2^52 does give you something like 4*10^15 IDs.

So you'd expect 15 digits of precision, right?

But unfortunately, we see in one of those posts I linked that Elasticsearch is formatting its JSON like this:

      "value" : 5.61380157E8

So you're losing a fair chunk of the potential precision because of a poor choice of transport data format (JSON).

Scientific notation doesn't (necessarily) mean that you are losing precision.

5.61380157E8 == 5.61380157 * 10⁸ == 561380157.0

561380157.0 is less than 2⁵³ (9007199254740992), so assuming you believe the SO answer that all integers < 2⁵³ can be represented exactly, the number provided is 100% precise without any floating point errors.
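
You can verify that yourself in a couple of lines of Java; the string literal is just the value from the response above.

    public class ParseCheck {
        public static void main(String[] args) {
            double v = Double.parseDouble("5.61380157E8");
            System.out.println(v == 561380157.0); // true -- the value round-trips exactly
            System.out.println((long) v);         // prints 561380157
        }
    }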

I do realise that particular value doesn't lose precision, but I worry that if it's already breaking out scientific notation for a mere 9 digits, what will happen with 10, or 15... :confused:

You'd think that simply not formatting it like that would raise fewer questions. "Principle of least surprise", and all that.

No arguments there. The blame falls to Oracle though, not JSON per se. When Java prints a double as a string, it switches to scientific notation above a certain threshold; from the Double.toString javadoc:

If m is less than 10^-3 or greater than or equal to 10^7, then it is represented in so-called "computerized scientific notation."
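
You can see that threshold in action with a couple of println calls (this is plain Double.toString behaviour, nothing Elasticsearch-specific):

    public class ToStringThreshold {
        public static void main(String[] args) {
            System.out.println(Double.toString(9_999_999.0));   // "9999999.0"    -- below 10^7: plain notation
            System.out.println(Double.toString(10_000_000.0));  // "1.0E7"        -- at 10^7: scientific notation
            System.out.println(Double.toString(561_380_157.0)); // "5.61380157E8" -- the value from the thread
        }
    }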

You should be safe with a simple check: any value < 2⁵³ is the actual, non-rounded value, and anything greater is potentially problematic. Not ideal, I agree, but it is what it is.
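
Something like this sketch (not an official API, just a helper you'd write yourself) is probably enough to catch the problematic cases:

    public final class AggValues {
        private static final double MAX_EXACT = 0x1p53; // 2^53 = 9007199254740992

        /** Converts an agg result back to a long, refusing values that may have been rounded. */
        public static long toExactLong(double aggValue) {
            if (Math.abs(aggValue) >= MAX_EXACT) {
                throw new ArithmeticException("value " + aggValue + " may have lost precision");
            }
            return (long) aggValue;
        }
    }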

FWIW, there is an older ticket about this exact issue: Aggregations for long and date values · Issue #9545 · elastic/elasticsearch · GitHub

The resolution was that it wasn't possible to fix cleanly without breaking compatibility for the java client. But now that the Java REST client is nearly done, and the transport client will be going away, we may be able to revisit this.

Hi all, I am actually trying to search for a keyword with a range query between particular dates. Can anyone help me?

    curl -XGET 'http://localhost:9200/201706/_search' -H 'Content-Type: application/json' -d '{ "query": { "bool": { "must": [ { "match": { "message": "isis" } }, { "range": { "created_at": { "gte": "2017-06-01", "lte": "2017-06-30", "format": "yyyy-MM-dd" } } } ] } } }'
