Wrong returned Values on simple SUM-Aggregation and scale up/down

Trying to sum/stats of transaction amounts for given search criteria but sum/stats amount is getting rounded up/down in place of returning decimal 2 positions. Below is one transaction data. The transactionAmount field is float type. This mostly happens when bucket size is only 1.

{
        "_index": "trn-2024",
        "_id": "413000787792023-11-171",
        "_score": 3,
        "_source": {
          "transStatusDescription": "POSTED",
          "baiCodeDescriptionCustom": "Individual International Money Transfer Credit",
          "transactionDate_TS": 1700179200000
         "transactionRemarks_UI": "sdfs fss sdds",
          "amtCodeDescription": "Credit",
                 "transactionAmount": 9999994.74,
          "currency": "USD",
          "id": "413000787792023-11-171",
   
          "transactionAmountStr": "9999994.74",
          "transactionTypeDesc": "Transfer Credit",
          "transactionDate": "2023-11-17 00:00:00",

          "amountCode": "04",
          "accountId": "12200074779",
         
        }
      }
    ]
  }

Suppose there is only one above document in alias but sum/ stats API is returning as "9999995" in place of "transactionAmount": 9999994.74,

...... search criteria .....
..

"aggs": {
    "amtCode.keyword": {
      "aggregations": {
        "transactionAmountStats": {
          "stats": {
            "field": "transactionAmount"
          }
        },
        "transactionAmountSum": {
          "sum": {
            "field": "transactionAmount"
          }
        }
      },
      "terms": {
        "field": "amtCode.keyword",
        "show_term_doc_count_error": true
      }
    }
  }



Hi @Manoj_Upadhyay

Try this

DELETE my-index

PUT my-index
{
  "mappings": {
    "properties": {
      "transactionAmount" : { "type" : "double"} 
    }
  }
}

GET my-index
POST my-index/_doc
{
  "transStatusDescription": "POSTED",
  "baiCodeDescriptionCustom": "Individual International Money Transfer Credit",
  "transactionDate_TS": 1700179200000,
  "transactionRemarks_UI": "sdfs fss sdds",
  "amtCodeDescription": "Credit",
  "transactionAmount": 9999994.74,
  "currency": "USD",
  "id": "413000787792023-11-171",
  "transactionAmountStr": "9999994.74",
  "transactionTypeDesc": "Transfer Credit",
  "transactionDate": "2023-11-17 00:00:00",
  "amountCode": "04",
  "accountId": "12200074779"
}

GET my-index/_search
{
  "size": 0,
  "aggs": {
    "sum_txn": {
      "sum": {
        "field": "transactionAmount"
      }
    }
  }
}

# Results 
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "sum_txn": {
      "value": 9999994.74
    }
  }
}

Keep in mind significant digits .....

Obviously using double is a solution. But @Manoj_Upadhyay has maybe not understood the problem?

Manoj: A float (in elasticsearch) is represented by 32 bits. This actually means the same 32 bit representation can represent a whole range of (real) numbers. In your case the bit representation of the exact number 9999994.74 is:

0 10010110 00110001001011001111011

The bit representation of exactly 9999995 is

0 10010110 00110001001011001111011

You will notice those bit representations are identical.

In fact all numbers from 9999994.51 to 9999995.49 will be stored as exactly that same bit pattern. And 9999995 is the number right in the middle of that "range".

If you need do something with the actual numbers (in many cases) this will be done with the bit representations, not the strings that you see in _source.

A double means the numbers are stored much more precisely, actually using 64 bits rather than just 32. So these are much more precise.

So any time you have numbers with more than around 6/7 digits, you should make sure to use the double data type to avoid issues like this.

2 Likes

Thanks for details. I am surprised why it happens mostly when bucket has "doc_count": 1, only one element , it works fine when bucket size is more than 1 .

Indices has too many documents ( around billion) then how I can easily re-index them.

reindex to convert float data type to double? I'd wonder if that is worth the effort if this issue is the only confusion it creates for you.