Sum Aggregation returning very small, unrelated values

Jaimin_Mehta · August 5, 2015, 12:52pm

Hey,

I have been trying out the sum aggregation for quite a while. I have a query that otherwise works perfectly well for me, but sometimes, for reasons unknown, it returns ridiculously small values.

I have used the following mapping to index my data:

mapping = {
    "marksheet": {
        "date_detection": False,
        "dynamic_templates": [{
            "string_fields": {
                "mapping": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "index": "not_analyzed",
                            "ignore_above": 256,
                            "type": "string"
                        }
                    }
                },
                "match_mapping_type": "string",
                "match": "*"
            }
        }]
    }
}

The following is the query that I have run for the aggregation:

{
  "aggs": {
    "student_name": {
      "terms": {
        "field": "name.raw",
        "order": {
          "total_score": "desc"
        }
      },
      "aggs": {
        "total_score": {
          "sum": {
            "field": "score"
          }
        }
      }
    }
  }
}

The "score" field only contains values between 1 and 10, but somehow my aggregation returns values such as 1.4e-322 and 1.3e-322.

I haven't been able to figure out why this is happening. I would be very grateful if someone could help me out with this issue.

Thanks.

colings86 · August 5, 2015, 1:18pm

Try running the following query and see what the output is for the buckets with the very small values:

{
  "aggs": {
    "student_name": {
      "terms": {
        "field": "name.raw",
        "order": {
          "total_score": "desc"
        }
      },
      "aggs": {
        "total_score": {
          "sum": {
            "field": "score"
          }
        },
        "score_ranges": {
          "range": {
            "field": "score",
            "ranges": [
              {
                "to": 1
              },
              {
                "from": 1
              }
            ]
          },
          "aggs": {
            "top_hits": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Basically this will add a range aggregation alongside your sum aggregation which will put documents with a score value <1 into one bucket and ones with score >=1 in another. Then for each of these buckets it outputs the top 10 documents. Hopefully this will show if any of the documents contain unexpected values (values outside of the 1-10 range).
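Once the response comes back, the suspicious buckets can also be picked out programmatically. The following is only a rough sketch: the function name `suspicious_buckets` and the sample data are hypothetical, and it assumes the response has the usual terms-bucket shape (`key`, `doc_count`, and the `total_score` sub-aggregation) and that every score is expected to be between 1 and 10.

```python
def suspicious_buckets(agg_result, low=1, high=10):
    """Return (key, sum) pairs whose sum cannot come from doc_count scores in [low, high]."""
    flagged = []
    for bucket in agg_result["student_name"]["buckets"]:
        total = bucket["total_score"]["value"]
        count = bucket["doc_count"]
        # With `count` documents scoring between low and high, the sum
        # must lie between low * count and high * count.
        if not (low * count <= total <= high * count):
            flagged.append((bucket["key"], total))
    return flagged


# Hypothetical sample data, shaped like the "aggregations" part of a response.
sample = {
    "student_name": {
        "buckets": [
            {"key": "alice", "doc_count": 4, "total_score": {"value": 1.4e-322}},
            {"key": "bob", "doc_count": 3, "total_score": {"value": 21.0}},
        ]
    }
}

print(suspicious_buckets(sample))
```

Only "alice" is flagged here, because a sum of 1.4e-322 is impossible for four scores that are each at least 1.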

Another thing to do would be to check the resulting mappings in your index by running:

curl -XGET "http://localhost:9200/my_index/_mapping"

If you can paste the result of that into a gist and link it here it will give us an idea of what the mapping currently is on that index for all relevant fields (please do not paste a large mapping directly in a reply as it makes the post very hard to read).

Jaimin_Mehta · August 6, 2015, 5:40am

Hi Colin,

I ran the above aggregation. I also ran the curl command, specifically for my doc_type, as follows:

curl -XGET "http://localhost:9200/my_index/_mapping/marksheet"

I have pasted the output of both into a gist. Here's the link:

https://gist.github.com/jaimin-innoplexus/846ba06eec32f7381d67#file-mapping-json

Aggregation Result.json

{
    "student_name": {
        "buckets": [
            {
                "total_score": {
                    "value": 1.4e-322
                },
                "score_ranges": {
                    "buckets": [
                        {

(Preview truncated; see the gist for the full file.)

Mapping.json

{
    "my_index": {
        "mappings": {
            "marksheet": {
                "dynamic_templates": [
                    {
                        "string_fields": {
                            "mapping": {
                                "fields": {
                                    "raw": {

(Preview truncated; see the gist for the full file.)

jpountz · August 6, 2015, 10:21am

Unfortunately this is a known issue with dynamic mappings on 1.x. If two of your shards receive a document at the same time, and one maps the field as a long and the other as a double, then the master node will reject whichever mapping is applied second, but the shard with the wrong mapping will continue to index documents using the wrong type. Usually, the problem becomes visible when nodes are moved around, because then Elasticsearch starts interpreting either some double bits as longs (and you would see huge numbers) or long bits as doubles (and you would typically see tiny numbers, like here). When hit by this bug, there is no other choice but to reindex. One way to prevent it from happening again in the future would be to configure mappings explicitly.
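To illustrate the bit-level effect described above, here is a minimal Python sketch. It is not Elasticsearch code; the helper names `long_bits_as_double` and `double_bits_as_long` are made up for this example. It simply reinterprets the same 64 bits as the other numeric type.

```python
import struct


def long_bits_as_double(n):
    """Reinterpret the 64 bits of a signed integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", n))[0]


def double_bits_as_long(x):
    """Reinterpret the 64 bits of an IEEE 754 double as a signed integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


# Small integers read as doubles land in the subnormal range (tiny values).
print(long_bits_as_double(28))   # 1.4e-322
print(long_bits_as_double(26))   # 1.3e-322

# An ordinary double read as a long becomes a huge integer (roughly 4.6e18).
print(double_bits_as_long(7.0))
```

The values 1.4e-322 and 1.3e-322 from the original question would be consistent with integer sums of 28 and 26 being read back as doubles.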

On 2.0, this issue will be fixed as dynamic mappings will have to be validated on the master node first before being applied. You can look at https://github.com/elastic/elasticsearch/pull/10634 for more information.
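As a sketch of what configuring the mapping explicitly could look like for the "marksheet" type from the question (written as a Python dict, like the original mapping): the choice of "double" for "score" is an assumption, and "long" would do equally well if scores are always integers. The second dynamic template, here given the made-up name "longs_as_doubles", is an untested idea for indexes with too many numeric keys to list by hand; it assumes that `match_mapping_type` accepts "long" on your version.

```python
import json

explicit_mapping = {
    "marksheet": {
        "date_detection": False,
        "properties": {
            # Assumption: store scores as doubles, whatever the first document contains.
            "score": {"type": "double"}
        },
        "dynamic_templates": [
            {
                # String template carried over unchanged from the original mapping.
                "string_fields": {
                    "mapping": {
                        "type": "string",
                        "fields": {
                            "raw": {
                                "index": "not_analyzed",
                                "ignore_above": 256,
                                "type": "string"
                            }
                        }
                    },
                    "match_mapping_type": "string",
                    "match": "*"
                }
            },
            {
                # Hypothetical template: map every detected integer field as a double,
                # so long and double can no longer disagree between shards.
                "longs_as_doubles": {
                    "match_mapping_type": "long",
                    "mapping": {"type": "double"}
                }
            }
        ]
    }
}

# JSON body to send when creating the index or putting the mapping.
print(json.dumps(explicit_mapping, indent=2))
```

Since existing shards keep their wrong type, a mapping like this would only help on a freshly created index that the data is then reindexed into.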

Jaimin_Mehta · August 6, 2015, 2:03pm

Thanks a lot Adrien! That was a really good description and very helpful. Yes, you're right: on other doc_types that I have, I have seen the really large numbers as well.
Well, I'd be happiest if this issue is fixed in 2.0, since in a few doc_types and other indexes I've got a really large number of keys in the documents, for which configuring mappings explicitly would be pretty difficult. I will still try to figure out some way of mapping them explicitly. Thanks a lot!
