Sum Aggregation returning very small, unrelated values


(Jaimin Mehta) #1

Hey,

I have been trying out the sum aggregation for quite a while. I have a query that usually works perfectly well for me, but sometimes, for reasons unknown, it returns ridiculously small values.

I have used the following mapping to index my data:

mapping = {
    "marksheet": {
        "date_detection": False,
        "dynamic_templates": [{
            "string_fields": {
                "mapping": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "index": "not_analyzed",
                            "ignore_above": 256,
                            "type": "string"
                        }
                    }
                },
                "match_mapping_type": "string",
                "match": "*"
            }
        }]
    }
}

The following is the query that I have run for the aggregation:

{
  "aggs": {
    "student_name": {
      "terms": {
        "field": "name.raw",
        "order": {
          "total_score": "desc"
        }
      },
      "aggs": {
        "total_score": {
          "sum": {
            "field": "score"
          }
        }
      }
    }
  }
}

The "score" field only contains values between 1 and 10, but somehow my aggregation returns values such as 1.4e-322 and 1.3e-322.

I haven't been able to figure out why this is happening. I would be very grateful if someone could help me out with this issue.

Thanks.


(Colin Goodheart-Smithe) #2

Try running the following query and checking the output for the buckets with the very small values:

{
  "aggs": {
    "student_name": {
      "terms": {
        "field": "name.raw",
        "order": {
          "total_score": "desc"
        }
      },
      "aggs": {
        "total_score": {
          "sum": {
            "field": "score"
          }
        },
        "score_ranges": {
          "range": {
            "field": "score",
            "ranges": [
              {
                "to": 1
              },
              {
                "from": 1
              }
            ]
          },
          "aggs": {
            "top_hits": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Basically, this adds a range aggregation alongside your sum aggregation. It puts documents with a score value < 1 into one bucket and those with a score >= 1 into another, then outputs the top 10 documents for each of these buckets. Hopefully this will show whether any of the documents contain unexpected values (values outside the 1-10 range).

Another thing to do would be to check the resulting mappings in your index by running:

curl -XGET "http://localhost:9200/my_index/_mapping"

If you can paste the result of that into a gist and link it here, it will give us an idea of what the mapping currently is on that index for all relevant fields. (Please do not paste a large mapping directly in a reply, as it makes the post very hard to read.)
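
If the mapping output is large, a short script can pull out just the type recorded for one field. This is only a rough sketch: it assumes the 1.x response layout of index name, then "mappings", then doc type, then "properties". The helper name field_types and the sample response are made up for illustration and are not output from the index in this thread.

```python
def field_types(mapping_response, field):
    """Collect the mapped type of `field` for every index and doc type."""
    found = {}
    for index_name, index_body in mapping_response.items():
        for doc_type, type_body in index_body.get("mappings", {}).items():
            field_def = type_body.get("properties", {}).get(field)
            if field_def is not None:
                found[(index_name, doc_type)] = field_def.get("type")
    return found


# Hypothetical response, shaped like the output of GET /my_index/_mapping.
sample_response = {
    "my_index": {
        "mappings": {
            "marksheet": {
                "properties": {
                    "score": {"type": "long"},
                    "name": {"type": "string"},
                }
            }
        }
    }
}

print(field_types(sample_response, "score"))
```

Note that this only reports the mapping the cluster returns for the index, so it helps with reading the output rather than proving that every shard agrees.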


(Jaimin Mehta) #3

Hi Colin,

I ran the above aggregation. I also ran the curl command, specifically for my doc_type, as follows:

curl -XGET "http://localhost:9200/my_index/_mapping/marksheet"

I have pasted the output of both into a gist. Here's the link:


(Adrien Grand) #4

Unfortunately, this is a known issue with dynamic mappings on 1.x. If two of your shards receive a document at the same time, and one maps the field as a long while the other maps it as a double, then the master node will reject whichever mapping update is applied second, but the shard with the rejected mapping will continue to index documents using the wrong type. Usually, the problem becomes visible when shards are moved between nodes, because Elasticsearch then starts interpreting either some double bits as longs (and you would see huge numbers) or long bits as doubles (and you would typically see tiny numbers, like here). When hit by this bug, there is no other choice but to reindex. One way to prevent it from happening again in the future would be to configure mappings explicitly.
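
To make the bit reinterpretation concrete, here is a minimal Python sketch. It only mimics the effect with the standard struct module and is not Elasticsearch's own code. The helper names and the sample values (a total of 28, a score of 7.0) are assumptions picked for illustration.

```python
import struct


def long_bits_as_double(value):
    """Reinterpret the 8 bytes of a signed 64-bit integer as an IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<q", value))[0]


def double_bits_as_long(value):
    """Reinterpret the 8 bytes of an IEEE 754 double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", value))[0]


# A small integer total (hypothetical) read back as a double is a subnormal value.
print(long_bits_as_double(28))

# A small double (hypothetical) read back as a long is a huge integer.
print(double_bits_as_long(7.0))
```

With these inputs, an integer total of 28 comes back as 1.4e-322, which is the same order of magnitude as the values reported in the first post, while 7.0 comes back as an integer in the quintillions.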

On 2.0, this issue will be fixed, as dynamic mappings will have to be validated on the master node before being applied. You can look at https://github.com/elastic/elasticsearch/pull/10634 for more information.
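
Until then, pinning the numeric field to an explicit type is the workaround mentioned above. Here is a rough sketch in the same Python dict style as the mapping in the first post. The choice of "long" for "score" is an assumption (use "double" if fractional scores are possible), and with_explicit_field is a hypothetical helper, not part of any client library.

```python
import copy


def with_explicit_field(mapping, doc_type="marksheet", field="score", field_type="long"):
    """Return a copy of the mapping with one field pinned to an explicit type."""
    result = copy.deepcopy(mapping)
    properties = result[doc_type].setdefault("properties", {})
    properties[field] = {"type": field_type}
    return result


# Trimmed-down stand-in for the original mapping; the string template would go here.
base_mapping = {
    "marksheet": {
        "date_detection": False,
        "dynamic_templates": [],
    }
}

explicit_mapping = with_explicit_field(base_mapping)
print(explicit_mapping["marksheet"]["properties"])
```

The mapping has to be in place before any documents are indexed, so an index that is already affected still needs to be recreated and reindexed.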


(Jaimin Mehta) #5

Thanks a lot, Adrien! That was a really good description and very helpful. Yes, you're right: on other doc_types that I have, I have seen the really large numbers as well.
I'd be happiest if this issue is fixed in 2.0, since in a few doc_types and other indexes my documents have a very large number of keys, for which configuring mappings explicitly would be pretty difficult. I will still try to figure out some way of mapping them explicitly. Thanks a lot! :smile:

