Where did my aggregation data go! The case of the missing bucket


(Zdrummond) #1

When I ask ES for an aggregation, by time, of the sum of a size by ID, I get a result where some events work correctly (they end up in a bucket with the correct term key), but other data, seemingly at random, ends up with a blank term key.

For example, with this as my aggs:

  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1m",
        "time_zone": "America/Los_Angeles",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": 1452209723842,
          "max": 1452214986379
        }
      },
      "aggs": {
        "3": {
          "terms": {
            "field": "collection_id.raw",
            "size": 0,
            "order": {
              "1": "desc"
            }
          },
          "aggs": {
            "1": {
              "sum": {
                "field": "size"
              }
            }
          }
        }
      }
    }
  }

I get the following result:
...

  "aggregations": {
    "2": {
      "buckets": [
        {
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": []
          },
          "key_as_string": "2016-01-07T15:38:00.000-08:00",
          "key": 1452209880000,
          "doc_count": 1
        },
        {
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 140355866
                },
                "key": "19488",
                "doc_count": 43
              }
            ]
          },
          "key_as_string": "2016-01-07T16:59:00.000-08:00",
          "key": 1452214740000,
          "doc_count": 43
        },
        {
          "3": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "1": {
                  "value": 63037240
                },
                "key": "19488",
                "doc_count": 8
              }
            ]
          },
          "key_as_string": "2016-01-07T17:01:00.000-08:00",
          "key": 1452214860000,
          "doc_count": 8
        }
      ]
    }
  }

Note that I get three buckets. Two of them have an entry in the inner buckets list; one of them does not, even though its doc_count is 1.
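To make the gap concrete, here is a minimal Python sketch that walks the response above and reports each minute whose doc_count is larger than the number of documents counted in its term buckets. The function name unassigned_docs and the trimmed response dict are made up for illustration and are not part of Elasticsearch; the sketch also assumes each document has at most one collection_id.raw value.

```python
# Trimmed copy of the response above: only the fields needed for the check.
response = {
    "aggregations": {
        "2": {
            "buckets": [
                {"3": {"sum_other_doc_count": 0, "buckets": []},
                 "key_as_string": "2016-01-07T15:38:00.000-08:00",
                 "doc_count": 1},
                {"3": {"sum_other_doc_count": 0,
                       "buckets": [{"1": {"value": 140355866},
                                    "key": "19488", "doc_count": 43}]},
                 "key_as_string": "2016-01-07T16:59:00.000-08:00",
                 "doc_count": 43},
                {"3": {"sum_other_doc_count": 0,
                       "buckets": [{"1": {"value": 63037240},
                                    "key": "19488", "doc_count": 8}]},
                 "key_as_string": "2016-01-07T17:01:00.000-08:00",
                 "doc_count": 8},
            ]
        }
    }
}


def unassigned_docs(resp, histogram="2", terms="3"):
    """Return (minute, missing_count) for every date bucket whose
    term buckets account for fewer documents than the bucket holds."""
    gaps = []
    for minute in resp["aggregations"][histogram]["buckets"]:
        terms_agg = minute[terms]
        # Documents that landed in some term bucket for this minute.
        counted = sum(b["doc_count"] for b in terms_agg["buckets"])
        counted += terms_agg.get("sum_other_doc_count", 0)
        missing = minute["doc_count"] - counted
        if missing > 0:
            gaps.append((minute["key_as_string"], missing))
    return gaps


gaps = unassigned_docs(response)
for when, count in gaps:
    print(when, count)
```

Run against the trimmed response, this flags only the 15:38 bucket, which holds one document that no term bucket accounts for.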

If I dig into the event data, I see nothing odd about it. Here is a snippet for an event that does get into a well-defined bucket:

@timestamp	  	January 7th 2016, 16:59:32.000
@version	  	1
_id	  	AVIevu0_l2HWhQn5VZHu
_index	  	logstash-2016.01.08
... 	
collection_id	  	19488

and one that does not. Note that both have all the same key fields.

@timestamp	  	January 7th 2016, 15:38:33.000
@version	  	1
_id	  	AVIedMH2l2HWhQn5U0QH
_index	  	logstash-2016.01.07
...
collection_id	  	19456

I have no idea what is going on, or what makes some events special and others not.

