When I ask ES for an aggregation, by time, of the sum of size by ID, some events are handled correctly (they end up in a bucket with the correct term key), but others, seemingly at random, end up with a blank term key.
For example, with this as my aggs:
"aggs": {
  "2": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "1m",
      "time_zone": "America/Los_Angeles",
      "min_doc_count": 1,
      "extended_bounds": {
        "min": 1452209723842,
        "max": 1452214986379
      }
    },
    "aggs": {
      "3": {
        "terms": {
          "field": "collection_id.raw",
          "size": 0,
          "order": {
            "1": "desc"
          }
        },
        "aggs": {
          "1": {
            "sum": {
              "field": "size"
            }
          }
        }
      }
    }
  }
}
I get the following result:
...
"aggregations": {
  "2": {
    "buckets": [
      {
        "3": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": []
        },
        "key_as_string": "2016-01-07T15:38:00.000-08:00",
        "key": 1452209880000,
        "doc_count": 1
      },
      {
        "3": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "1": {
                "value": 140355866
              },
              "key": "19488",
              "doc_count": 43
            }
          ]
        },
        "key_as_string": "2016-01-07T16:59:00.000-08:00",
        "key": 1452214740000,
        "doc_count": 43
      },
      {
        "3": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            {
              "1": {
                "value": 63037240
              },
              "key": "19488",
              "doc_count": 8
            }
          ]
        },
        "key_as_string": "2016-01-07T17:01:00.000-08:00",
        "key": 1452214860000,
        "doc_count": 8
      }
    ]
  }
}
Note that I get three time buckets. Two of them have an entry in the inner buckets array; one of them has a doc_count of 1 but an empty buckets array.
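To pick out the affected time buckets without reading the response by eye, a small Python sketch along these lines could walk the result. The function name find_empty_term_buckets and its parameters are made up for illustration and are not part of any Elasticsearch client; the sample data is a trimmed copy of the response above.

```python
def find_empty_term_buckets(aggregations, histogram_key="2", terms_key="3"):
    """Return (key_as_string, doc_count) for every time bucket that
    contains documents but whose terms sub-aggregation has no buckets."""
    empty = []
    for bucket in aggregations[histogram_key]["buckets"]:
        has_docs = bucket["doc_count"] > 0
        has_terms = bool(bucket[terms_key]["buckets"])
        if has_docs and not has_terms:
            empty.append((bucket["key_as_string"], bucket["doc_count"]))
    return empty


# Trimmed copy of the "aggregations" object from the response above.
sample = {
    "2": {
        "buckets": [
            {
                "3": {"buckets": []},
                "key_as_string": "2016-01-07T15:38:00.000-08:00",
                "key": 1452209880000,
                "doc_count": 1,
            },
            {
                "3": {"buckets": [{"1": {"value": 140355866}, "key": "19488", "doc_count": 43}]},
                "key_as_string": "2016-01-07T16:59:00.000-08:00",
                "key": 1452214740000,
                "doc_count": 43,
            },
            {
                "3": {"buckets": [{"1": {"value": 63037240}, "key": "19488", "doc_count": 8}]},
                "key_as_string": "2016-01-07T17:01:00.000-08:00",
                "key": 1452214860000,
                "doc_count": 8,
            },
        ]
    }
}

print(find_empty_term_buckets(sample))
# prints [('2016-01-07T15:38:00.000-08:00', 1)]
```

Only the 15:38 bucket is reported: it counts one document, yet the terms aggregation on collection_id.raw produced nothing for it.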
If I dig into the event data, I see nothing odd about it. Here is a snippet for an event that does get into a well-defined bucket:
@timestamp January 7th 2016, 16:59:32.000
@version 1
_id AVIevu0_l2HWhQn5VZHu
_index logstash-2016.01.08
...
collection_id 19488
And here is one that does not. Note that it has all the same key fields:
@timestamp January 7th 2016, 15:38:33.000
@version 1
_id AVIedMH2l2HWhQn5U0QH
_index logstash-2016.01.07
...
collection_id 19456
I have no idea what is going on, or what makes some events special and others not.
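One check that might narrow this down is to count, per minute, the documents that have no value in collection_id.raw, using Elasticsearch's missing aggregation. This is only a sketch: that a missing collection_id.raw value is the cause is an assumption, not something established here (the two events above do live in different daily indices, logstash-2016.01.08 and logstash-2016.01.07). The helper name build_missing_check and the aggregation names per_minute and without_value are made up for illustration. The code only builds the request body to send to the _search endpoint.

```python
import json


def build_missing_check(field="collection_id.raw",
                        min_ms=1452209723842,
                        max_ms=1452214986379):
    """Build a search body that counts, per minute, the documents
    that have no value for `field` (hypothetical helper)."""
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "query": {
            "range": {"@timestamp": {"gte": min_ms, "lte": max_ms}}
        },
        "aggs": {
            "per_minute": {
                "date_histogram": {
                    "field": "@timestamp",
                    "interval": "1m",
                    "time_zone": "America/Los_Angeles",
                    "min_doc_count": 1,
                },
                "aggs": {
                    # The "missing" aggregation counts documents without a value for the field.
                    "without_value": {"missing": {"field": field}}
                },
            }
        },
    }


body = build_missing_check()
print(json.dumps(body, indent=2))
```

If the 15:38 bucket reports a non-zero without_value count, the document was indexed without a collection_id.raw value, and comparing the mappings of the two indices would be a reasonable next step.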