Unclear aggregation results - version 5.3


(liorg2) #1

can anyone please explain why I'm getting these 2 buckets when running the given query?


(James Addison) #2

If I were to guess, I would say it is timezone-related. Try changing your query to include Z at the end (or otherwise make your range query use UTC timezone).


(liorg2) #3

your guess makes sense to me as well, but this gives the same result:

"range": {
  "date_time": {
    "gte": "2018-09-10T00:00:00.000Z",
    "lte": "2018-09-10T05:00:00.000Z"
  }
}

(Jimferenczi) #4

The intervals for the date_histogram starts at epoch time (1970-01-01T00:00:00) then for each date found in the documents we apply (time_in_millis / interval_in_millis) * interval_in_millis. By default all dates are considered UTC so 2018-09-10T00:00:00 falls into the 2018-09-09T23:00:00 bucket. You can use "offset": "+1h" in your date_histogram if you want to return buckets that start exactly at your range query boundaries but this would only work for this specific day. The range query filters documents but it doesn't change the way buckets are computed in the date_histogram.


(Colin Goodheart-Smithe) #5

The date histogram aggregation (or any aggregation for that matter) does not base its bucket boundaries on the details of the query. This would end up being too complex as the bounds of a query could be buried deep within a complex query with many nested levels. Instead the aggregation is fed documents that match the query and evaluates them independently of the query. This allows use to keep aggregations easily composable and nestable without the risk that a small change to a query or a parent aggregationwill break other aggregations.

The boundaries for buckets for the date histogram aggregation when using an interval like 5h are calculated by taking the epoch_millis value of the date (so 2018-09-10T00:00:00 has an epoch_millis value of 1536537600000) and dividing it by the number of milliseconds in the intervals (assuming a fixed-duration interval rather than a calendar interval, in this case the interval millis for 5h are 1000 * 60 * 60 * 5 = 18000000), rounding down to the nearest integer (1536537600000 / 18000000 = 85363.2 which rounds down to 85363) and then multiplying that value by the interval again to obtain the bucket boundary (85363 * 18000000 = 1536534000000 which is 2018-09-09T23:00:00).


(liorg2) #6

thanks so much
so the data of the '2018-09-09T23:00:00' bucket,with 5h interval summarize the period 2018-09-09T23:00:00 to 2018-09-10T04:00:00 ?


(liorg2) #7

and is there any option to achieve the requested query?


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.