Date Histogram -- Aggregate By Hour?


I'm running ES version 2.4.0. I would like to aggregate search results by hour. I'm running the same query over two different time ranges. One time range has hourly indexes and the other has daily indexes.

I get back the results I'd expect only when the data is broken down into hourly indexes. For data that is broken into daily indexes, the results show only a single bucket, for the hour of the last hit received that day. So if I have one hit in each of hours 1, 2, and 3, it shows a bucket for hour 3 with 3 hits, while the hourly indexes show buckets for hours 1, 2, and 3 with 1 hit in each bucket (this is what I was aiming for). To get the hourly buckets, does the data need to be broken into hourly indexes?

Overall, what is the best way to aggregate the data by hour without having an index per hour? From other posts I've read, it seems the best approach is to create a new field to store the hour and then aggregate on this new field. Also, let me know if you want me to paste in the query.

Ok, I think I figured out my issue here. So we are using Logstash to index our data. The problem was related to how the document id was generated and the corresponding impact when using hourly vs. daily indexes.

At index time, we generate document IDs from event data (fields x and y) using this Logstash config:

                    ruby {
                       # Build the document ID from an MD5 hash of fields x and y
                       code => "require 'digest/md5'; event['@metadata']['computed_id'] = Digest::MD5.hexdigest(event['x'] + event['y'])"
                    }

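So each hour we get some number of documents grouped by x and y, and every hour a new hourly index is created, so the same ID can recur in the next hourly index without conflict. The same goes for daily indexes, except a new index is only created once a day (not hourly). For the daily indexes I was overlooking the fact that document IDs were therefore unique only per day: an event with the same x and y in a later hour gets the same ID and replaces the earlier document in that day's index.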
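To make the collision concrete, here is a minimal Ruby sketch that simulates it outside of Logstash. The sample events, the `timestamp` field, the `logs-` index prefix, and the `index_events` helper are all made up for illustration; the hash only mimics the fact that writing a document with an existing ID to the same index replaces the earlier one.

```ruby
require 'digest/md5'

# Hypothetical events: same x and y, one event in each of three hours.
events = [
  { 'x' => 'a', 'y' => 'b', 'timestamp' => '2016-10-01T01:15:00Z' },
  { 'x' => 'a', 'y' => 'b', 'timestamp' => '2016-10-01T02:15:00Z' },
  { 'x' => 'a', 'y' => 'b', 'timestamp' => '2016-10-01T03:15:00Z' }
]

# Simulates indexing: documents are keyed by (index name, document ID),
# so a later write with the same key replaces the earlier document.
# prefix_length = 13 gives an index per hour, 10 gives an index per day.
def index_events(events, prefix_length)
  store = {}
  events.each do |event|
    index_name = 'logs-' + event['timestamp'][0, prefix_length]
    doc_id = Digest::MD5.hexdigest(event['x'] + event['y'])
    store[[index_name, doc_id]] = event
  end
  store
end

hourly = index_events(events, 13)
daily  = index_events(events, 10)

puts hourly.size                       # 3 documents, one per hourly index
puts daily.size                        # 1 document left in the daily index
puts daily.values.first['timestamp']   # 2016-10-01T03:15:00Z (last write wins)
```

With hourly indexes the identical ID lands in three different indexes, so all three documents survive. With a daily index only the last one remains, which matches the single hour-3 bucket I was seeing.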

So overall, I think for the daily case I need to generate document IDs from x, y, and the hour:

                    ruby {
                       # Include the hour so each hour gets its own document ID within a daily index
                       code => "require 'digest/md5'; event['@metadata']['computed_id'] = Digest::MD5.hexdigest(event['x'] + event['y'] + event['hour'])"
                    }

This allows for aggregating by hour with daily indexes.
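As a quick sanity check, the same hashing can be run in plain Ruby to confirm that adding the hour yields distinct IDs. This is only a sketch: the sample values and the `computed_id` helper are hypothetical, and it assumes `hour` is already present on the event as a string (a numeric value would need `.to_s` before concatenation).

```ruby
require 'digest/md5'

# Hypothetical helper mirroring the Logstash ruby filter above:
# the ID is an MD5 hash of x, y, and the hour concatenated together.
def computed_id(event)
  Digest::MD5.hexdigest(event['x'] + event['y'] + event['hour'])
end

# Made-up events: same x and y, different hours of the same day.
events = [
  { 'x' => 'a', 'y' => 'b', 'hour' => '01' },
  { 'x' => 'a', 'y' => 'b', 'hour' => '02' },
  { 'x' => 'a', 'y' => 'b', 'hour' => '03' }
]

ids = events.map { |event| computed_id(event) }

puts ids.uniq.size   # 3 distinct IDs, so no document replaces another
```

One thing to watch with plain concatenation: different field values can produce the same string (for example 'ab' + 'c' and 'a' + 'bc'), so putting a separator between the fields may be safer if x and y vary in length.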
