Hi Guys, my index is a set of messages that have a timestamp. I want to calculate the average number of messages each hour for the past month. So the output would be something like:
1 50
2 75
3 68
...
23 100
24 98
It looks as if some of the new features in ES 2.0 might enable this, but while I can get a doc count of the number of messages each hour using a date histogram, I just cannot work out how to then summarise it by hour of day, essentially ignoring the date.
OK. Is it possible to do this using 2.0 RC1? I've installed it but I still can't get my head around how to set up the aggregations. Any suggestions as to how to structure them would be greatly appreciated!
Whilst this is something that will be possible with a Pipeline Aggregation, unfortunately there is not currently one written that will do what you are after. Essentially you would need a pipeline aggregation which uses a date histogram as an input and outputs a new aggregation which has a bucket for each hour in the day. The aggregation would then re-bucket the data from the histogram aggregation into this new aggregation.
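Until such an aggregation exists, one workaround is to do the re-bucketing on the client. Below is a minimal sketch in Python, not a built-in Elasticsearch feature. It assumes you have already run an hourly date histogram and pulled out its bucket list, where each bucket has a `key` (epoch milliseconds) and a `doc_count`. The function name `average_by_hour_of_day` is made up for illustration, and hours are taken in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone


def average_by_hour_of_day(buckets):
    """Re-bucket hourly date histogram buckets by hour of day (UTC).

    `buckets` is assumed to be a list of dicts shaped like the buckets of a
    date histogram response: {"key": <epoch millis>, "doc_count": <int>}.
    Returns {hour_of_day: average doc_count across the days seen}.
    """
    totals = defaultdict(int)   # summed doc_count per hour of day
    samples = defaultdict(int)  # number of hourly buckets seen per hour of day
    for bucket in buckets:
        moment = datetime.fromtimestamp(bucket["key"] / 1000, tz=timezone.utc)
        totals[moment.hour] += bucket["doc_count"]
        samples[moment.hour] += 1
    return {hour: totals[hour] / samples[hour] for hour in sorted(totals)}
```

Note that the average is only taken over the buckets that are present. If hours with no messages are left out of the histogram response, they will not pull the average down, so you would want empty buckets returned (for example with `min_doc_count` set to 0) for the numbers to be right.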
Just stumbled across this post as I'm trying to do something very similar. I have used a date histogram with a script in order to get a per-second hit count for log entries. There is one bucket per second for 24 hours, so 86400 buckets! What I want to do now is produce a reduced histogram from this parent histogram, where each bucket represents 5 minutes and contains the average or max value of doc_count from the corresponding parent buckets. Is there any way to achieve this yet?
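For what it's worth, the reduction itself is simple to do client-side once you have the per-second buckets. Here is a rough sketch in Python, again assuming each parent bucket is a dict with a `key` in epoch milliseconds and a `doc_count`. The function name `downsample` and its parameters are hypothetical, not part of any Elasticsearch client.

```python
from collections import defaultdict


def downsample(buckets, window_ms=5 * 60 * 1000, how="avg"):
    """Collapse fine-grained histogram buckets into wider windows.

    `buckets` is assumed to be a list of {"key": <epoch millis>,
    "doc_count": <int>} dicts. Each output bucket covers `window_ms`
    milliseconds and holds the average or max of the parent doc_counts.
    """
    if how not in ("avg", "max"):
        raise ValueError("how must be 'avg' or 'max'")
    groups = defaultdict(list)
    for bucket in buckets:
        # Round the key down to the start of its window.
        start = bucket["key"] - bucket["key"] % window_ms
        groups[start].append(bucket["doc_count"])
    reduced = []
    for start in sorted(groups):
        counts = groups[start]
        value = max(counts) if how == "max" else sum(counts) / len(counts)
        reduced.append({"key": start, "value": value})
    return reduced
```

With one bucket per second over 24 hours, this turns 86400 parent buckets into 288 five-minute buckets. The average is over the parent buckets that exist, so seconds with no hits need to be present as zero-count buckets if they should count towards it.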