I would like to count the number of log entries per recurring time window, say one-hour buckets, so that log entries from Monday 3-4pm are counted together with those from Tuesday 3-4pm, and so on.
I have a timestamp field containing both the date and the time. Would I need to split that field into separate date and time fields first and then perform the aggregation, or is there a clever way to do it without splitting the source data?
I'm not sure this is an ideal solution, but you could try a cumulative sum per hour, something like: `cumulative_sum(count(), shift='1h')`. That way you would get the cumulative sum per hour for each day.
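For what it's worth, if you can post-process the raw logs (or the query results) client-side, hour-of-day bucketing does not require separate date and time fields: the hour can be extracted from the timestamp on the fly, so entries from different days land in the same bucket whenever their hour matches. A minimal sketch in Python, assuming ISO 8601 timestamps (the sample data here is made up for illustration):

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of raw log timestamps (ISO 8601 strings).
logs = [
    "2024-01-01T15:12:00",  # Monday, 3-4pm
    "2024-01-02T15:45:00",  # Tuesday, 3-4pm
    "2024-01-02T09:30:00",  # Tuesday, 9-10am
]

# Bucket by hour of day: the date part is ignored, so no splitting
# of the source field is needed.
counts = Counter(datetime.fromisoformat(ts).hour for ts in logs)

print(counts[15])  # number of entries in the 3-4pm bucket across all days
```

The same idea applies server-side if your aggregation layer lets you derive the hour from the timestamp in a script or computed field instead of materializing it in the data.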