Group/count/average consecutive buckets

Suppose I have a date/time field for each document. Now suppose that I group my documents into hourly buckets using a date_histogram aggregation. Now suppose that there are some hours with no documents. So the document count for a 12 hour period might look like this:

Hour 1: 12
2: 10
3: 0
4: 5
5: 90
6: 11
7: 0
8: 0
9: 56
10: 88
11: 33
12: 111

What I want to do is find the average number of consecutive buckets with non-zero counts. So here, we have the first cluster of 2 consecutive hours with non-zero counts, followed by hour 3 with 0 count, then 3 consecutive buckets of non-zero, followed by two buckets with 0, and finally 4 consecutive buckets of non-zero.

So in this example, what I want is the avg(2, 3, 4) = 3. Is there a clever way to do this in ES using some combination of aggregations?

Not that I know of. But doing this on client-side looks easy? For the record, doing it on the Elasticsearch side wouldn't be more efficient.

Would it not be more efficient from a data transfer over the network perspective if I were only retrieving an average value as opposed to all the bucket counts? Especially if I were doing this over many queries?

If your client application and the Elasticsearch cluster are on the same local network, this wouldn't be an issue. Even on a remote network I don't think this would be an issue unless the date histogram would loooots of buckets.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.