Suppose I have a date/time field on each document, and I group my documents into hourly buckets using a date_histogram aggregation. Some hours have no documents, so the document counts for a 12-hour period might look like this:
What I want to do is find the average number of consecutive buckets with non-zero counts. So here, we have the first cluster of 2 consecutive hours with non-zero counts, followed by hour 3 with 0 count, then 3 consecutive buckets of non-zero, followed by two buckets with 0, and finally 4 consecutive buckets of non-zero.
So in this example, what I want is the avg(2, 3, 4) = 3. Is there a clever way to do this in ES using some combination of aggregations?
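For context, the hourly counts above would come from a request along these lines (the index and field names are placeholders; on Elasticsearch 7+ the parameter is `fixed_interval`, while older versions use `interval`):

```json
POST /my-index/_search
{
  "size": 0,
  "aggs": {
    "per_hour": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1h",
        "min_doc_count": 0
      }
    }
  }
}
```

Setting `min_doc_count` to 0 is what makes the empty hours show up as zero-count buckets rather than being omitted.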
Would it not be more efficient from a data transfer over the network perspective if I were only retrieving an average value as opposed to all the bucket counts? Especially if I were doing this over many queries?
If your client application and the Elasticsearch cluster are on the same local network, this wouldn't be an issue. Even over a remote network I don't think it would be an issue unless the date histogram produced lots and lots of buckets.
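Computing the average run length from the returned buckets is cheap on the client side. A minimal sketch in Python (the counts are hypothetical, chosen to match the 12-hour pattern described above):

```python
def avg_run_length(counts):
    """Average length of consecutive runs of non-zero counts."""
    runs = []
    current = 0
    for c in counts:
        if c > 0:
            current += 1
        elif current:
            # A run just ended at a zero bucket; record its length.
            runs.append(current)
            current = 0
    if current:
        # The series ended mid-run; record the final run.
        runs.append(current)
    return sum(runs) / len(runs) if runs else 0.0

# Hypothetical doc_count values from the 12 hourly buckets:
# runs of length 2, 3, and 4 separated by zero buckets.
counts = [5, 3, 0, 2, 4, 1, 0, 0, 6, 2, 3, 1]
print(avg_run_length(counts))  # → 3.0
```

In practice you would feed it the `doc_count` of each bucket from the aggregation response, e.g. `[b["doc_count"] for b in resp["aggregations"]["per_hour"]["buckets"]]`.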