It looks like the date histogram aggregation currently bases each bucket on
a user-defined time interval, so you cannot know how many buckets will be
returned unless you know ahead of time how much time your data spans.
Is there a way to create a date histogram aggregation in Elasticsearch
where I can define the number of buckets that I want and have the bucket
increment determined during execution, based on the oldest and newest
matching documents?
I would like to be able to say "give me 20 buckets". Then if the data spans
10 years, each bucket would cover half a year, and if it spans 10 minutes,
each bucket would cover 30 seconds.
Right now the interval has to be defined at query time and is never
calculated dynamically. The main reason is that either the buckets would
have to be expanded for each new document, or a first pass would have to
determine the oldest and youngest dates in the dataset, which requires an
additional round trip across the cluster. The bucket expansion would also
need to happen across the cluster, which sounds like quite a performance
impact, but I am not an aggregations expert.
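A rough sketch of the extra-round-trip approach, done on the client side, might look like the following. It only builds the two request bodies and computes the interval; sending them is left to whatever client you use. The helper names, the aggregation names ("oldest", "newest", "over_time") and the "timestamp" field are made up for illustration, and the name of the interval parameter is an assumption that depends on your Elasticsearch version (older releases use "interval", later ones use "fixed_interval").

```python
def min_max_query(field):
    """First request: ask for the oldest and newest matching timestamps."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "aggs": {
            "oldest": {"min": {"field": field}},
            "newest": {"max": {"field": field}},
        },
    }


def interval_ms(oldest_ms, newest_ms, target_buckets):
    """Bucket width in milliseconds so the span splits into ~target_buckets."""
    if target_buckets <= 0:
        raise ValueError("target_buckets must be positive")
    span = max(0, newest_ms - oldest_ms)
    # Integer ceiling division; never return less than 1 ms.
    return max(1, -(-span // target_buckets))


def histogram_query(field, oldest_ms, newest_ms, target_buckets,
                    interval_param="interval"):
    """Second request: date histogram using the computed interval.

    interval_param is an assumption: pass "fixed_interval" on versions
    that no longer accept "interval".
    """
    step = interval_ms(oldest_ms, newest_ms, target_buckets)
    return {
        "size": 0,
        "aggs": {
            "over_time": {
                "date_histogram": {"field": field, interval_param: f"{step}ms"}
            }
        },
    }


if __name__ == "__main__":
    # Example: data spanning 10 minutes, 20 buckets requested.
    print(histogram_query("timestamp", 0, 10 * 60 * 1000, 20))
```

The min and max values from the first response (epoch milliseconds) feed the second request. Because histogram buckets are aligned to multiples of the interval rather than to the oldest document, the response can contain one bucket more than requested, so treat the count as a target rather than a guarantee.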
--Alex
On Tue, Jun 24, 2014 at 8:07 PM, Tebring Daly tdalytx@gmail.com wrote: