I've recently started using and enjoying ES. In particular, I'm keen to
exploit the new aggregations feature to report on system metrics data that
is currently being fed into ES indexes.
I'm experimenting with aggregations that fold up things like request rates
per machine or API calls (per machine, globally, etc). I was thinking that
it might be useful to store the aggregation result itself, particularly if
I set a (let's say) weekly TTL on the incoming metrics data but would like
to preserve historical aggregates (e.g. find me the average/min/max request
rate on day 17). I might want to keep the raw metrics for a week, but the
aggregates should potentially stick around for years.
Are there any recommendations/patterns for dealing with these scenarios?
Is there an existing way to recompute aggregates at regular intervals and
feed the results back into ES?
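For concreteness, the kind of aggregation I mean looks roughly like this (field names such as `host` and `request_rate` are placeholders for whatever your metrics mapping uses): a `date_histogram` with a `terms` and `stats` sub-aggregation, giving daily min/max/avg per machine:

```json
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": { "field": "@timestamp", "interval": "day" },
      "aggs": {
        "per_host": {
          "terms": { "field": "host" },
          "aggs": {
            "request_rate_stats": { "stats": { "field": "request_rate" } }
          }
        }
      }
    }
  }
}
```

Running that over a week of raw metrics gives exactly the per-day numbers I'd like to preserve after the raw documents expire.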
To clarify, these questions are coming from my desire to dynamically
produce real time aggregated information from a "stream", which in this
case is metric data we're feeding to ES. I'm concerned about unnecessary
re-execution of aggregations on (potentially large) data sets that could be
computed more efficiently by maintaining buckets that are simply updated as
data enters ES. I'm not sure whether there is a good pattern for this or
whether I'm better off using a different technology entirely (e.g. Storm),
though it is nice having all my logs/metrics queryable from one place.
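What I had in mind for the scheduled approach, roughly: run the aggregation on a cron-style schedule, flatten the response buckets into small summary documents, and bulk-index those into a separate long-lived index. Here's a minimal sketch of the flattening step. The response shape and field names (`per_day`, `per_host`, `request_rate_stats`, a `metrics-summaries` target index) are my own assumptions, based on a date_histogram → terms → stats aggregation; adapt them to your actual query and mapping:

```python
# Flatten a daily stats aggregation response into summary documents that
# could be bulk-indexed into a long-lived "metrics-summaries" index.
# The nested bucket layout mirrors a date_histogram -> terms -> stats agg.

def summaries_from_agg_response(response):
    """Return one summary doc per (day, host) bucket pair."""
    docs = []
    for day in response["aggregations"]["per_day"]["buckets"]:
        for host in day["per_host"]["buckets"]:
            stats = host["request_rate_stats"]
            docs.append({
                "day": day["key_as_string"],
                "host": host["key"],
                "count": stats["count"],
                "min": stats["min"],
                "max": stats["max"],
                "avg": stats["avg"],
            })
    return docs

# Example response fragment in the shape such an aggregation returns:
sample = {
    "aggregations": {
        "per_day": {
            "buckets": [
                {
                    "key_as_string": "2014-06-05",
                    "per_host": {
                        "buckets": [
                            {
                                "key": "web-01",
                                "request_rate_stats": {
                                    "count": 86400,
                                    "min": 10.0,
                                    "max": 950.0,
                                    "avg": 120.5,
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }
}

print(summaries_from_agg_response(sample))
```

Each summary doc is tiny compared to the raw metrics, so keeping years of them is cheap even with a weekly TTL on the raw data. The open question is whether scheduling this externally is the accepted pattern, or whether something inside ES can maintain the buckets incrementally.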
On Thursday, June 5, 2014 12:51:52 PM UTC-4, erewh0n wrote: