Metrics and Time-series Analysis Megathread

(Zachary Tong) #1

With the release of Elasticsearch 2.0.0-beta1, a new aggregation feature is now available for testing: Pipeline Aggregations.

Pipeline aggregations are a new type of agg that can be used to perform processing on aggregation results. For example, you may use bucket/metric aggs to calculate the maximum price-per-day, and then a pipeline agg to find the average of those max prices.
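As a rough sketch of that example, the request body could look something like the one below, built here as a Python dict so it can be printed as JSON. The field names `timestamp` and `price` and the aggregation names `per_day`, `max_price` and `avg_of_daily_max` are assumptions made up for illustration; `avg_bucket` is a sibling pipeline agg that reads the per-day values through `buckets_path`.

```python
import json

# Hypothetical field names: "timestamp" (a date field) and "price" (numeric).
# The aggregation names are arbitrary labels chosen for this sketch.
request_body = {
    "size": 0,  # only the aggregation results are wanted, not the hits
    "aggs": {
        # Bucket agg: one bucket per day.
        "per_day": {
            "date_histogram": {"field": "timestamp", "interval": "day"},
            "aggs": {
                # Metric agg: maximum price inside each daily bucket.
                "max_price": {"max": {"field": "price"}}
            },
        },
        # Pipeline agg: average of the daily maximums computed above.
        "avg_of_daily_max": {
            "avg_bucket": {"buckets_path": "per_day>max_price"}
        },
    },
}

print(json.dumps(request_body, indent=2))
```

The printed JSON is what you would send as the body of a `_search` request against your own index.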

Importantly, these new aggs introduce powerful analytics for ordered, sequential time-series and metrics. For example, you can use a date_histogram and apply a derivative or moving average.
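A minimal sketch of what that might look like: a `date_histogram` with a `sum` metric, plus `derivative` and `moving_avg` pipeline aggs nested inside it. The field names (`timestamp`, `price`), the aggregation names and the window size of 7 are assumptions for illustration only. The small `local_derivative` helper is also hypothetical; it just shows the idea of a bucket-to-bucket difference on already ordered values and is not the Elasticsearch implementation.

```python
import json

# Hypothetical field and aggregation names, chosen only for this sketch.
request_body = {
    "size": 0,
    "aggs": {
        "per_day": {
            "date_histogram": {"field": "timestamp", "interval": "day"},
            "aggs": {
                # Metric the pipeline aggs operate on.
                "total_sales": {"sum": {"field": "price"}},
                # Change in total_sales from one daily bucket to the next.
                "sales_change": {
                    "derivative": {"buckets_path": "total_sales"}
                },
                # Smoothed total_sales over a sliding window of buckets.
                "sales_smoothed": {
                    "moving_avg": {
                        "buckets_path": "total_sales",
                        "window": 7,
                        "model": "simple",
                    }
                },
            },
        }
    },
}


def local_derivative(values):
    """Illustrative only: difference between each value and the previous one.

    The first bucket has no predecessor, so it gets None.
    """
    if not values:
        return []
    return [None] + [curr - prev for prev, curr in zip(values, values[1:])]


print(json.dumps(request_body, indent=2))
print(local_derivative([10, 15, 12]))
```

Both pipeline aggs are nested inside the histogram because they work across its ordered buckets, which is exactly the sequential view that plain bucket/metric aggs lack.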

This opens up a lot of possibilities for time-series analysis which were previously difficult or impossible, simply because bucket/metric aggregations cannot operate on sequentially ordered data (due to the distributed nature of that data).

There is also a lot of work going into other areas that benefit metric data: doc values (essentially a serialized column store), improvements to numeric compression and sparse bitsets in Lucene, and so on.

Purpose of this thread
I'd like to use this thread as the central location to discuss pipeline aggs, feature requests, questions, comments, or really anything related to metrics and time-series analysis in Elasticsearch.

In particular, if you have a time-series use case which cannot be built with pipeline aggs yet, let us know! It is early days for pipeline aggs, and we have a lot of ideas moving forward. User input is critical so we build functionality that is actually useful to you.

Resources and Links

(Patrick) #2

Glad I found this before I started another thread, even though this is old. I've written some queries to calculate moving averages on a time series I have, and I'd like to generate some graphs to visualize the results. How were the graphs in "staying in control with moving averages" generated? I feel like the simple answer is 'kibana', but since Kibana doesn't support some of these aggregations, I'm wondering how the graphs were made.

Did the authors execute their query, then index the results back into Elasticsearch somehow? If so, is there an easy way to go from "I have some query results" to "my query results are in Elasticsearch, so now I can graph them"? Any guidance is appreciated; I'll wait a while before I ask this in a new thread.

*Edit - I now recognize that you wrote the articles I'm talking about. I have also read "building a statistical anomaly detector". How are you generating the plots in your articles, @polyfractal?

(Zachary Tong) #3

Heya :)

The graphs in the "staying in control" series were actually generated with an internal tool that we used to prototype pipeline aggs, because there wasn't support anywhere at the time. Unfortunately, it seems Kibana still doesn't support pipelines, so the options are still limited.

I think Timelion is probably the best bet at this point. That's what I used in the "statistical anomaly detector" series. It's fairly easy to add custom functions to Timelion, so you could, for example, add a custom function that executes a moving average pipeline. It'd be much simpler than trying to get Kibana proper to cooperate. (Note that Timelion already supports some basic operations like moving_average, although it does them client-side instead of on the cluster.)

It's not a wonderful solution and takes some work, but the basic es() function in Timelion is pretty simple to copy/extend.
