Using aggregation in anomaly detection jobs

one · August 23, 2021, 1:56pm

I guess I did not understand the use of aggregation in anomaly detection. I expected that using the same aggregation with different time intervals would not affect the results; however, that does not seem to be the case.

I created three anomaly detection jobs using aggregations, as discussed here, each with a bucket span of 6h.

Each job's datafeed used a different fixed_interval: 1h, 3h or 6h. Note that the bucket span is divisible by each of these intervals.
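For reference, here is a minimal sketch of what the three datafeed aggregation bodies in this setup might look like, written as Python dicts. The field names `@timestamp` and `responsetime` are hypothetical placeholders. The structure (a `date_histogram` containing a `max` aggregation on the time field) follows Elastic's documented pattern for datafeed aggregations as far as I know, so check the docs for your version. Only `fixed_interval` differs between the jobs.

```python
# Hypothetical sketch: the three datafeed aggregation bodies from the question,
# differing only in the date_histogram fixed_interval. Field names are placeholders.
# The job using such a datafeed would normally also set
# analysis_config.summary_count_field_name to "doc_count".

BUCKET_SPAN_SECONDS = 6 * 3600  # 6h bucket span shared by all three jobs


def interval_seconds(interval: str) -> int:
    """Convert a simple fixed_interval string such as '3h' or '36m' to seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    value, unit = int(interval[:-1]), interval[-1]
    return value * units[unit]


def datafeed_aggregations(fixed_interval: str,
                          time_field: str = "@timestamp",
                          metric_field: str = "responsetime") -> dict:
    """Build a date_histogram with a max on the time field plus the analysed metric."""
    return {
        "buckets": {
            "date_histogram": {"field": time_field, "fixed_interval": fixed_interval},
            "aggregations": {
                time_field: {"max": {"field": time_field}},
                metric_field: {"avg": {"field": metric_field}},
            },
        }
    }


for interval in ("1h", "3h", "6h"):
    # The job's bucket span must be divisible by the histogram interval.
    assert BUCKET_SPAN_SECONDS % interval_seconds(interval) == 0
    aggs = datafeed_aggregations(interval)
    print(interval, aggs["buckets"]["date_histogram"]["fixed_interval"])
```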

I expected the same results from each job. The jobs do produce exactly the same time series chart, i.e. the actual bucket values are identical, but the anomaly scores differ. In some cases a bucket is flagged as an anomaly by one job but not by another.

Any clarification would be greatly appreciated.

Tom_Veasey · August 25, 2021, 9:06am

Right, first of all, a little background on aggregation. If you group metric values together to form a statistic (say, the mean of those values), and you assume the individual values all come from the same distribution, then the distribution of that statistic usually depends on how many values you aggregated. You can test this yourself, for example by charting the mean of noisy data in Kibana with different aggregation intervals. What you typically see is that as you choose longer bucket lengths (i.e. more values are averaged together), the chart becomes smoother. (In fact, if the values are independent, the variance of the mean falls as 1 / (number of samples).)
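The 1 / n behaviour is easy to check with a quick simulation. This sketch assumes independent standard normal values (variance 1), which is an idealisation rather than anything specific to Elastic's models: it averages n values many times and compares the empirical variance of those means with 1 / n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible


def variance_of_means(samples_per_mean: int, trials: int = 2000) -> float:
    """Draw `trials` means, each over `samples_per_mean` i.i.d. N(0, 1) values,
    and return the empirical variance of those means."""
    means = [
        statistics.fmean(random.gauss(0.0, 1.0) for _ in range(samples_per_mean))
        for _ in range(trials)
    ]
    return statistics.variance(means)


for n in (1, 10, 100):
    # Theory for independent values with variance 1: Var(mean) = 1 / n
    print(f"n={n:>3}  empirical={variance_of_means(n):.4f}  theory={1 / n:.4f}")
```

The empirical column should track the theory column closely, which is exactly the "smoother chart" effect described above.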

For our anomaly detection, when we model the distribution of, for example, the mean, what we ideally want (aside from time-dependent behaviour such as seasonality) is for the means to come from the same distribution. However, if we fix the time interval we aggregate over, the count of values in each mean will typically vary, so by construction we break our assumption that they come from the same distribution. To avoid this, rather than adding each time bucket's mean to the model, we estimate how many values arrive on average per bucket interval and then always group together approximately this many values and compute their mean. When the rate is high we may add several samples per bucket; when the rate is low we wait until enough values have arrived. The finer grained the sub-bucketing in the pre-aggregated data, the more accurately we can do this. Indeed, if you just scroll the data rather than pre-aggregating it, every statistic value we add to the model has an identical sample count. So the choice of pre-aggregation interval changes the values the model actually sees, and hence changes its predictions and the anomalies it generates.
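To make the effect of sub-bucket granularity concrete, here is a toy sketch of the regrouping idea. It is a hypothetical illustration, not Elastic's implementation: the `SubBucket` and `regroup` names, the hourly counts and the target count of 200 are all made up. Because a pre-aggregated sub-bucket cannot be split, each sample overshoots the target by up to one sub-bucket's worth of values, so coarser sub-buckets give more variable sample counts, while scrolled documents (one value each) hit the target exactly.

```python
from dataclasses import dataclass


@dataclass
class SubBucket:
    count: int   # number of raw documents summarised (e.g. doc_count)
    mean: float  # mean of those documents


def regroup(sub_buckets: list, target_count: int) -> list:
    """Accumulate consecutive sub-buckets until at least `target_count` values
    have been seen, then emit one (count, mean) sample. Values that have not yet
    reached the target stay pending until more sub-buckets arrive."""
    samples = []
    total, weighted_sum = 0, 0.0
    for sb in sub_buckets:
        total += sb.count
        weighted_sum += sb.count * sb.mean
        if total >= target_count:
            samples.append((total, weighted_sum / total))
            total, weighted_sum = 0, 0.0
    return samples


hourly_counts = [40, 60, 120, 200, 180, 90, 50, 30, 70, 110, 150, 100]
target = 200  # hypothetical number of values per sample the model expects

raw = [SubBucket(1, 10.0) for _ in range(sum(hourly_counts))]  # scrolled documents
fine = [SubBucket(c, 10.0) for c in hourly_counts]            # 1h sub-buckets
coarse = [SubBucket(sum(hourly_counts[i:i + 3]), 10.0)       # 3h sub-buckets
          for i in range(0, len(hourly_counts), 3)]

for name, subs in (("raw", raw), ("1h", fine), ("3h", coarse)):
    print(name, [count for count, _ in regroup(subs, target)])
# raw [200, 200, 200, 200, 200, 200]
# 1h [220, 200, 270, 260, 250]
# 3h [220, 470, 510]
```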

We still assign anomaly scores to time buckets, so the charts we display are unchanged. However, when we assess how unusual a time bucket is, we consider both its value and the count of values in it, since the count tells us how we expect the distribution of the statistic to change.
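As a deliberately simplified illustration of why the count matters when scoring, assume the individual values are Gaussian with a known per-value standard deviation, so a bucket mean over n values has standard deviation sigma / sqrt(n). This is a textbook sketch under that assumption, not Elastic's scoring code, and the numbers are hypothetical. The same observed mean becomes far more surprising as the count grows.

```python
import math


def mean_z_score(bucket_mean: float, expected_mean: float,
                 per_value_std: float, count: int) -> float:
    """z-score of a bucket mean assuming `count` i.i.d. values with the given
    per-value standard deviation, so the mean has std per_value_std / sqrt(count)."""
    return (bucket_mean - expected_mean) / (per_value_std / math.sqrt(count))


def two_sided_p_value(z: float) -> float:
    """Probability of a standard normal deviate at least as extreme as |z|."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Same observed mean, different numbers of values in the bucket.
for count in (4, 25, 100):
    z = mean_z_score(bucket_mean=11.0, expected_mean=10.0, per_value_std=5.0, count=count)
    print(f"count={count:>3}  z={z:.1f}  p={two_sided_p_value(z):.3f}")
```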

A couple more notes:

- This only affects some metrics, such as mean and median. Metrics such as count and sum should produce identical results for pre-aggregated data.
- We typically recommend a pre-aggregation interval of around one tenth of the bucket span as a good tradeoff between performance and accuracy.
- It is on the roadmap (and indeed we have a prototype) to autogenerate aggregations in the datafeed API, which should handle all of these issues for you automatically.
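Combining the one-tenth rule of thumb with the requirement that the bucket span be divisible by the histogram interval, a hypothetical helper for picking a `fixed_interval` might look like this. The function name and the "largest divisor no longer than span / 10" policy are my own assumptions, not an Elastic API. For the 6h (360m) bucket span in the question it suggests 36m, which is finer than any of the 1h, 3h or 6h intervals that were tried.

```python
def suggest_fixed_interval_minutes(bucket_span_minutes: int, fraction: int = 10) -> int:
    """Return the largest whole-minute interval that divides the bucket span
    evenly and is no longer than bucket_span / fraction (at least 1 minute)."""
    upper = max(1, bucket_span_minutes // fraction)
    for candidate in range(upper, 0, -1):
        if bucket_span_minutes % candidate == 0:
            return candidate
    return 1  # not reached in practice: 1 always divides the span


for span in (15, 45, 60, 360):
    interval = suggest_fixed_interval_minutes(span)
    print(f"bucket_span={span}m -> fixed_interval={interval}m")
# bucket_span=15m -> fixed_interval=1m
# bucket_span=45m -> fixed_interval=3m
# bucket_span=60m -> fixed_interval=6m
# bucket_span=360m -> fixed_interval=36m
```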

system · September 22, 2021, 9:07am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics
- Variation in analysis results using different aggregation values (Elasticsearch; December 19, 2017)
- How to configure ANOMALY DETECTION with DAILY buckets (Kibana, elastic-stack-machine-learning; March 2, 2020)
- Bucket span in Job management (Kibana, elastic-stack-machine-learning; May 9, 2019)
- Aggregation interval: 1m, bucket span: 1m (Kibana; November 3, 2020)
- How does Anomaly Detection work? (Elasticsearch, elastic-stack-machine-learning; March 17, 2023)