I think this has been asked before by somebody, but without answers. Basically, the question is how to define a watch that can detect delta-change anomalies in indices. Say the frequency of document additions changes from 10 docs/s to 30 docs/s (a spike). A naive solution would require two calls, saving state from the first call for the computation in the second, and this may not be feasible in the context of watch execution.
Just found out there is a date-based aggregation. Can we use this to implement change detection? For example, break up the past 5 minutes into 5 buckets, and if the latest one is significantly larger or smaller than the average of the previous 4 (or something else), raise a spike alert?
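To pin down the logic of that check, here is a rough sketch in plain Python. The function name is_spike and the ratio threshold of 2.0 are made up for illustration; inside a watch the same comparison would live in the scripted condition rather than in Python.

```python
from statistics import mean


def is_spike(bucket_counts, ratio=2.0):
    """Compare the latest bucket against the mean of the earlier ones.

    Returns True when the latest count is more than `ratio` times the
    baseline, or less than the baseline divided by `ratio`.
    `ratio` is an arbitrary illustrative threshold, not a recommended value.
    """
    if len(bucket_counts) < 2:
        return False  # nothing to compare against
    *history, latest = bucket_counts
    baseline = mean(history)
    if baseline == 0:
        # No earlier traffic at all: any documents count as a change.
        return latest > 0
    return latest > baseline * ratio or latest < baseline / ratio


# Five one-minute buckets: steady around 600 docs/min, then a jump.
counts = [600, 610, 590, 600, 1800]
print(is_spike(counts))
```

A ratio test is only one option; an absolute difference or a standard-deviation band would slot into the same place.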
In your query, you could do a date aggregation, and use a scripted condition to iterate over the results and decide whether the actions should be run.
In ES 2.0 / Watcher 2.0, we have a few new aggregations that will make this easier. One of these new aggregations, Derivatives, would make it easy to calculate the change between the date histogram buckets, so your script would just need to check the derivative value and see if it were above/below your change threshold.
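As a sketch of what that could look like, the search body below (written as a Python dict) nests a derivative under a date histogram, and a small helper checks the latest derivative against a threshold. The @timestamp field, the docs_deriv name, the 1m interval, the threshold, and the sample response are all assumptions for illustration; this has not been run against a cluster, and the interval parameter name follows the 2.x-era request format.

```python
# Search body for the watch input: one-minute buckets over the last
# five minutes, plus the change in doc count between adjacent buckets.
# "@timestamp" and "docs_deriv" are assumed names.
search_body = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-5m"}}},
    "aggs": {
        "docs_over_time": {
            "date_histogram": {"field": "@timestamp", "interval": "1m"},
            "aggs": {
                # "_count" points the derivative at each bucket's doc_count.
                "docs_deriv": {"derivative": {"buckets_path": "_count"}}
            },
        }
    },
}


def latest_change(response):
    """Return the derivative of the newest bucket, or None if absent."""
    buckets = response["aggregations"]["docs_over_time"]["buckets"]
    if not buckets:
        return None
    deriv = buckets[-1].get("docs_deriv")  # the first bucket has none
    return None if deriv is None else deriv["value"]


def exceeds_threshold(response, threshold):
    """True when the newest bucket changed by more than `threshold` docs."""
    change = latest_change(response)
    return change is not None and abs(change) > threshold


# Hand-written response in the shape the aggregation returns.
sample_response = {
    "aggregations": {
        "docs_over_time": {
            "buckets": [
                {"doc_count": 600},
                {"doc_count": 610, "docs_deriv": {"value": 10}},
                {"doc_count": 1800, "docs_deriv": {"value": 1190}},
            ]
        }
    }
}
print(exceeds_threshold(sample_response, threshold=500))
```

In the watch itself, the condition script would do the equivalent of exceeds_threshold over ctx.payload.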
In the Watcher docs, we show how to use scripted conditions. If you want to see an example, the Marvel memory usage watch shows a sophisticated way to use scripting in the condition.
Cool. I coded one watch with hourly buckets for a specific date, and it seems I can get the histogram counts properly. Now I have issues referencing metrics within each bucket using mustache syntax. The following action generates an empty value where the buckets variable is referenced.
"actions": {
  "log": {
    "logging": {
      "text": "There were {{ctx.payload.aggregations.docs_over_time.buckets[-1].doc_count}} docs at {{ctx.execution_time}}"
    }
  }
}
Any pointers on how to use mustache syntax to refer to JSON object values?
So the response includes a bunch of buckets in the aggregation with doc_count metric in each. The output should print the actual number of docs in the referred bucket. But I only got the following:
The mustache syntax doesn't work quite the same as the Groovy syntax you use in the condition. What I would do here is add a transform stage to your action and use Groovy to identify the value you want and store it in the context, so you can easily reference the value from the mustache template.
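Something along these lines might work, shown here as a Python dict that serializes to the actions section of the watch. The last_count key is a made-up name, and the script is an untested Groovy sketch; the exact script form depends on the Watcher version and configured scripting language, and it would fail if the buckets list were empty.

```python
import json

# Action-level transform: the script replaces the payload seen by this
# action with a small map, so the template only needs a flat key.
# "last_count" is a hypothetical name chosen for this example.
log_action = {
    "log": {
        "transform": {
            "script": (
                "return [last_count: "
                "ctx.payload.aggregations.docs_over_time.buckets.last().doc_count]"
            )
        },
        "logging": {
            "text": "There were {{ctx.payload.last_count}} docs at {{ctx.execution_time}}"
        },
    }
}

# Print the JSON fragment that would go into the watch definition.
print(json.dumps({"actions": log_action}, indent=2))
```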
If you always know which array item you want to reference, I believe this is the proper mustache syntax for referring to the 5th item in the array (indices are zero-based, so this is index 4):
ctx.payload.aggregations.docs_over_time.buckets.4.doc_count
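To illustrate why the dotted form resolves while buckets[-1] does not, here is a simplified stand-in for dotted-path lookup in Python. It is not the real template engine, only a model of the idea: every segment is either a key or a zero-based list index, and there is no bracket or negative-index syntax. The resolve function and the sample context are invented for this example.

```python
def resolve(path, context):
    """Walk a dotted path; numeric segments index into lists (zero-based)."""
    value = context
    for segment in path.split("."):
        if isinstance(value, list):
            value = value[int(segment)]  # "4" -> fifth element
        else:
            value = value[segment]       # plain key lookup
    return value


# Hand-built context with five buckets.
context = {
    "ctx": {
        "payload": {
            "aggregations": {
                "docs_over_time": {
                    "buckets": [{"doc_count": n} for n in (10, 20, 30, 40, 50)]
                }
            }
        }
    }
}

path = "ctx.payload.aggregations.docs_over_time.buckets.4.doc_count"
print(resolve(path, context))

# The bracket form is treated as a literal key name, which does not exist.
try:
    resolve("ctx.payload.aggregations.docs_over_time.buckets[-1].doc_count", context)
except KeyError as missing:
    print("no such key:", missing)
```

In this model the failed lookup raises an error; in the watch, as described above, the template simply renders an empty value.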
Thanks. I realized that mustache has a different syntax but did not find any useful pointers. I did find out that ctx.vars can be used to achieve this.
Watcher is indeed a very powerful feature; the only concern now is that the current Python binding is very primitive. Is there any plan for a DSL-based Watcher library in Python any time soon?
There isn't much in the Watcher Python client library you mentioned. I was looking for something similar to elasticsearch-dsl-py that also works for the Watcher APIs.
If you are open to a 3rd party (commercial) plugin for the entire Elastic stack... My company, Prelert, makes this plugin that specifically does anomaly detection (download a free trial here).