I'm monitoring all traffic that comes from my F5 BIG-IP load balancer. I have an index with all this information and an ML job with a count function and a detector using two fields (virtual_ip.keyword over http_path.keyword). Now I want to create a watcher for a specific http_path value. Is that possible?
Hi - yes, of course you can create a Watch for any ML job - it only requires a little knowledge of how Watcher works and how to get the information about the anomalies, either from the ML results API or by querying the .ml-anomalies-* indices.
If you're not familiar with how Watcher works with ML you can first take a look at this blog. Although it is a little outdated, the fundamentals are still relevant.
Additionally, it is also good to understand how anomalies are scored and their different flavors (bucket, record, influencer). For info on that, please consult this blog.
I don't quite understand your specific use case and what you're looking for. Perhaps you can more succinctly describe what you want to accomplish and what your current ML job config is.
I want to create a watcher that sends an email if "http_path": "my_path/mypath.aspx" has fewer than X requests. For example, based on my ML job results, if "my_path/mypath.aspx" normally gets 2500 requests per minute from 9:00 to 17:30 and at 15:00 the value is lower than that, the watcher must send an email.
Be careful describing the use case using "over", as that implies that you are using Population Analysis instead of Temporal Analysis (https://www.elastic.co/blog/temporal-vs-population-analysis-in-elastic-machine-learning). In Population Analysis you'd be comparing the count of every http_path against the other http_paths. In Temporal Analysis, you'd be comparing the count of each http_path against its own history.
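A rough sketch of what such a watch could look like, written as a Python function that builds the watch body as a dict. The job name "f5_traffic", the email address, the 5 minute interval, the 30 minute lookback and the score threshold of 75 are all placeholders, not values from this thread. Because it isn't clear from the detector description whether http_path ends up as the by, over or partition field, the query matches the path against all three record fields. The sketch also does not check the direction of the anomaly; using the low_count function in the job is one way to only flag drops.

```python
import json


def build_low_traffic_watch(job_id, http_path, min_score=75, email_to="ops@example.com"):
    """Build a watch body that emails when the ML job has a recent,
    high-scoring record anomaly for one specific http_path."""
    # The path may be stored as the by, over, or partition field value,
    # depending on how the detector is configured, so accept any of them.
    path_match = {
        "bool": {
            "should": [
                {"term": {"over_field_value": http_path}},
                {"term": {"by_field_value": http_path}},
                {"term": {"partition_field_value": http_path}},
            ],
            "minimum_should_match": 1,
        }
    }
    return {
        "trigger": {"schedule": {"interval": "5m"}},  # placeholder interval
        "input": {"search": {"request": {
            "indices": [".ml-anomalies-*"],
            "body": {
                "size": 1,
                "query": {"bool": {"filter": [
                    {"term": {"job_id": job_id}},
                    {"term": {"result_type": "record"}},
                    {"range": {"record_score": {"gte": min_score}}},
                    {"range": {"timestamp": {"gte": "now-30m"}}},  # placeholder lookback
                    path_match,
                ]}},
            },
        }}},
        # Fire only when at least one matching record was found.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        "actions": {"send_email": {"email": {
            "to": email_to,
            "subject": "ML anomaly for " + http_path,
            "body": "Job " + job_id + " flagged unusual request counts for " + http_path,
        }}},
    }


if __name__ == "__main__":
    # "f5_traffic" is a hypothetical job id used only for illustration.
    print(json.dumps(build_low_traffic_watch("f5_traffic", "my_path/mypath.aspx"), indent=2))
```

The printed JSON could then be pasted into the Watcher UI or sent to the put watch API, after adjusting the placeholders to your own job.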
If you want to do Temporal, you need to be careful of the cardinality of the http_path field. If it is in the tens of thousands or more, you may need to adjust the model_memory_limit.
When I was modeling the ML job, I thought Population Analysis was the better way to do what I wanted. If I use a temporal analysis on http_path, the cardinality is huge, but if I use population analysis with virtual_ip over http_path, I get all the http_paths (more than 100) grouped by virtual_ip (only 5). I read in that link that it's possible to filter the datafeed, so maybe that is a good way to apply a watcher only to the http_path value I want to monitor. What do you think?
Population Analysis solves the cardinality problem, but it isn't the right approach when the "population" isn't homogeneous. So, if there are some http_paths that are much more popular than others (i.e. they routinely get more hits), then, relative to their peers, those will always be labeled as unusual, despite behaving consistently with respect to their own history.
In temporal analysis, if you don't want to include every single http_path in the analysis, you could filter the datafeed, or you could use a custom aggregation (like a terms aggregation with a size option) to only return the "Top N" most popular http_paths and analyze those.
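To make the homogeneity point concrete, here is a toy illustration in Python. It is not how Elastic ML models data; it just uses plain z-scores on made-up request counts to show that a busy path looks extreme next to its peers while being perfectly ordinary compared with its own past.

```python
from statistics import mean, pstdev

# Made-up request counts per bucket for five http_paths (hypothetical data).
history = {
    "/home.aspx": [2500, 2480, 2520, 2510, 2490],
    "/a.aspx": [40, 42, 38, 41, 39],
    "/b.aspx": [55, 50, 52, 54, 51],
    "/c.aspx": [30, 33, 29, 31, 32],
    "/d.aspx": [45, 47, 44, 46, 43],
}


def zscore(value, sample):
    """How many standard deviations `value` lies from the mean of `sample`."""
    spread = pstdev(sample)
    return 0.0 if spread == 0 else (value - mean(sample)) / spread


# Count of each path in the most recent bucket.
latest = {path: counts[-1] for path, counts in history.items()}


def vs_peers(path):
    """Population-style view: latest count compared with the other paths."""
    peers = [count for other, count in latest.items() if other != path]
    return zscore(latest[path], peers)


def vs_own_history(path):
    """Temporal-style view: latest count compared with the path's own past."""
    return zscore(history[path][-1], history[path][:-1])


if __name__ == "__main__":
    print("vs peers:      ", round(vs_peers("/home.aspx"), 1))
    print("vs own history:", round(vs_own_history("/home.aspx"), 1))
```

With this data the popular path is hundreds of standard deviations away from its peers, yet within one standard deviation of its own history, which is the situation described above.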
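As a sketch of the datafeed-filter option, the Python below builds a temporal job config and a datafeed config restricted to a short list of paths. The job id "f5_temporal", the index pattern "f5-*", the time field "@timestamp", the 15 minute bucket span and the 256mb memory limit are all assumptions for illustration; swap in your own values. The detector function defaults to count as in the original job, and low_count could be passed instead if only drops matter.

```python
def build_temporal_job(function="count", model_memory_limit="256mb"):
    """Job config that models each http_path against its own history
    by partitioning on the path field."""
    return {
        "analysis_config": {
            "bucket_span": "15m",  # assumed bucket span
            "detectors": [
                {"function": function, "partition_field_name": "http_path.keyword"}
            ],
            "influencers": ["http_path.keyword", "virtual_ip.keyword"],
        },
        # Raise this if many distinct paths are analyzed.
        "analysis_limits": {"model_memory_limit": model_memory_limit},
        "data_description": {"time_field": "@timestamp"},  # assumed time field
    }


def build_filtered_datafeed(job_id, index_pattern, http_paths):
    """Datafeed config that only feeds the listed http_paths to the job."""
    return {
        "job_id": job_id,
        "indices": [index_pattern],
        "query": {
            "bool": {
                "filter": [{"terms": {"http_path.keyword": list(http_paths)}}]
            }
        },
    }


if __name__ == "__main__":
    job = build_temporal_job(function="low_count")
    feed = build_filtered_datafeed("f5_temporal", "f5-*", ["my_path/mypath.aspx"])
    print(job)
    print(feed)
```

Filtering in the datafeed keeps the cardinality the job sees down to the paths you list, so the memory limit only needs to cover those.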