I created an anomaly detection job in Elastic machine learning using the sum function. After looking at the results in the Single Metric Viewer, I have some questions:
- On what basis are the upper and lower bounds calculated?
- I have also read blog posts that explain multi_bucket_impact in terms of anomalies caused by a single bucket versus multiple buckets, but I still don't understand why it works that way.
Can I set an average to identify outliers? If so, how?
The upper and lower bounds show the 95% confidence interval of the underlying probabilistic model.
Regarding multi_bucket_impact, please formulate a specific question. It is difficult to answer an "I don't understand why it is like that" question on Discuss.
Regarding setting the average to identify outliers, please be more specific. What average do you want to set? You can use alerting rules to specify deterministic rules when you want to be notified about deviation from a threshold.
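To make "a deterministic rule for deviation from a threshold" concrete, here is a minimal sketch in plain Python. It does not use the Elastic alerting API; the function name `flag_deviations`, the sample `bucket_sums`, and the threshold of 50 are all hypothetical and only illustrate the idea of comparing each value to a fixed average.

```python
from statistics import fmean

def flag_deviations(values, max_deviation):
    """Return (index, value) pairs that differ from the mean by more than max_deviation."""
    baseline = fmean(values)  # the fixed "average" every value is compared against
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - baseline) > max_deviation
    ]

# Hypothetical per-bucket sums; the fifth bucket is clearly out of line.
bucket_sums = [100, 102, 98, 101, 180, 99]
flagged = flag_deviations(bucket_sums, max_deviation=50)
print(flagged)
```

Unlike the anomaly detection model, a rule like this ignores trend and seasonality, so it only fits cases where a fixed threshold is really what you want.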
Hi @valeriy42 ,
Thank you for responding
Regarding the lower and upper bounds, I don't know what criteria are used to determine those two limits. I would like to understand clearly how you define them.
You say: "The upper and lower bounds show the 95% confidence interval of the underlying probabilistic model." I understand this idea, but what I want to ask is: how do you determine the limits from the model?
Regarding multi_bucket_impact and using an average to identify outliers, I will create a separate topic for discussion.
The confidence interval is a generally established statistical concept. If one has a probability distribution and can compute its quantiles, it is straightforward to calculate the 2.5th and 97.5th percentiles, which together enclose the central 95% of the distribution.
This is also how we determine the confidence interval.
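As a rough illustration of computing bounds from quantiles: a central 95% interval leaves 2.5% of the probability in each tail, so the bounds are the 2.5th and 97.5th percentiles of the distribution. The sketch below assumes a normal distribution with a made-up mean and standard deviation purely for demonstration; the actual model used by the anomaly detection job is more elaborate, and `central_interval` is a hypothetical helper, not part of any Elastic API.

```python
from statistics import NormalDist

def central_interval(dist, confidence=0.95):
    """Return the (lower, upper) quantiles enclosing the central `confidence` mass."""
    tail = (1.0 - confidence) / 2.0  # probability left in each tail
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# Assumed predictive distribution for one bucket: mean 100, standard deviation 10.
model = NormalDist(mu=100.0, sigma=10.0)
lower, upper = central_interval(model)
print(f"lower={lower:.2f}, upper={upper:.2f}")  # lower=80.40, upper=119.60
```

A value falling outside these bounds is unlikely under the assumed distribution, which is the intuition behind the shaded band in the Single Metric Viewer.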