From the screenshot snippet, it's unclear what the previous behavior in the data was. However, the anomalies are preceded by a long period of 0 values, which were typical, and in the interval of interest the difference between the actual value of 0 and the lower bound of the confidence interval is apparently not large enough to motivate a higher score.
You can use one of the *_low anomaly detection functions if you are interested in drops (see the sketch below these suggestions).
Consider setting up an alert with a specific rule if you are interested in being notified, e.g. when you receive 0 values after 7 am.
You can also set the anomaly score threshold to a lower value, like 17, to trigger an alarm for such behavior.
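As a minimal sketch of the first suggestion (the job ID, bucket span, and time field below are hypothetical; adjust them to your data), a detector using the `low_count` function only scores unusually low event counts:

```
PUT _ml/anomaly_detectors/error-count-drops
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "low_count",
        "detector_description": "Unusually low event count"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```

You would then attach a datafeed pointing at the index you are monitoring and alert on these results with a lower score threshold, as mentioned above.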
Thanks for your answers. I read the link, but I am still confused by this topic.
I had a production problem, and I need to prevent it from happening again because no alerts were fired. The same day we had a lot of errors, as you can see in the next picture:
I understand that was a multi-bucket impact (still a bit confusing), but what happened with the single-bucket impact, or with the actual and typical values? And why was the score <1 (what does <1 mean)?
I am sorry you feel confused about the scores. Here are some pointers to help you:
As a rule of thumb, an anomaly detector needs about 3 weeks of data to build a probabilistic model that describes the data. In your case, it appears that the data ingest started on 2023-07-11 and the anomalous behavior is observed only 6 days later. Therefore, you need to let the job run for a bit longer before trying to understand the "usual" score numbers that the anomaly detector would assign. Before seeing enough evidence (data), the anomaly detector would be reluctant to assign high anomaly scores, since the "typical" value is derived from an insufficient number of observations.
Anomaly detectors make sense on complex data with multiple seasonalities (e.g. different behavior by hour of the day, day of the week, month, etc.), trends, and so on. If you have data where you expect ~0 most of the time and want to be alerted when you get anything >100, then a simple alert rule may be more helpful (see the sketch after these pointers).
There are many resources available online that dive deep into how anomalies are identified and scored.
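For the simple-alert-rule case, a minimal sketch using a watch (the index pattern, schedule, and threshold here are hypothetical; a Kibana ES query rule would work just as well) could look like this:

```
PUT _watcher/watch/error-count-threshold
{
  "trigger": { "schedule": { "interval": "5m" } },
  "input": {
    "search": {
      "request": {
        "indices": ["my-error-logs-*"],
        "body": {
          "size": 0,
          "query": { "range": { "@timestamp": { "gte": "now-5m" } } }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 100 } }
  },
  "actions": {
    "notify": {
      "logging": { "text": "More than 100 errors in the last 5 minutes" }
    }
  }
}
```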
As you can see in the image above, my data has different behaviors.
I receive errors every day, but I only want an alert when the amount of errors is anomalous.
Well, that's interesting... Can you please let us know what version of ES you are using? In the first screenshot, there is an annotation marked as [1] around 2023-07-02. Can you please tell me what this annotation is? It seems that before this annotation, the model score behaved as you would expect.
Also, for the last spike, can you please share a screenshot of the anomaly score explanation (available if you click on the arrow in the anomaly table)?
You are basing how high you think the anomaly score should be on how different the value is from "typical" (the median prediction). The value is often very different from typical, so by itself this is not necessarily a good indicator that something is anomalous, because the data has a long positive tail.
Our score is trying to say whether these events are historically unusual, and looking at the history of the metric, this event does not look particularly unusual. However, if you continue to have problems, one option is to revert to a model snapshot and exclude the problem periods.
Perhaps this metric is not a good indicator of problems. Do you really have problems like this all the time (based on the screenshot)? Perhaps the metric you are analyzing for anomalies does not correlate well with problems in your system. Some metrics, like KPIs, are better at detecting problems; others are useful for root cause analysis but are too unpredictable for reliable anomaly detection and alerting.
If you like, you can create an alert based on how much the value differs from typical, if you think this better detects the events you care about. A watch on the anomaly detection results would allow you to do this.
There's nothing that says you have to alert at a certain level, so you can create a rule that alerts at a lower score if that doesn't give you too many false positives.
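A minimal sketch of such a watch (the job ID, score threshold, and schedule are hypothetical; adjust them to your job and to how noisy a lower threshold turns out to be) could query the anomaly results index directly:

```
PUT _watcher/watch/ml-low-score-alert
{
  "trigger": { "schedule": { "interval": "15m" } },
  "input": {
    "search": {
      "request": {
        "indices": [".ml-anomalies-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term": { "job_id": "my_error_count_job" } },
                { "term": { "result_type": "record" } },
                { "range": { "record_score": { "gte": 17 } } },
                { "range": { "timestamp": { "gte": "now-30m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "notify": {
      "logging": { "text": "Anomaly records with score >= 17 found" }
    }
  }
}
```

Instead of filtering on `record_score`, you could also look at the `actual` and `typical` fields of the record results if the difference from typical is what you care about.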