I have some sample server data on which I'm trying to use the machine learning pack to detect anomalies. I have loaded the data into the database and created a single metric job to perform the analysis. The job was created and executed successfully. The result is as below:
I expected the value on August 28th to be flagged as a critical anomaly, since the description says it is 4x higher. Instead, the value on September 6th is flagged as the critical anomaly. Could anyone explain how the anomaly score is calculated? I need to know why the value for September 6th is assigned a score of 94 while the value for August 28th is assigned 60. Also, I understand that a value with low probability is tagged as an anomaly. Could anyone explain how this probability is calculated and what its significance is?
The anomaly score (between 0 and 100) is a two-step calculation. First, the probability of the observation is calculated (shown as the probability field). Then a secondary normalization is done, in which all past anomalies for that job are essentially ranked against each other in a quantile analysis to get a relative severity score. The anomaly score is not really related (math-wise) to the informational message seen in the UI (for example, the "3x higher" text). That message is simply there as context for the viewer.
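To make the two steps a bit more concrete, here is a rough sketch in Python. It is not the actual normalization the ML job runs; it only illustrates the idea of ranking records by how unlikely they are and mapping that rank onto 0 to 100. The function name `toy_anomaly_scores` and the probability values are made up for illustration.

```python
import math

def toy_anomaly_scores(probabilities):
    """Toy quantile-style normalization (an assumption, not the real algorithm).

    Each probability is turned into a 'surprise' value, -log10(p), and then
    ranked against all the others. The rank is scaled to the range 0..100.
    """
    surprise = [-math.log10(p) for p in probabilities]
    n = len(surprise)
    scores = []
    for s in surprise:
        # Count how many records are no more surprising than this one.
        rank = sum(1 for other in surprise if other <= s)
        scores.append(100.0 * rank / n)
    return scores

# Hypothetical probabilities, NOT taken from the job in question.
probs = {"typical": 0.2, "aug_28": 1e-4, "sep_6": 1e-7}
scores = dict(zip(probs, toy_anomaly_scores(list(probs.values()))))

for name, score in scores.items():
    print(f"{name}: probability={probs[name]:g} score={score:.1f}")
```

Note that the score depends only on the probability and on how it ranks against the other records, not on how many times higher the value was than typical.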
In your particular case, the Sept 6th observation has a lower probability value and thus a higher anomaly score.
Another thing to keep in mind is that the probability calculation also takes into account how much data has been seen up until that point in time. The Sept 6th observation has the benefit of more historical data and thus the internal probability model is different (meaning, more mature) than the model was when the Aug 28th data was seen by the ML job.
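As a toy illustration of why the amount of history matters, the sketch below fits a plain Gaussian to the data seen so far and asks how likely a spike is under that fit. The real ML job uses a more sophisticated model, so treat the Gaussian, the helper `tail_probability`, and the sample numbers as assumptions made purely for the example.

```python
from statistics import NormalDist

def tail_probability(history, value):
    """Chance of seeing `value` or higher under a Gaussian fitted to `history`.

    This is a stand-in for the job's internal model, not a copy of it.
    """
    model = NormalDist.from_samples(history)
    return 1.0 - model.cdf(value)

# Hypothetical metric values: a short, noisy history...
early_history = [10, 14, 6, 12, 8]
# ...and the same history after more (steadier) data has arrived.
later_history = early_history + [10, 11, 9, 10, 10, 11, 9, 10, 10, 10]

spike = 20
p_early = tail_probability(early_history, spike)
p_later = tail_probability(later_history, spike)

print(f"probability with 5 points:  {p_early:.2e}")
print(f"probability with 15 points: {p_later:.2e}")
```

The same spike gets a different probability depending on what the model had learned by the time it arrived, so two anomalies on different dates are not judged against the same baseline.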
Hope that helps.