But I'm still confused about the algorithm used in the background.
Why are there some empty squares, even though when I looked into the data I found values?
For example, the second line:
Analysing the data throughout the month of January, as you can see in the last picture, the first 9 days are empty (no results), but when I check with a line chart there is data during these days.
NB: I used the average of a metric for all the graphs.
The algorithms are a mixture of techniques (a toy sketch of one of them follows the list), including:
Clustering
Various types of time series decomposition
Bayesian distribution modeling
Correlation analysis
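To make the list above a little more concrete, here is a minimal, self-contained sketch of what an additive time series decomposition can look like. This is purely illustrative Python, not the Elastic ML implementation; the function name, the fixed 24-bucket period, and the moving-average trend are all assumptions made for the example.

```python
# Illustration only: naive additive decomposition of an hourly metric into
# trend + seasonal + residual. NOT the Elastic ML implementation.
from statistics import mean

def decompose(values, period=24):
    """Split a list of bucket averages into trend, seasonal and residual parts."""
    n, half = len(values), period // 2
    # Trend: centered moving average over roughly one period (None at the edges).
    trend = [mean(values[i - half:i + half + 1]) if half <= i < n - half else None
             for i in range(n)]
    # Seasonal: average detrended value for each phase within the period.
    by_phase = {p: [] for p in range(period)}
    for i, (v, t) in enumerate(zip(values, trend)):
        if t is not None:
            by_phase[i % period].append(v - t)
    seasonal = [mean(by_phase[i % period]) if by_phase[i % period] else 0.0
                for i in range(n)]
    # Residual: what trend + seasonality cannot explain; large residuals are
    # the part that anomaly scoring would focus on.
    residual = [v - t - s if t is not None else None
                for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, residual
```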
The "empty" squares do not mean there was no data to be analyzed. It means that the algorithms determined that the data wasn't anomalous enough at that moment in time to give it a non-zero score.
To understand better what the scoring means, please refer to this blog:
Thank you @richcollier for your answer! I had a look at your blog; it is very interesting and useful.
But I can't understand why some values are considered anomalies and others are not; even when I compare the two anomaly scores, there is a clear anomaly.
Here is an example:
The graph shows an anomaly on February 1st 2018 at 10:00 am, but there are many other anomalies (with higher values than the red anomaly) whose color is just blue.
You need to think about how this data is presented to ML - in chronological order. The first time the big spike is seen, it is flagged as a red anomaly because it's the "worst thing" that's been seen so far.
However, as time goes on ML sees that spikes of that magnitude are actually not that unexpected. They happen quite frequently after all. Therefore, ML chooses to score those subsequent anomalies with a lower score (since their probability of occurring gets higher and higher).
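Continuing the toy sketch from above (again, an assumed Gaussian baseline, not the real Elastic ML model), you can reproduce this effect by letting the baseline learn from every bucket, spikes included: the first big spike is extremely improbable and scores at the top of the scale, while later spikes of a similar size are already "expected" and score much lower.

```python
# Toy illustration of why later spikes score lower: the baseline keeps
# learning, so similar spikes become less and less surprising over time.
# This is NOT Elastic ML's actual model, just an assumed Gaussian baseline.
import math
from statistics import mean, stdev

def score(history, value):
    mu = mean(history)
    sigma = (stdev(history) if len(history) > 1 else 1.0) or 1e-9
    p = math.erfc(abs(value - mu) / (sigma * math.sqrt(2)))
    return min(100.0, -10.0 * math.log10(max(p, 1e-100)))

history = [100, 98, 103, 101, 99, 102, 100, 97]      # quiet start
stream = [100, 300, 101, 99, 310, 100, 305, 102]     # three similar spikes
for value in stream:
    s = score(history, value)
    history.append(value)                # the model updates after every bucket
    if s > 10:
        print(f"value={value:>3}  score={s:5.1f}")
# The first spike (300) scores ~100; the later spikes (310, 305) score
# progressively lower, because their probability keeps getting higher.
```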
@richcollier,
Thank you again for your support! I really appreciate your effort; you have helped me a lot since the beginning of my project.
Well, I understand, and it is very logical. In your blog you said that "the plugin construct a baseline probability model based on the observed past behavior", which confirms exactly what you explain here...
For the Bayesian distribution modeling, which type of distribution do you assume (Normal, Poisson, Binomial...)?
And for the clustering, which algorithm is used (k-means, Mean-Shift, DBSCAN...)?