I'm using Kibana 6.2.2 and Elasticsearch 6.2.2 for my final academic project.
I prepared some analyses using the ML plugin under X-Pack, like this:
But I am still confused about the algorithm used in the background.
Why are there some empty squares, even though when I looked into the data I found values?
For example, the second line analyzes the data throughout the month of January. As you can see in the last picture, the first 9 days are empty (no results), but when I check whether there is data during these days using a line chart graph, there is.
NB: I used the average of a metric for all the graphs.
The algorithms are a mixture of techniques, including:
- Various types of time series decomposition
- Bayesian distribution modeling
- Correlation analysis
The "empty" squares do not mean there was no data to be analyzed. They mean that the algorithms determined the data wasn't anomalous enough at that moment in time to give it a non-zero score.
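As I understand it (worth double-checking against the Kibana docs for your version), the swim-lane colors are just bands of the 0–100 anomaly score, so a cell is only painted once its score crosses the lowest band. A hypothetical sketch of that mapping, with threshold values that match my reading of the Kibana severity legend:

```python
def severity_band(score):
    """Map a 0-100 anomaly score to a Kibana-style severity band.

    The threshold values here are my assumption based on the Kibana
    severity legend; verify them against the documentation.
    """
    if score >= 75:
        return "critical (red)"
    if score >= 50:
        return "major (orange)"
    if score >= 25:
        return "minor (yellow)"
    if score >= 3:
        return "warning (blue)"
    return "none (empty square)"

print(severity_band(80))   # a high score falls in the red band
print(severity_band(0.5))  # scores below the lowest band render as empty
```

So an empty square simply means the score for that bucket fell below the lowest colored band, not that data was missing.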
To understand better what the scoring means, please refer to this blog:
Thank you @richcollier for your answer! I had a look at your blog; it is very interesting and useful.
But I can't understand why some values are considered anomalies and others are not: even when I compare the two anomaly scores, there is a clear anomaly.
Here is an example:
The graph shows an anomaly on February 1st, 2018 at 10:00 am, but there are many other anomalies (with higher values than the red anomaly) whose color is just blue.
You need to think about how this data is presented to ML - in chronological order. The first time the big spike is seen, it is flagged as a red anomaly because it's the "worst thing" that's been seen so far.
However, as time goes on ML sees that spikes of that magnitude are actually not that unexpected. They happen quite frequently after all. Therefore, ML chooses to score those subsequent anomalies with a lower score (since their probability of occurring gets higher and higher).
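This is not Elastic's actual algorithm (which is considerably more sophisticated), but a toy sketch of the idea, assuming a simple Gaussian baseline: each point is scored only against the points seen before it, so the first big spike looks extremely improbable, while identical later spikes score lower because the baseline has already absorbed earlier spikes.

```python
import math

def tail_prob(x, mean, std):
    """One-sided Gaussian tail probability P(X >= x) via the error function."""
    if std == 0:
        return 1.0 if x <= mean else 0.0
    z = (x - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2))

def anomaly_scores(series):
    """Score each point against a baseline built only from earlier points."""
    scores = []
    history = []
    for x in series:
        if len(history) >= 2:
            mean = sum(history) / len(history)
            var = sum((v - mean) ** 2 for v in history) / (len(history) - 1)
            p = tail_prob(x, mean, math.sqrt(var))
            # Lower probability -> higher score, capped at 100 (log scale).
            scores.append(min(100.0, -10.0 * math.log10(max(p, 1e-300))))
        else:
            scores.append(0.0)  # not enough history to build a baseline yet
        history.append(x)
    return scores

# Ten quiet points, then the same spike value three times in a row.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 50, 50, 50]
s = anomaly_scores(data)
print([round(v, 1) for v in s[-3:]])  # each repeat of the spike scores lower
```

Running this, the first spike gets the maximum score, and each identical repeat scores progressively lower, mirroring the behavior you saw in the swim lane.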
Hope that helps
Thank you again for your support! I really appreciate your effort; you have helped me a lot since the beginning of my project.
Well, I understand, and it is very logical. In your blog you said that "the plugin construct a baseline probability model based on the observed past behavior", which confirms exactly what you explained here...
In the Bayesian distribution modeling, which type of distribution do you assume (normal, Poisson, binomial...)?
And in the clustering, which algorithm is used (k-means, mean-shift, DBSCAN...)?
Perhaps you will find this video interesting; it may answer your questions:
It describes some of the math(s) behind our Machine Learning.
This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.