How does Anomaly Detection work?

I have a question about the Anomaly Detection module provided by the Elastic Stack. As I understand machine learning, the more data you feed the model, the better it learns, provided the data is of good quality. Now I want to use the Anomaly Detection module in Kibana. After some testing and reading, I found that it's generally recommended to have at least 3 weeks of data, or 20 buckets' worth.

Let's say we receive about 40 million records a day. It already takes a long time for the model to process a single day's data, so training on 3 weeks of it will put a lot of pressure on the node. On the other hand, if I feed the model less data and reduce the bucket span, the model becomes more sensitive. So what is my best bet here? How can I make the most of the Anomaly Detection module?

Just FYI: I do have a dedicated machine learning node equipped with more than enough memory, but it still takes a very long time to process a single day's records, so my concern is that processing 3 weeks' worth of data will take far too long.

So my question is: if we give the model a large amount of data over a short period (say, 1 week) for training, versus the same volume of data over a slightly longer period (say, 3 weeks), will these two models detect anomalies with the same accuracy?
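For scale, the volumes described above can be sketched with some quick arithmetic (the 15-minute bucket span is just an arbitrary example, not a recommendation):

```python
# Rough arithmetic on the volumes from the question (40M records/day).
records_per_day = 40_000_000

one_week = records_per_day * 7      # 280,000,000 records
three_weeks = records_per_day * 21  # 840,000,000 records

# With a hypothetical 15-minute bucket span (96 buckets/day), each
# bucket would cover on average:
per_bucket_15m = records_per_day // 96  # roughly 416,000 records

print(one_week, three_weeks, per_bucket_15m)
```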

Three or more weeks of data is ideal if your data has weekly periodicity (i.e. where the behavior of the data follows patterns that depend on the day of the week, such as weekdays vs. weekends). If your data doesn't have this weekly periodicity, then just a few days of data should be sufficient.
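For reference, here is a minimal sketch of the kind of job configuration involved. The index, field, and job names are hypothetical; a body like this would be sent to the Elasticsearch ML API endpoint `PUT _ml/anomaly_detectors/<job_id>`:

```python
import json

# Sketch of an anomaly detection job configuration. The field names
# ("host", "@timestamp") and job description are assumptions; adjust
# them for your own data.
job_config = {
    "description": "Event rate per host, 15m buckets",
    "analysis_config": {
        # A longer bucket span smooths out noise; a shorter one makes
        # the job more sensitive (and more expensive at 40M docs/day).
        "bucket_span": "15m",
        "detectors": [
            {"function": "count", "partition_field_name": "host"}
        ],
        "influencers": ["host"],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(job_config, indent=2))
```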

Remember that you don't have to make the ML job "look back" over a long historical period. You can simply start the Anomaly Detection job in motion (in "real-time") and then just wait a few days for it to learn!
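That real-time approach boils down to two API calls, sketched below. The job and datafeed IDs are hypothetical; the key point is starting the datafeed at "now" with no end time, so the job only analyzes new data as it arrives instead of backfilling weeks of history:

```python
# Sketch of the two requests to run a job in real-time with no lookback:
#   POST _ml/anomaly_detectors/<job_id>/_open
#   POST _ml/datafeeds/<datafeed_id>/_start
# IDs below ("daily-traffic") are placeholders for illustration.
open_request = ("POST", "/_ml/anomaly_detectors/daily-traffic/_open", None)
start_request = (
    "POST",
    "/_ml/datafeeds/datafeed-daily-traffic/_start",
    {"start": "now"},  # no "end": the datafeed runs continuously
)

for method, path, body in (open_request, start_request):
    print(method, path, body or "")
```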

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.