I went through the definitive guide published by O'Reilly and I couldn't solve my problem. I have also found similar questions about this, but I haven't quite understood how it can be done.
Let's assume I am running an instance of the ELK stack on a VM inside a node of my cluster (so in this case there is no distributed architecture). I was wondering whether it is possible to implement – let's say – the current state-of-the-art unsupervised real-time anomaly detection algorithm on time series (assuming I have a simple flow of log data). Would I then be able to visualise the time series and the outliers in Kibana in some way?
Now, suppose I run the ELK instance on a distributed architecture (assuming I have several Elasticsearch instances and many nodes) and do the same implementation as above. Will it run in a distributed way? Would it still be as resource- and time-efficient as the ML anomaly detection included in X-Pack?
If so, in at least one of these cases, could you point me to the right sources (books, blogs, etc.) to learn how to perform such a task?
Elastic ML is the state-of-the-art unsupervised real-time anomaly detection algorithm on time series.
In all seriousness - there are probably more than 100 "person-years" of research and development in the codebase that is Elastic ML. And it is not just the anomaly detection algorithms/techniques; it is also all of the other logistical details:
How to leverage both historical and real-time data
How to persist model state and "pick up where you left off" in the case of a node/cluster restart
How to deal with both raw and aggregated data
How to snapshot/restore those models to an earlier version in case there was a problem
How to filter out data you don't want analyzed that is mixed in with data that you do want analyzed
How to automatically split analysis across instances for parallel analysis
How to ignore results that have special meaning in one's domain
How to ignore time-frames that are known to be problematic
How to provide guardrails against excessive memory consumption that could cripple a node
How to deal with sparse data
How to manage the scheduling and throttling of querying for data to analyze and not overburden a cluster
How to format and publish results that are useful for UIs or API calls
How to clean up/maintain information that is no longer needed
...and many other details
These are all details that Elastic ML handles for you. I'm not saying you couldn't implement some of these things on your own, but you should know what you're getting into! To give a sense of how little of that plumbing is exposed to you, the sketch below shows what it takes to stand up a real-time anomaly detection job against the cluster.
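For illustration, here is a minimal sketch of driving Elastic ML through its REST API from Python. It assumes a local, unsecured single-node cluster at `localhost:9200` with an ML-capable (trial or Platinum) license, and a hypothetical `logs-*` index whose documents carry an `@timestamp` field; the job and datafeed names are made up for this example.

```python
# Minimal sketch: create, feed, and query an Elastic ML anomaly detection job.
# Assumes: local unsecured cluster, ML-capable license, a logs-* index with @timestamp.
import requests

ES = "http://localhost:9200"

# 1. Create an anomaly detection job: model the count of log events per 15-minute bucket.
job = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],
    },
    "data_description": {"time_field": "@timestamp"},
}
requests.put(f"{ES}/_ml/anomaly_detectors/log-event-rate", json=job).raise_for_status()

# 2. Attach a datafeed that pulls the job's input from the (hypothetical) logs-* indices.
datafeed = {
    "job_id": "log-event-rate",
    "indices": ["logs-*"],
    "query": {"match_all": {}},
}
requests.put(f"{ES}/_ml/datafeeds/datafeed-log-event-rate", json=datafeed).raise_for_status()

# 3. Open the job and start the datafeed; with no end time it keeps running in real time.
requests.post(f"{ES}/_ml/anomaly_detectors/log-event-rate/_open").raise_for_status()
requests.post(f"{ES}/_ml/datafeeds/datafeed-log-event-rate/_start").raise_for_status()

# 4. Later: pull record-level results, e.g. to chart outliers outside Kibana.
resp = requests.get(
    f"{ES}/_ml/anomaly_detectors/log-event-rate/results/records",
    json={"record_score": 75},  # only strong anomalies
)
for record in resp.json().get("records", []):
    print(record["timestamp"], record["record_score"])
```

Once the datafeed is running, the same job also appears in Kibana's Machine Learning app, where the Single Metric Viewer plots the time series with the detected anomalies overlaid - which covers the visualisation part of your question without any extra work.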