Spotting Anomalies from Log Data

I have the ELK stack set up on my local system, with Kibana running on localhost. I am able to view the logs on the Kibana dashboard over a given time range. I then activated a one-month trial license to use the Machine Learning features. I chose the index pattern associated with my log data and ran an ML job to spot anomalies in the log file data. However, every time I run the ML job, the report shows no matching anomalies. Can someone tell me what data needs to be present in my log file for the framework to spot anomalies, for example in the case of a Denial of Service attack? The point is that I want to get some anomalies detected from the log data I supply.

Hi there - can you post some screenshots and possibly your job configuration? That would be helpful!

Hey @richcollier, I figured out the issue. To detect anomalies, Elasticsearch needs data that actually contains anomalies. Earlier, my log data was just a steady series of 50 log lines generated per hour on day 1, and that pattern stayed the same. When the log volume changed to 10,000 log lines in an hour on day 2, that is when the ELK stack started highlighting anomalies to me, in red on the time graph. This is how it works, right? Correct me if I am wrong. Also, is there some other way a Denial of Service (DoS) attack can be predicted from the log lines, apart from using the timestamp filters of the log lines?

ha yes, it cannot detect anomalies if nothing in the data is unusual! :joy:

Sounds like you have it working now.

Is there some other way a Denial of Service (DoS) attack can be predicted from the log lines, apart from using the timestamp filters of the log lines?

When the DoS attack starts and the volume of logs begins to increase, then certainly Anomaly Detection will tell you, and you can hook it up to send a proactive alert. If your question is whether Anomaly Detection can "predict" anomalies (meaning, let you know before they actually happen), then of course not - that would be magic! :joy:
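
As a rough sketch of what such an alert could check: anomaly detection results are written to the .ml-anomalies-* indices, so a search like the one below returns recent high-scoring anomaly records that an alert could fire on. The job name log_event_rate, the score threshold of 75, and the now-1h window are just placeholders for whatever your setup uses.

# Recent anomaly records with a record_score of 75 or higher
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "result_type": "record" } },
        { "term":  { "job_id": "log_event_rate" } },
        { "range": { "record_score": { "gte": 75 } } },
        { "range": { "timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "sort": [ { "record_score": { "order": "desc" } } ]
}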

Alright, thanks a lot @richcollier, that cleared up my confusion. I have now tried generating 10 log lines at 6 pm on 1st Aug, then 6,000 log lines at 7 pm on 1st Aug. I selected a time range from 12:00 am on 1st Aug until 7:30 pm on 1st Aug, with the count (event rate) criterion, for a multi-metric anomaly job. But it shows no matching anomalies, which should not be the case. Is it because Elasticsearch already had 6,000 log lines stored for 31st July, so it now considers 6,000 log lines an hour a normal scenario? If that is the case, is there some way to make Elasticsearch forget its past data, or to delete the data from its database? Can you please tell me how to do that? I have attached a screenshot below of the ML job that showed anomalies on 31st July; that is the output I want now, which is not happening.

Attaching a screenshot of the graph for "Use full Logstash filter data" for your reference, @richcollier.

When you create the job in the first place, choose an appropriate start time for the learning. It doesn't always have to learn on all of the data you have in the index.

Hi @richcollier, I'm doing the same thing, and I'm not selecting the entire log data. Earlier it detected anomalies when there were 1,000 log lines, but now when I run the job over the same volume of log lines, it fails to detect anomalies. What is a possible solution for this? Can we connect over a short call to resolve it? I really need to get this done. Thanks for all the assistance so far.

Just make sure you have enough time of "normal" behavior in the data before you present the anomaly. This should be on the order of a hundred bucket_spans (at least). So, for example, if your bucket_span setting is 15m, you'll need at least 24h of normal data before the anomaly.

And obviously, the more normal time before the anomaly, the better.
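
If it helps to see where bucket_span lives, here is a minimal sketch of a simple count (event rate) job created through the ML APIs rather than the wizard. The job name log_event_rate, the index pattern log*, the @timestamp time field, and the start date are placeholders for whatever your setup uses.

# Model the overall event rate in 15-minute buckets
PUT _ml/anomaly_detectors/log_event_rate
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "@timestamp" }
}

# Feed the job from the log indices
PUT _ml/datafeeds/datafeed-log_event_rate
{
  "job_id": "log_event_rate",
  "indices": [ "log*" ]
}

# Open the job and start learning from a chosen point in time
POST _ml/anomaly_detectors/log_event_rate/_open
POST _ml/datafeeds/datafeed-log_event_rate/_start?start=2020-07-31T00:00:00Z

The start parameter on the datafeed controls how far back the learning begins - that is the "appropriate start time" mentioned earlier.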

Alright @richcollier, got it. Thank you so much for your prompt assistance. I have now figured out a new plan to fulfill my requirement, and for this I have 2 questions:

  1. I'm currently on Elasticsearch, Logstash & Kibana, all on version 7.6.2. If I need to upgrade Kibana to a higher version, will the ELK stack & anomaly detection still work correctly if Elasticsearch & Logstash stay on a lower version, or must all 3 components be on the same version to work properly?

  2. Suppose I want to delete all the data present in the Elasticsearch database & start completely fresh by feeding it a new series of log lines. How do we do that? Is there any documentation for it? Whenever I start the Elasticsearch batch job, it always shows the old data it has accumulated.

  1. Wow, that's an old version and is no longer supported. Yes, you need to have everything on the same version (do not upgrade only Kibana, for example).

  2. Just delete the index/indices of interest with the API:

DELETE /my-index
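
If you are unsure of the exact index name, you can list what is there first, for example:

# List all indices with their names and document counts
GET _cat/indices?v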

@richcollier I'm confused about whether I'm correctly following what you told me regarding deleting the data in the Elasticsearch database. Can you send a screenshot to help me understand better? I'm attaching a screenshot below of what I did. Here log* is the name of my index. It shows a success response, but the data still does not get deleted from the database.

Adding to the above, I basically want 0 log lines displayed in the Discover tab of Kibana. That is my objective, and it is why I want to delete all the data present in the Elasticsearch database.

With that request, you actually created an index called delete. You did not delete an index pattern named log*.

To delete an index (or set of indices matching a pattern), you would invoke

curl -X DELETE "localhost:9200/log*"

from the command line, or in the Dev Tools console in Kibana you would execute:

DELETE /log*
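
Once that has run, a quick sanity check (using the same log* pattern) is to ask for the remaining document count - it should come back as 0, and Discover will then show no log lines for that index pattern:

# Should report a count of 0 once the matching indices are deleted
GET log*/_count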

Thanks a lot @richcollier. It worked now. I'm close to finishing my project and satisfying all the requirements.
I need to showcase this project idea to a jury panel.
With regard to the same, I just wanted to know whether you are aware of any online resources, or could provide some information, on which Machine Learning model is used when we create the anomaly detection job & how good its efficiency is.
Can the model be trained for more things in the future? I believe that behind the scenes there would be some Python program running for it, if I'm not wrong.

Hello @robin2 ,

see "Troubleshooting machine learning anomaly detection and frequently asked questions" for more information on the anomaly detection algorithms.

@valeriy42 Thanks a lot! This is what I was looking for. :+1: