Machine learning - host stopped sending logs or events

Hi All,

I am using Elastic Stack 5.5 to monitor NetFlow and sFlow traffic, and I have two separate indices that I view in Kibana for search and dashboards.

If the indices stop receiving logs, whether Logstash or the source itself stops sending them, how can I use machine learning and Watcher, in real-time monitoring, to notify me by email that a particular index is no longer receiving events or logs? That way I can take the necessary steps to check whether the source is offline or my Logstash has crashed.

I can't watch Kibana in real time 24/7.

I already use a few watches with machine learning, such as for high byte usage.

Any help would be really appreciated.

Thanks in advance,
Raj

Raj,

Creating a simple job with the count function (or, even better, low_count) will keep track of the volume of events going into an index over time. If the count of events gets unusually low (or, of course, just goes to zero), then ML will create an anomaly that looks something like this:

[screenshot: example anomaly from a drop in event count]
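For reference, here is roughly what such a job looks like when created through the API rather than the Kibana wizard. This is a minimal sketch only: the job name, influencer field, and time field are assumptions, and you would still need a datafeed to feed your index into the job (the Kibana wizard creates one for you).

```
# Sketch only: job name, field names, and credentials are placeholders.
# Models the overall event rate and flags buckets where the count is unusually low.
curl -u elastic -XPUT 'localhost:9200/_xpack/ml/anomaly_detectors/netflow_low_count' \
  -H 'Content-Type: application/json' -d '
{
  "description": "Alert when the event rate drops unusually low",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      { "function": "low_count", "detector_description": "low_count of events" }
    ],
    "influencers": [ "host.keyword" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}'
```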

Also, when creating a simple job, v5.5 of ML allows you to create Watches for your job. See this blog for more information.
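If you would rather hand-roll the watch than use the one ML generates, a rough sketch could look like the following. It checks the ML results indices every few minutes for bucket results of the job above with a high anomaly score. The watch id, job name, threshold, schedule, and email address are all placeholders, and the email action assumes an email account is already configured in elasticsearch.yml.

```
# Sketch only: watch id, job name, threshold, and recipient are placeholders.
curl -u elastic -XPUT 'localhost:9200/_xpack/watcher/watch/netflow_low_count_alert' \
  -H 'Content-Type: application/json' -d '
{
  "trigger": { "schedule": { "interval": "5m" } },
  "input": {
    "search": {
      "request": {
        "indices": [ ".ml-anomalies-*" ],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                { "term":  { "job_id": "netflow_low_count" } },
                { "term":  { "result_type": "bucket" } },
                { "range": { "anomaly_score": { "gte": 75 } } },
                { "range": { "timestamp": { "gte": "now-10m" } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": { "compare": { "ctx.payload.hits.total": { "gt": 0 } } },
  "actions": {
    "email_me": {
      "email": {
        "to": "you@example.com",
        "subject": "ML: event rate anomaly for netflow_low_count",
        "body": "Event rate looks unusually low - check the sources and Logstash."
      }
    }
  }
}'
```

The watch that the ML UI can generate for you is more complete (it takes bucket timing into account), so treat this only as a rough starting point.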


Hi Rich,

Thank you for the reply. I created a multi-metric job with host.keyword (the source of my logs) as the influencer and the event rate as count. I selected the complete index with real time enabled. To test it, I stopped Logstash so that it would not forward logs to today's index, but this is what I get on the Machine Learning screen:

[screenshot: Machine Learning results with no anomaly score for today]

I don't see any anomaly score created for today. I then checked the Single Metric Viewer, and it looks like this. I'm not sure if I'm right, but when the line goes down it should create an anomaly score; in my case it went down at 1 pm, so it should have created an anomaly score for me.

Please correct me if I'm wrong.

Thanks,
Raj

Hi Raj,

It looks like you've set things up correctly (it's not always easy to tell just from a screenshot of results), but keep in mind that ML will consider something to be unusual if the behavior is deemed unlikely given what the behavior has been in the past. I obviously cannot see the entire history of this time series, but it seems apparent that the index had very drastic anomalies on Aug 5 and some more on Aug 7. Now, it is quite possible that those behaviors, relatively speaking, are much more severe than the situation that you've tried to simulate today.

I'm guessing that you'd get different results, for example, if you started the ML job learning only from Aug 8 onward, moving forward through today.

What's your choice of bucket_span, by the way?


Hi Rich,

You mean it takes time for machine learning to understand the log patterns, and in the future it might then consider this an anomaly? Let's say I created the ML job today; maybe next week, if the source stops sending logs on one particular day, there is a chance of getting an anomaly score.

All the anomaly scores created on Aug 5 and Aug 7 are for high counts, not for low or zero counts. I'm mainly concentrating on the low or zero counts, so that I get notified if the source stops sending logs or Logstash stops.

The bucket span is 5m.

Thanks,
Raj

Yes, that is correct.

I'd be curious to see what happens if you were to re-run this job using only the low_count function. If you have time, please try it and re-post the results!

Hi Rich,

I created a new job with low_count and host as the influencer (the source which sends the logs to Logstash). I can see that all the anomaly detections are termed warning and not critical:

[screenshot: anomaly results showing only warning-level anomalies]

One more thing: I stopped Logstash around 11 pm to test it. As you can see, there is no anomaly detection at 11 pm; the line just hangs there, that's it.

This is what I want to be flagged as an anomaly, since Logstash is not forwarding logs, and I'm interested in getting an email notification when this happens, which is possible using Watcher.

Thanks,
Raj

Hi Raj,

The score that the anomaly gets depends on how much data preceded it. What I mean by that is that the judgements made by ML are based on probability. The more established history a metric has, the higher the anomaly score will be once it begins to deviate. Here, in your screenshot, it's hard to tell how much history was established for this metric before the light blue anomaly was raised on Tuesday the 15th. Can you possibly zoom the picture out so that we can see the full history of the data that you sent to ML?

For your second question, how long after you shut off Logstash did you take the screenshot? Because you have a 5m bucket_span, it might take 5-7 minutes for the ML cycle to complete and report on this last bucket. It's hard to tell when the screenshot was taken (clock time) with respect to the analysis time.


Hi Rich,

Thank you for the reply. I have zoomed out yesterday's ML results:

[screenshot: zoomed-out view of the job's history]

As for the second point, I turned off Logstash for more than 30 minutes to see whether something happens in ML.

Thanks,
Raj

Hi Raj,

From the screenshot it looks like there wasn't enough data before you decided to try your test of turning off Logstash.

Here's what I suggest:

  1. Create a new job (you can clone this existing job to make it easier). Start the job from now onward; in other words, avoid the times of past testing.
  2. Allow this new job to run for approximately 2-3 days in "real-time" mode (operating on live data).
  3. Then, after those 2-3 days have passed, attempt your experiment again (one way to check the resulting scores is sketched below).
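For step 3, one way to check the outcome without sitting in front of Kibana is to query the job's bucket results directly. Again, a sketch only: the job name is a placeholder, and 25 is simply the lower edge of the "minor" severity band.

```
# Sketch only: returns buckets with an anomaly score of at least 25, skipping interim results.
curl -u elastic -XGET 'localhost:9200/_xpack/ml/anomaly_detectors/netflow_low_count/results/buckets' \
  -H 'Content-Type: application/json' -d '
{
  "anomaly_score": 25,
  "exclude_interim": true
}'
```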

Hi Rich,

Thank you for the reply. I will do the same and let you know the outcome.

Raj

Raj - any update?

Hi Rich,

Thank you for the follow-up. I tried stopping Logstash twice (yesterday and today) after running the job for one week:

Yesterday: [screenshot]

Today: [screenshot]

I'm not sure, though, whether these will stay at only warning or minor, or whether they will change to critical.

Thanks in advance,
Raj

Anomalies recorded will never be "upgraded" from minor to critical at a later time (however, the reverse is possible: an anomaly that once was critical may be adjusted later and downgraded if something more unusual comes along).

Keep in mind that the scores you get in real time fully depend on past anomaly scores and past behavior. In other words, the first time you test a behavior, you might get a relatively high score. But if you test that behavior again and again, you are forcing that behavior to look "normal", thus likely diminishing the score it receives each time it occurs. However, as more time goes by and ML makes more observations of the "actual normal", a subsequent truly anomalous situation will again be scored higher.


Thank you, Rich, for the reply :slight_smile: That sounds sensible.

