Machine learning - host stopped sending logs or events


(Raj) #1

Hi All,

I am using Elastic Stack 5.5 to monitor NetFlow and sFlow. I have two separate indices, and I view them in Kibana for search and dashboards.

If the indices stop receiving logs, either because Logstash stops forwarding them or because the source itself stops sending them, how can I use machine learning and Watcher in real-time monitoring to notify me by email that a particular index is not receiving events or logs? That way I can take the necessary steps to check whether the source is offline or Logstash has crashed.

I can't monitor Kibana in real time 24/7.

I already use a few watches with machine learning, e.g. for high byte usage.

Any help would be really appreciated.

Thanks in advance,

(rich collier) #2


Creating a simple job with a count function (or, even better, low_count) will keep track of the volume of events going into an index over time. If the count of events gets unusually low (or, of course, drops to zero), then ML will create an anomaly that looks something like this:

Also, when creating a simple job, v5.5 of ML allows you to create Watches for your job. See this blog for more information.
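For reference, here is a sketch of what such a job could look like if created through the API rather than the simple-job wizard in Kibana (the job id, bucket span, influencer, and time field below are placeholders, so adjust them to your data):

```json
PUT _xpack/ml/anomaly_detectors/netflow_low_count
{
  "description": "Alert when the netflow index stops receiving events",
  "analysis_config": {
    "bucket_span": "5m",
    "detectors": [
      {
        "function": "low_count",
        "detector_description": "low event rate"
      }
    ],
    "influencers": [ "host.keyword" ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```

Note that low_count is one-sided: it only flags buckets where the event rate is unusually low, which is exactly the "source stopped sending" case.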

(Raj) #3

Hi Rich,

Thank you for the reply. I created a multi-metric job with host.keyword (the source of my logs) as the influencer and event rate as the count, selected the complete index, and enabled real time. To test it, I stopped Logstash so that it would not forward logs to today's index, but this is what I get on the machine learning screen:

I don't see any anomaly score created for today. I then checked the Single Metric Viewer; it looks like this. I'm not sure if I'm right, but when the line goes down it should create an anomaly score. In my case it went down at 1pm, so it should have created an anomaly score for me.

Please correct me if I'm wrong.


(rich collier) #4

Hi Raj,

It looks like you've set things up correctly (it's not always easy to tell just from a screenshot of the results), but keep in mind that ML will consider something unusual only if that behavior is deemed unlikely to happen, given what the behavior has been in the past. I obviously cannot see the entire history of this time series, but it seems apparent that the index had very drastic anomalies on Aug 5 and some more on Aug 7. It is quite possible that those behaviors are, relatively speaking, much more severe than the situation you've tried to simulate today.

I'm guessing that you'd get different results if, for example, you started the ML job learning only from Aug 8 and moving forward through today.

What's your choice of bucket_span, by the way?

(Raj) #5

Hi Rich,

You mean it takes time for machine learning to understand the log patterns, and in the future it might consider this an anomaly? Say I created the ML job today; maybe next week, if the source stops sending logs on a particular day, there is a chance of getting an anomaly score.

All the anomaly scores created on Aug 5 and Aug 7 are for high count, not for low count or zero. I am mainly concentrating on the low-count or zero case, so that I get notified if the source stops sending logs or Logstash stops.

The bucket span is 5m.


(rich collier) #6

Yes, that is correct.

I'd be curious to see this job re-run using only the low_count function. If you have time, please try it and re-post the results!

(Raj) #7

Hi Rich,

I created a new job with low_count and host as the influencer (the source which sends the logs to Logstash). I can see that all the anomaly detections are marked as warning, not critical.

One more thing: I stopped Logstash around 11pm to test it. The chart just gets stuck, and there is no anomaly detection at 11pm, as you can see; it's just hanging, that's it.

This is what I want flagged as an anomaly, since Logstash is not forwarding logs, and I am interested in getting an email notification when this happens, which is possible using Watcher.
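From what I've read, a watch over the ML results index could look roughly like this sketch (the job id, score threshold, lookback window, and recipient address are all placeholders, and the email action requires an email account configured in Elasticsearch):

```json
PUT _xpack/watcher/watch/ml_low_count_alert
{
  "trigger": { "schedule": { "interval": "5m" } },
  "input": {
    "search": {
      "request": {
        "indices": [ ".ml-anomalies-*" ],
        "body": {
          "query": {
            "bool": {
              "filter": [
                { "term": { "job_id": "netflow_low_count" } },
                { "term": { "result_type": "bucket" } },
                { "range": { "timestamp": { "gte": "now-10m" } } },
                { "range": { "anomaly_score": { "gte": 50 } } }
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.hits.total": { "gt": 0 } }
  },
  "actions": {
    "notify_email": {
      "email": {
        "to": "ops@example.com",
        "subject": "ML: event rate dropped",
        "body": "A low_count anomaly was recorded in the last 10 minutes."
      }
    }
  }
}
```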


(rich collier) #8

Hi Raj,

The score that an anomaly gets depends on how much data preceded it. What I mean is that the judgments made by ML are based on probability: the more established history a metric has, the higher the anomaly score will be once it begins to deviate. Here, in your screenshot, it's hard to tell how much history was established for this metric before the light blue anomaly was raised on Tuesday the 15th. Can you possibly zoom the picture out so that we can see the full history of the data that you sent to ML?

For your second question: how long after you shut off Logstash did you take the screenshot? Because you have a 5m bucket_span, it might take 5-7 minutes for the ML cycle to complete and report on the last bucket. It's hard to tell when the screenshot was taken (clock time) with respect to the analysis time.

(Raj) #9

Hi Rich,

Thank you for the reply. I have zoomed out yesterday's ML results:

As for the second point, I turned off Logstash for more than 30 minutes to see whether something would happen in ML.


(rich collier) #10

Hi Raj,

From the screenshot it looks like there wasn't enough data before you decided to try your test of turning off Logstash.

Here's what I suggest:

  1. Create a new job (you can clone the existing job to make it easier). Start the job from now onward; in other words, avoid the times of past testing.
  2. Allow this new job to run for approximately 2-3 days in "real-time" mode (operating on live data)
  3. Then, attempt your experiment after those 2-3 days have passed.
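If you'd rather do step 1 via the API than the clone button in Kibana, it amounts to creating the new job and then starting its datafeed from the current time so that no historical data is analyzed. A rough sketch (the datafeed id and timestamp are placeholders for your own):

```json
POST _xpack/ml/datafeeds/datafeed-netflow_low_count_v2/_start
{
  "start": "2017-08-16T00:00:00Z"
}
```

If you omit start, a datafeed attached to a new job begins at the earliest available data, which would pull in the old test periods you want to avoid.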

(Raj) #11

Hi Rich,

Thank you for the reply. I will do the same and let you know the outcome.


(rich collier) #12

Raj - any update?

(Raj) #13

Hi Rich,

Thank you for the follow-up. I actually tried stopping Logstash twice (yesterday and today) after running the job for one week,



but I am not sure whether it will remain only a warning or minor, or whether it will change to critical.

Thanks in advance ,

(rich collier) #14

Anomalies that have been recorded will never be "upgraded" from minor to critical at a later time (however, the reverse is possible: an anomaly that was once critical may later be re-scored and downgraded if something more unusual comes along).

Keep in mind that the scores you get in real time depend fully on past anomaly scores and past behavior. In other words, the first time you test a behavior, you might get a relatively high score. But if you test that behavior again and again, you are forcing it to look "normal", likely diminishing the score it receives each time it occurs. However, as more time goes by and ML observes more of the "actual normal", a subsequent truly anomalous situation will again be scored higher.

(Raj) #15

Thank you, Rich, for the reply :slight_smile: That sounds sensible.

(system) #16

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.