Setup a Machine Learning rule is not active

nzeland149 · May 21, 2025, 8:48am

I'm using Packetbeat to collect data with port mirroring, and I want to be alerted if there is any anomaly, attack, or something like that. That's the goal. Do you have any other suggestion that would be better than ML here? Thank you, and sorry for the trouble.

richcollier · May 21, 2025, 3:45pm

because you want a round number of collection intervals per bucket_span. In your case, you have 2 collection intervals if your frequency is 7.5m. If your frequency were 5m, you'd have 3 intervals. If your frequency were 3m you'd have 5 intervals. If your frequency were 1m you'd have 15 intervals per 15-minute bucket_span.

The more intervals you have, the closer to "real-time" you can get.
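To make the arithmetic above concrete, here is a minimal Python sketch. The helper name `intervals_per_bucket` is made up for illustration and is not part of any Elastic API; it only checks that a frequency divides the bucket_span into a round number of collection intervals.

```python
def intervals_per_bucket(bucket_span_s: int, frequency_s: int) -> int:
    """Return how many collection intervals fit in one bucket.

    Hypothetical helper for illustration only. Both arguments are in seconds.
    Raises ValueError if the frequency does not divide the bucket evenly.
    """
    if frequency_s <= 0 or frequency_s > bucket_span_s:
        raise ValueError("frequency must be positive and no longer than bucket_span")
    intervals, remainder = divmod(bucket_span_s, frequency_s)
    if remainder:
        raise ValueError("frequency does not divide bucket_span into a round number")
    return intervals


BUCKET_SPAN_S = 15 * 60  # the 15-minute bucket_span from the thread

for frequency_s in (450, 300, 180, 60):  # 7.5m, 5m, 3m, 1m
    print(frequency_s, intervals_per_bucket(BUCKET_SPAN_S, frequency_s))
```

A frequency such as 7m (420s) would raise here, because 900 is not a multiple of 420.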

You should also know that the query_delay parameter affects how far behind real-time the ML job runs. Usually, you need 1-2 minutes so that you give time for the data to be there and "findable" in Elasticsearch before ML tries to analyze it.
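As a rough sketch of where these settings live, the two Python dicts below mirror the shape of an anomaly detection job body and a datafeed body. The job ID `packetbeat-anomalies` and the index pattern `packetbeat-*` are assumptions made up for this example, and the single `count` detector is only a placeholder; adapt all of them to your own setup.

```python
import json

# Assumed names for illustration only; not taken from the thread.
JOB_ID = "packetbeat-anomalies"
INDEX_PATTERN = "packetbeat-*"

# Job body: bucket_span is set in analysis_config.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],  # placeholder detector
    },
    "data_description": {"time_field": "@timestamp"},
}

# Datafeed body: frequency and query_delay are set here.
datafeed_body = {
    "job_id": JOB_ID,
    "indices": [INDEX_PATTERN],
    "frequency": "450s",   # 7.5m, i.e. 2 collection intervals per 15m bucket
    "query_delay": "60s",  # wait for data to be searchable before querying
}

print(json.dumps(job_body, indent=2))
print(json.dumps(datafeed_body, indent=2))
```

These bodies would be sent when creating the job and its datafeed; the sketch only prints them so that it runs without a cluster.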

stephenb · May 22, 2025, 3:24am

@richcollier

Question

The frequency will help with aggregations such as max or sum, which can be determined before the entire bucket is calculated. But for an aggregation such as count, you would still need the entire bucket, correct?

nzeland149 · May 22, 2025, 9:47am

My query delay is 60s.
Would you recommend that I reduce the bucket span to 5 min and use a frequency of 2.5 min to get closer to real time?

richcollier · May 22, 2025, 11:52am

No, you don't need the entire bucket if you're using count. For example, the model may say that the typical count of events during that 15m interval is 200 events but in the first 5 minutes (assuming you've set frequency to 5m) you get 10,000 events - you already know that the bucket is going to be anomalous regardless of what happens in the remaining 10 minutes of the bucket.

However, if the detector function was mean, for example, you must wait for the entire bucket's worth of data before calculating the mean.
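The difference between count and mean can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how the ML model scores anomalies: the `upper_bound` of 1,000 events is an assumed stand-in for whatever the model would consider unusually high, and the sample values for the mean are made up.

```python
def count_already_exceeds(partial_count: int, upper_bound: int) -> bool:
    """A count can only grow as the bucket fills, so once a partial count
    is above the bound, the final count for the bucket will be too."""
    return partial_count > upper_bound


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Count: typical is ~200 per 15m bucket; 10,000 events arrive in the first 5m.
UPPER_BOUND = 1_000  # assumed threshold, for illustration only
print(count_already_exceeds(10_000, UPPER_BOUND))  # early verdict is safe

# Mean: a high partial mean can be pulled back down by later values,
# so the partial result says little about the finished bucket.
first_interval = [900.0, 950.0]         # made-up values from the first 5m
rest_of_bucket = [10.0, 20.0, 15.0, 5.0]  # made-up values from the last 10m
partial_mean = mean(first_interval)
full_mean = mean(first_interval + rest_of_bucket)
print(partial_mean, full_mean)
```

The count verdict cannot be reversed by later data in the same bucket, while the mean over the first interval is well above the mean over the whole bucket.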

richcollier · May 22, 2025, 2:03pm

Not necessarily.

Remember that the choice for bucket_span should be tied to "the duration of the anomaly that you care about".

What I mean by that is: whatever you're trying to detect (the count of errors in a bucket of time, the value of a field, whatever), you should choose your bucket span based on whether a detection lasting for the duration of the bucket_span is meaningful to you (i.e. "it was anomalous for only 2 minutes and that's insignificant to me, but if it was anomalous for 15 minutes, I'd care", in which case a 2m bucket_span is too small).

So, choose your bucket_span according to this philosophy, then choose a frequency such that you can hope to detect the situation as soon as possible.
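One way to apply this is to fix the bucket_span first and then list the frequencies that split it into a round number of intervals. The sketch below does that in Python; the function name `even_frequencies` and the 60-second lower limit are assumptions chosen for the example, not values from Elastic.

```python
def even_frequencies(bucket_span_s: int, min_frequency_s: int = 60) -> list[int]:
    """List frequencies (in seconds) that divide the bucket_span evenly.

    Hypothetical helper; min_frequency_s is an arbitrary floor for this
    example so that very short frequencies are not suggested.
    """
    return [
        f
        for f in range(min_frequency_s, bucket_span_s + 1)
        if bucket_span_s % f == 0
    ]


# The 5m bucket_span proposed in the thread: 150s (2.5m) is one valid choice.
print(even_frequencies(5 * 60))

# The 15m bucket_span discussed earlier: 450s (7.5m) is among the options.
print(even_frequencies(15 * 60))
```

Smaller values in each list give more looks at a bucket while it is still filling; which one is practical depends on the load you are willing to put on the cluster.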
