Setup a Machine Learning rule is not active

I'm using Packetbeat to collect data with port mirroring, and I want to be alerted if there is any anomaly, attack, or something like that. That's the goal. Do you have any suggestion better than ML here? Thank you, and I'm sorry.

because you want a round number of collection intervals per bucket_span. In your case, you have 2 collection intervals if your frequency is 7.5m. If your frequency were 5m, you'd have 3 intervals. If your frequency were 3m you'd have 5 intervals. If your frequency were 1m you'd have 15 intervals per 15-minute bucket_span.
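To make the arithmetic concrete, here is a small sketch that checks whether a frequency divides the bucket_span evenly and counts the collection intervals. The helper name `intervals_per_bucket` is made up for illustration; it is not part of any Elastic API.

```python
def intervals_per_bucket(bucket_span_s: float, frequency_s: float) -> int:
    """Return how many collection intervals fit in one bucket_span.

    Raises ValueError when the frequency does not divide the
    bucket_span into a whole number of intervals.
    """
    ratio = bucket_span_s / frequency_s
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError(
            f"frequency {frequency_s}s does not divide bucket_span {bucket_span_s}s evenly"
        )
    return int(round(ratio))


BUCKET_SPAN_S = 15 * 60  # the 15-minute bucket_span discussed above

# The frequencies mentioned in the post, in minutes.
for minutes in (7.5, 5, 3, 1):
    n = intervals_per_bucket(BUCKET_SPAN_S, minutes * 60)
    print(f"frequency {minutes}m -> {n} intervals per bucket")
```

A frequency such as 7m would raise the error here, which is the case you want to avoid.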

The more intervals you have, the closer to "real-time" you can get.

You should also know that the query_delay parameter affects how far behind real-time the ML job runs. Usually, you need 1-2 minutes to give the data time to be there and "findable" in Elasticsearch before ML tries to analyze it.
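For reference, this is roughly where those settings live, sketched as Python dicts that mirror the request bodies for the anomaly detection job and its datafeed. The job ID, datafeed ID, index pattern, and the specific values are assumptions chosen for illustration, not taken from your setup.

```python
import json

# Hypothetical identifiers, for illustration only.
JOB_ID = "packetbeat-count-example"
DATAFEED_ID = f"datafeed-{JOB_ID}"

# Body for: PUT _ml/anomaly_detectors/<job_id>
job_body = {
    "analysis_config": {
        "bucket_span": "15m",                  # duration of anomaly you care about
        "detectors": [{"function": "count"}],  # simple event-count detector
    },
    "data_description": {"time_field": "@timestamp"},
}

# Body for: PUT _ml/datafeeds/<datafeed_id>
datafeed_body = {
    "job_id": JOB_ID,
    "indices": ["packetbeat-*"],    # assumed index pattern
    "query": {"match_all": {}},
    "frequency": "5m",              # 3 collection intervals per 15m bucket
    "query_delay": "90s",           # within the usual 1-2 minute range
}

print(f"PUT _ml/anomaly_detectors/{JOB_ID}")
print(json.dumps(job_body, indent=2))
print(f"PUT _ml/datafeeds/{DATAFEED_ID}")
print(json.dumps(datafeed_body, indent=2))
```

The point is only that bucket_span belongs to the job, while frequency and query_delay belong to the datafeed.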

@richcollier

Question

The frequency will help with max, sum, etc. aggregations, something that could be determined before the entire bucket is calculated... But for an aggregation such as count, you would still need the entire bucket, correct?

My query delay is 60s.
Would you recommend reducing the bucket span to 5 minutes and using a frequency of 2.5 minutes to get closer to real time?

No, you don't need the entire bucket if you're using count. For example, the model may say that the typical count of events during that 15m interval is 200 events but in the first 5 minutes (assuming you've set frequency to 5m) you get 10,000 events - you already know that the bucket is going to be anomalous regardless of what happens in the remaining 10 minutes of the bucket.

However, if the detector function was mean, for example, you must wait for the entire bucket's worth of data before calculating the mean.
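A toy illustration of the difference, not how the ML model actually scores anything. The fixed limit below is a made-up stand-in for "clearly more than the typical 200 events", and the sample values for the mean are invented.

```python
from statistics import mean

TYPICAL_COUNT = 200                     # typical events per 15m bucket (from the example above)
HYPOTHETICAL_LIMIT = 5 * TYPICAL_COUNT  # made-up cutoff, for illustration only


def count_is_already_high(events_per_interval, limit):
    """A running count can only grow, so once the partial total passes
    the limit, the finished bucket will pass it too."""
    return sum(events_per_interval) > limit


# Only the first 5-minute interval of the 15m bucket has arrived.
print(count_is_already_high([10_000], HYPOTHETICAL_LIMIT))

# A mean can still move in either direction as more data arrives,
# so the partial value does not settle the question.
values_first_interval = [900.0, 950.0]
values_rest_of_bucket = [10.0, 20.0, 15.0, 5.0]
partial_mean = mean(values_first_interval)
full_mean = mean(values_first_interval + values_rest_of_bucket)
print(f"partial mean: {partial_mean:.1f}, full-bucket mean: {full_mean:.1f}")
```

With count, the first interval alone is enough to call the bucket unusual; with mean, the early value can be pulled back down by the rest of the bucket.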

Not necessarily.

Remember that the choice for bucket_span should be tied to "the duration of the anomaly that you care about".

What I mean by that is: whatever you're trying to detect (the count of errors in a bucket of time, the value of a field, whatever), you should choose your bucket span based on whether a detection lasting for the duration of the bucket_span is meaningful to you (i.e. "it was anomalous for only 2 minutes and that's insignificant to me, but if it was anomalous for 15 minutes, I'd care"; therefore a 2m bucket_span is too small).

So, choose your bucket_span according to this philosophy, then choose a frequency such that you can hope to detect the situation as soon as possible.
