I'm using Packetbeat to collect data with port mirroring, and I want to be alerted if there is any anomaly, attack, or something similar. That is the goal. Do you have any other suggestion that would be better than ML here? Thank you, and I'm sorry.
because you want a round number of collection intervals per bucket_span. In your case, you have 2 collection intervals if your frequency is 7.5m. If your frequency were 5m, you'd have 3 intervals. If your frequency were 3m, you'd have 5 intervals. If your frequency were 1m, you'd have 15 intervals per 15-minute bucket_span.
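The arithmetic above can be checked with a small sketch. The helper name intervals_per_bucket is hypothetical (it is not part of any Elastic API); it only divides the bucket_span by the frequency and rejects combinations that do not give a round number of intervals.

```python
from datetime import timedelta


def intervals_per_bucket(bucket_span: timedelta, frequency: timedelta) -> int:
    """Return how many collection intervals fit into one bucket_span.

    Hypothetical helper for illustration only; raises if the frequency
    does not divide the bucket_span evenly.
    """
    if frequency <= timedelta(0) or frequency > bucket_span:
        raise ValueError("frequency must be positive and no longer than bucket_span")
    count, remainder = divmod(bucket_span, frequency)
    if remainder:
        raise ValueError("frequency does not give a round number of intervals")
    return count


bucket_span = timedelta(minutes=15)
for minutes in (7.5, 5, 3, 1):
    n = intervals_per_bucket(bucket_span, timedelta(minutes=minutes))
    print(f"frequency={minutes}m -> {n} intervals per bucket_span")
```

A frequency such as 4m would raise here, because 15 minutes is not a whole multiple of 4 minutes.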
The more intervals you have, the closer to "real-time" you can get.
You should also know that the query_delay parameter affects how far behind real-time the ML job runs. Usually, you need 1-2 minutes to give the data time to arrive and be "findable" in Elasticsearch before ML tries to analyze it.
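As a rough sketch of where these three settings live, the dictionaries below mirror the shape of an anomaly detection job body and a datafeed body. The job ID packetbeat-count-job and the packetbeat-* index pattern are assumptions for illustration, and the 7.5m frequency is written as 450s on the assumption that whole-number time values are the safer form to send.

```python
import json

# Hypothetical job body: bucket_span belongs to the job's analysis_config.
job_body = {
    "analysis_config": {
        "bucket_span": "15m",
        "detectors": [{"function": "count"}],
    },
    "data_description": {"time_field": "@timestamp"},
}

# Hypothetical datafeed body: frequency and query_delay belong to the datafeed.
datafeed_body = {
    "job_id": "packetbeat-count-job",  # assumed name, not from the thread
    "indices": ["packetbeat-*"],       # assumed index pattern
    "frequency": "450s",               # 7.5 minutes, 2 intervals per 15m bucket
    "query_delay": "60s",              # time allowed for data to become searchable
}

print(json.dumps(job_body, indent=2))
print(json.dumps(datafeed_body, indent=2))
```

These bodies are only printed here; sending them to a cluster is left out on purpose.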
Question
The frequency will help with aggregations such as max or sum, something that could be determined before the entire bucket is calculated. But for an aggregation such as count, you would still need the entire bucket, correct?
My query delay is 60s.
Would you recommend reducing the bucket span to 5 minutes and using a frequency of 2.5 minutes to get closer to real time?
No, you don't need the entire bucket if you're using count. For example, the model may say that the typical count of events during that 15m interval is 200 events, but in the first 5 minutes (assuming you've set frequency to 5m) you get 10,000 events. You already know that the bucket is going to be anomalous regardless of what happens in the remaining 10 minutes of the bucket.
However, if the detector function were mean, for example, you must wait for the entire bucket's worth of data before calculating the mean.
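The difference between count and mean can be illustrated with a minimal sketch. This is not how the ML model works internally: a fixed upper_bound of 400 is a hypothetical stand-in for whatever the model considers unusually high, and both function names are made up for the example. The point is only that a running count can never decrease within a bucket, while a partial mean can still move back toward normal.

```python
from statistics import mean


def first_interval_count_exceeds(interval_counts, upper_bound):
    """Return the 1-based interval at which the running count passes upper_bound.

    The running count only grows, so once it exceeds the bound the whole
    bucket is certain to exceed it. Returns None if that never happens.
    """
    running = 0
    for index, count in enumerate(interval_counts, start=1):
        running += count
        if running > upper_bound:
            return index
    return None


def partial_means(interval_values):
    """Return the mean of all values seen so far after each interval."""
    seen = []
    means = []
    for values in interval_values:
        seen.extend(values)
        means.append(mean(seen))
    return means


# Count: 10,000 events arrive in the first 5m of a 15m bucket.
print(first_interval_count_exceeds([10_000, 150, 90], upper_bound=400))

# Mean: the first interval looks high, but later values pull the mean down.
print(partial_means([[900, 950], [100, 120], [90, 110]]))
```

With the count example, the bound is already passed after the first interval. With the mean example, the partial mean starts at 925 and ends below 400, so judging the bucket early would have been misleading.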
Not necessarily.
Remember that the choice for bucket_span should be tied to "the duration of the anomaly that you care about".
What I mean by that is: whatever you're trying to detect (the count of errors in a bucket of time, the value of a field, whatever), you should choose your bucket span based on whether a detection lasting for the duration of the bucket_span is meaningful to you. For example: "it was anomalous for only 2 minutes and that's insignificant to me, but if it was anomalous for 15 minutes, I'd care". In that case, a 2m bucket_span is too small.
So, choose your bucket_span according to this philosophy, then choose a frequency such that you can hope to detect the situation as soon as possible.