I have a weekly repeating pattern.
Every Monday, Tuesday, Wednesday, Thursday and Friday, the pattern is roughly the same: the count rises at 8 am, drops at 12 pm, rises again at 2 pm and finally drops at 6 pm. During the night, the count is zero. On these days, the pattern looks like a camel, with 2 bumps.
Every Saturday and Sunday, the count is zero and the pattern looks like a country skyline.
So each week I have 5 camels and 2 country skylines.
What value should I use for the bucket span of a multi-metric job to detect anomalies?
For example, there is an obvious anomaly if a Monday has no data, like a Saturday, or if the pattern looks not like a camel with 2 bumps but like a dromedary with 1 bump.
Which bucket span should I choose: 5 minutes, 15 minutes, 30 minutes, 1 hour, 2 hours, ..., 6 hours, 7 hours, 12 hours, 13 hours, ... 24 hours, ... ?
I have a lot of data, arriving every second.
If I set the bucket span to 5 hours, is it possible to get an email alert in real time when the count decreases or increases abnormally, or must I wait 5 hours? That would not be very responsive.
In general, if you want time-based behaviors to be modeled correctly, you'd choose a bucket_span that is shorter than the duration of the pattern. So, if the behavior changes from hour to hour, a bucket_span of 5 hours will "smooth out" those behaviors (and you won't be able to see their nuances). A choice of 5m, 10m or 15m would likely be more sensible, especially since you say the finest granularity of your data is 1s (that way, you don't run the risk of having sparse buckets).
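To make the "smoothing" effect concrete, here is a minimal Python sketch. It does not use Elastic ML at all; it only sums simulated per-minute counts into buckets, which is an assumption-laden stand-in for what a count detector sees per bucket. The helper names `daily_counts` and `bucketize`, and the choice of a dromedary day that is active from 8 am to 4 pm, are hypothetical and chosen only for illustration.

```python
def daily_counts(active_ranges):
    """Per-minute counts for one day: 1 inside each [start_hour, end_hour) range, else 0."""
    counts = [0] * (24 * 60)
    for start_hour, end_hour in active_ranges:
        for minute in range(start_hour * 60, end_hour * 60):
            counts[minute] = 1
    return counts


def bucketize(counts, span_minutes):
    """Sum per-minute counts into consecutive buckets of span_minutes."""
    return [sum(counts[i:i + span_minutes])
            for i in range(0, len(counts), span_minutes)]


# Normal weekday: two bumps (8-12 and 14-18), as described in the question.
camel = daily_counts([(8, 12), (14, 18)])
# Hypothetical anomalous day: one bump (8-16) with the same total count.
dromedary = daily_counts([(8, 16)])

for span in (15, 12 * 60):
    identical = bucketize(camel, span) == bucketize(dromedary, span)
    print(f"bucket span {span} min: days look identical? {identical}")
```

With 12-hour buckets both days reduce to the same two bucket totals, so the missing lunch-time dip is invisible; with 15-minute buckets the buckets between 12 pm and 2 pm differ. How a real job scores that difference depends on the model, which this sketch does not attempt to reproduce.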
With 5m, 10m, 15m or 30m, machine learning doesn't detect when a Saturday or Sunday pattern abnormally looks like a Monday, Tuesday, Wednesday, Thursday or Friday pattern.
I need a bucket span of 3 hours or 6 hours for that. That is too long for me to alert on errors in real time...
It will take 3+ weeks of historical data to recognize the weekend pattern. Have you run that much data through yet? Can you post a screenshot of the Single Metric Viewer UI for your job?