I am trying out the anomaly detection job.
The data is coming from Logstash at an interval of 5 mins. But sometimes there will be no data in those 5 mins. The data itself will be randomly distributed within the 5 min slot. Sometimes there will be 5 data points at the start of the time slot, sometimes in the middle, and sometimes at the end.
Like all these are possible scenarios:
Data at start:
Data in middle:
Data in end:
I have set both Query delay and Frequency to 5m. My idea is to not miss any data.
The suggested Bucket Span was 30m.
Should it not have been 5 mins?
Think of bucket_span as the analysis aggregation interval. This is different from the frequency and delay of getting the data from the source index.
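To illustrate the aggregation idea, here is a minimal Python sketch. It is not how the ML job is implemented; it only assumes that each event lands in the bucket whose start is its timestamp rounded down to a multiple of the span. The function name `bucket_counts` and the sample timestamps are made up for the example.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def bucket_counts(timestamps, bucket_span):
    """Count events per bucket (hypothetical helper, not an Elastic API).

    A bucket start is the event's epoch time rounded down to a multiple
    of bucket_span.
    """
    span = int(bucket_span.total_seconds())
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        start = epoch - epoch % span
        counts[datetime.fromtimestamp(start, tz=timezone.utc)] += 1
    return dict(counts)


base = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)

# Made-up arrivals: some at the start of a 5 min slot, some in the
# middle, some at the end, plus slots with no data at all.
offsets = [(0, 5), (0, 20), (7, 30), (14, 55), (31, 0)]
events = [base + timedelta(minutes=m, seconds=s) for m, s in offsets]

five = bucket_counts(events, timedelta(minutes=5))
thirty = bucket_counts(events, timedelta(minutes=30))

print(len(five), "non-empty 5m buckets")
print(len(thirty), "non-empty 30m buckets")
```

With a 5m span many buckets are empty and the rest hold only one or two points; with a 30m span the same events are pooled into fewer, fuller buckets. Where the points fall inside a 5 min slot does not matter once they are aggregated.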
Thanks for the response @richcollier.
Is it fine if I set Query delay to 1d? I assume the datafeed keeps track of the point up to which it has already read the data, so there are no duplicates. And the only cost for me would be a more expensive query, since the time range is bigger.
I am asking this because the data is actually coming from a production line, and they run the line when needed. There is no schedule. There may be days during which they do not make anything, and a few days when they run the line 24 hrs non-stop.
There are 3 important parameters:
bucket_span is the analytics aggregation interval
frequency is how often the data is queried via the datafeed
query_delay is the total offset (from "now")
In other words, having a query_delay of 1d doesn't make the query more expensive or the time range bigger. It is purely a lag behind real-time.
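A rough way to picture this, as a simplified model rather than the actual datafeed code: assume every `frequency` the datafeed searches from where the previous search ended up to "now" minus `query_delay`. The function `realtime_search_windows` below is hypothetical and exists only for this illustration.

```python
from datetime import datetime, timedelta, timezone


def realtime_search_windows(start, frequency, query_delay, runs):
    """Simplified model of real-time datafeed searches (an assumption).

    Each run happens at wall-clock time `now` and covers
    [previous end, now - query_delay).
    """
    windows = []
    last_end = start - query_delay
    for i in range(1, runs + 1):
        now = start + i * frequency
        end = now - query_delay
        windows.append((last_end, end))
        last_end = end  # the next search resumes here, so nothing is re-read
    return windows


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
freq = timedelta(minutes=5)

short = realtime_search_windows(start, freq, timedelta(minutes=5), runs=3)
long = realtime_search_windows(start, freq, timedelta(days=1), runs=3)

for (s_from, s_to), (l_from, l_to) in zip(short, long):
    print("5m delay:", s_to - s_from, "| 1d delay:", l_to - l_from)
```

In this model each search spans exactly one `frequency` interval for both settings, and consecutive windows do not overlap. The 1d delay only shifts the windows a day into the past.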