Datafeed frequency must be a multiple of the aggregation interval

machine-learning

#1

After upgrading from 5.x to 6.x, I am getting the following error for some ML jobs:

Datafeed frequency must be a multiple of the aggregation interval

The bucket span is set to 900s and the datafeed frequency is 450s. Clearly, the error says it should be the other way around.

However, this used to work in 5.x, and the documentation suggests it still should: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/ml-datafeed-resource.html#ml-datafeed-resource

frequency
(time units) The interval at which scheduled queries are made while the datafeed runs in real time. The default value is either the bucket span for short bucket spans, or, for longer bucket spans, a sensible fraction of the bucket span. For example: 150s.

Is the documentation outdated, or am I reading it wrong? My understanding was that the frequency parameter is used to get partial results without waiting for the full bucket span.

Thanks,

Thomas


(Dimitris Athanasiou) #2

Hi Thomas,

The error is not about the relationship between the frequency and the bucket_span, but about the relationship between the frequency and the aggregation interval. Datafeeds may be configured to use aggregations, and jobs created through the single-metric wizard make use of that feature.

To fix the issue, go to the job management page, expand the job in question, and click the JSON tab. In the JSON, find the datafeed_config section. In there, you will see a date_histogram aggregation. Note the value of its interval field: the frequency needs to be a multiple of that value. You can then edit the job by going back to the job management page and clicking the edit job button for that job. In the datafeed tab, update the frequency accordingly.
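
If it helps, the rule can be sketched as a small Python check run against a copy of the datafeed_config JSON. This is only an illustration under some assumptions: the helper names (to_millis, find_histogram_interval, frequency_is_valid) and the sample config are hypothetical, the parser only handles simple values such as 450s, 15m or a bare number of milliseconds, and it is not the validation code Elasticsearch itself runs.

```python
import re

# Milliseconds per unit, for the suffixes this sketch understands.
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def to_millis(value):
    """Convert a time value such as '450s' or '15m' to milliseconds.

    Assumption: a bare number is already in milliseconds.
    """
    if isinstance(value, (int, float)):
        return int(value)
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def find_histogram_interval(aggs):
    """Return the interval of the first date_histogram found, including nested aggs."""
    for body in aggs.values():
        if "date_histogram" in body:
            return body["date_histogram"]["interval"]
        nested = body.get("aggregations") or body.get("aggs")
        if nested:
            found = find_histogram_interval(nested)
            if found is not None:
                return found
    return None


def frequency_is_valid(datafeed_config):
    """True if frequency is a multiple of the date_histogram interval (or no aggs are used)."""
    aggs = datafeed_config.get("aggregations") or datafeed_config.get("aggs")
    if not aggs:
        return True
    interval = find_histogram_interval(aggs)
    if interval is None:
        return True
    return to_millis(datafeed_config["frequency"]) % to_millis(interval) == 0


# Hypothetical datafeed_config, loosely shaped like what the JSON tab shows.
example = {
    "frequency": "450s",
    "aggregations": {
        "buckets": {
            "date_histogram": {"field": "@timestamp", "interval": "900s"},
            "aggregations": {"@timestamp": {"max": {"field": "@timestamp"}}},
        }
    },
}

print(frequency_is_valid(example))                            # 450s vs 900s interval
print(frequency_is_valid({**example, "frequency": "1800s"}))  # 1800s vs 900s interval
```

With an interval of 900s, a frequency of 450s fails the check, while 900s or 1800s passes.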


#3

I see, thanks for the information!


(Mark Walkom) #4