ML job not updating in real time

machine-learning

#1

Hey, I'm having a strange problem.

I have an ML job (with a bucket span of 1 day) that is running in real time, but its "latest timestamp" is stuck at 2018-07-26 even though the latest data is from 2018-08-01. If I stop it and run it manually from 1 week before the latest timestamp until today, it still does not update. If I create a new, identical job, it gets a latest timestamp of 2018-07-31, which is still not the latest. But if I clone the original job (the one with the latest timestamp of 2018-07-26), I get the correct latest timestamp of 2018-08-01.


(Dimitris Athanasiou) #2

Hi,

Could you paste the job configuration as well as the datafeed configuration please?
You can use the get-job API and get-datafeed API to do that.
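
For reference, a minimal sketch of fetching both configurations from Python. It assumes a 6.x cluster reachable without authentication at `http://localhost:9200`, and the IDs `my-job` and `datafeed-my-job` are hypothetical placeholders. The paths are the 6.x get-job and get-datafeed endpoints under `_xpack/ml`.

```python
import json
import urllib.request

# Assumption: local, unauthenticated cluster. Adjust for your setup.
ES_URL = "http://localhost:9200"


def ml_config_urls(job_id, datafeed_id, base_url=ES_URL):
    """Build the get-job and get-datafeed URLs (6.x `_xpack/ml` paths)."""
    return {
        "job": f"{base_url}/_xpack/ml/anomaly_detectors/{job_id}",
        "datafeed": f"{base_url}/_xpack/ml/datafeeds/{datafeed_id}",
    }


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Usage (needs a running cluster; the IDs are hypothetical):
#   for name, url in ml_config_urls("my-job", "datafeed-my-job").items():
#       print(name, json.dumps(fetch_json(url), indent=2))
```

The same two requests can be run from the Kibana Dev Tools console as `GET _xpack/ml/anomaly_detectors/<job_id>` and `GET _xpack/ml/datafeeds/<datafeed_id>`.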

Please also mention which version you are running on.

Finally, could you explain a bit more on how the data is ingested? Are there multiple documents per day or just one? Are they continuously indexed or in batches? If in batches, when are those batches indexed in a day?


#3

Hey.

I'm running version 6.3.1.
On average, 60 (±40) documents are added per day, mostly during the night but also during the day, Monday to Friday. They are not added in batches; sometimes hours pass between documents and sometimes just a few minutes.
They are added to the same timestamp, depending on which version they are running. For example, say a few documents are inserted at 2018-08-03-01:53:06, 2018-08-03-04:27:42 and 2018-08-03-03:12:31: the first two are added to the timestamp August 2nd 2018, 22:01:00.000 and the last one is added to the timestamp August 2nd 2018, 22:06:00.000.


(Dimitris Athanasiou) #4

OK, I am pretty sure that what's happening here is that the datafeed is advancing through those times and finding no data. The datafeed runs in real time, but the way the data is indexed is not exactly real time: the datafeed searches a time range, sees no data (because documents with timestamps in that range have not been indexed yet) and moves on. You will probably need to adjust the datafeed's frequency and query_delay parameters to work with the date manipulations you are doing. You can read more in the Datafeed documentation.
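
As a rough illustration of sizing `query_delay`, the sketch below takes the three example documents from post #3, measures how long after its own timestamp each one was indexed, and builds an update body with a delay that covers the largest gap. It assumes both times in each pair are in the same time zone; the 10-minute margin, the `1h` frequency and the helper names are illustrative choices, not recommendations from this thread.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (time the document was indexed, timestamp stored on the document),
# taken from the example in post #3.
SAMPLES = [
    ("2018-08-03 01:53:06", "2018-08-02 22:01:00"),
    ("2018-08-03 04:27:42", "2018-08-02 22:01:00"),
    ("2018-08-03 03:12:31", "2018-08-02 22:06:00"),
]


def max_lag_seconds(samples):
    """Largest gap between a document's timestamp and its index time."""
    lags = [
        (datetime.strptime(indexed, FMT) - datetime.strptime(stamp, FMT)).total_seconds()
        for indexed, stamp in samples
    ]
    return int(max(lags))


def datafeed_update_body(lag_seconds, margin_seconds=600, frequency="1h"):
    """Update body whose query_delay covers the observed lag plus a margin."""
    return {
        "query_delay": f"{lag_seconds + margin_seconds}s",
        "frequency": frequency,  # illustrative value only
    }


lag = max_lag_seconds(SAMPLES)    # 6h 26m 42s for the samples above
body = datafeed_update_body(lag)  # body for the update-datafeed request
```

On 6.x the resulting body would be sent with `POST _xpack/ml/datafeeds/<datafeed_id>/_update`. Three documents are a small sample, so the lag is worth measuring over a longer period before settling on a value.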

