Machine Learning data disappeared

machine-learning

(David Tomasheski) #1

I have been running a Machine Learning job that I created on April 3rd. The metric has a predictable wave pattern to it, and the blue model bounds were getting tighter. I checked in on April 11th and all looked good. When I checked in on April 12th, days of waves that had been there the day before were missing. No anomalies were triggered by this and, strangely, when I checked in on it the job started collecting data again (which caused anomalies, since it now seems to have retrained on empty data). I have no idea where my waves went or why the job started again. Is there any way to get my data back? And, perhaps more important since I can just recreate the job, how do I prevent this from happening again?

Here is a screen grab of the ML job

And here is a shot of the data from the 4th on a visualization


(David Tomasheski) #2

By the way, I re-created the job using the full dataset, and this is what the ML graph looked like


(rich collier) #3

Hello David,

Certainly, this is an unexpected occurrence - I have never seen this before.

Let me explain: for Single Metric Jobs (where the model bounds are visible), when you're plotting things in the Single Metric Viewer, the Kibana UI shows you a graph built from the data that are logged into the .ml-anomalies-* index. It is not a graph built from the raw data.

In the .ml-anomalies-* index (filtered for the job_id:wcc AND result_type:model_plot) you'd be able to see field names such as actual, model_lower, model_upper and so on.

For example, I could plot these using a standard Kibana visualization (in this case TSVB):

I would double check to see what happened to your .ml-anomalies-* index for that first (original) job during those days. Do records still exist?
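A rough way to run that check outside Kibana is to count the model_plot documents per day for the job. The sketch below only builds the search body in Python. It assumes the results carry a `timestamp` field and that your version accepts `calendar_interval` in a date histogram (older releases use `interval`). The function name is made up for illustration, and the job ID is the one from the filter above, so swap in the ID of the original job.

```python
import json


def model_plot_per_day_query(job_id, interval="1d"):
    """Build a search body that counts model_plot results per day for one job.

    Assumptions: results have a `timestamp` field, and the cluster accepts
    `calendar_interval` (older versions expect `interval` instead).
    """
    return {
        "size": 0,  # only the aggregation is of interest, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"job_id": job_id}},
                    {"term": {"result_type": "model_plot"}},
                ]
            }
        },
        "aggs": {
            "per_day": {
                "date_histogram": {
                    "field": "timestamp",
                    "calendar_interval": interval,
                },
                # Average of the `actual` field within each day
                "aggs": {"avg_actual": {"avg": {"field": "actual"}}},
            }
        },
    }


body = model_plot_per_day_query("wcc")
print(json.dumps(body, indent=2))
```

You would send this body as a search against .ml-anomalies-* with whatever client you normally use. Days that come back with a document count of zero are days for which no model plot results exist.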


(David Tomasheski) #4

Rich,
Thank you for your reply. Sorry about the delay in mine but things have been hectic here.
I graphed the average for .ml-anomalies-wirelesscallcount and the results were 0 across the board until the days where it started up again. If I graph the new one (wcc) it has results. Also, I don't know if it makes a difference, but if I graph count instead of average for wirelesscallcount (the one that disappeared) there are results on the days where there is no data.


(rich collier) #5

This makes me think that you actually have ingest delays. In other words, if your ML job was running in "real-time" (operating in nearly real-time on data being currently ingested) and the ingestion of that data is delayed, ML will "miss" seeing that data when it asks for data in the last X minutes.

If that data arrives later, and you yourself look at the data at a later time - you won't be able to tell that the data arrived late. However, in the moment of analysis, ML might not be able to "see it" because it has yet to be ingested by Elasticsearch.

The query_delay parameter of ML's datafeed controls how far behind real-time ML operates, so that it can compensate for ingest delays.
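To make the effect concrete, here is a deliberately simplified model in Python, not the datafeed's actual scheduling logic. It assumes the datafeed searches the time range containing a document roughly query_delay after the document's timestamp, so a document is only seen if it has become searchable by then. The function name and the example values are hypothetical.

```python
from datetime import datetime, timedelta


def seen_by_datafeed(event_time, ingest_delay, query_delay):
    """Return True if a document is searchable when the datafeed looks for it.

    Simplified assumption: the range containing `event_time` is searched
    at about `event_time + query_delay`.
    """
    searchable_at = event_time + ingest_delay  # when the doc can be found
    searched_at = event_time + query_delay     # when ML asks for that range
    return searchable_at <= searched_at


event = datetime(2020, 1, 1, 10, 0, 0)  # made-up timestamp

# Ingested 30s late, ML waits 60s: the document is seen.
print(seen_by_datafeed(event, timedelta(seconds=30), timedelta(seconds=60)))

# Ingested 5 minutes late, ML waits 60s: the document is missed.
print(seen_by_datafeed(event, timedelta(minutes=5), timedelta(seconds=60)))
```

Under this model, the fix for missed data is a query_delay comfortably larger than the worst ingest delay you expect, at the cost of anomalies being reported that much later.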

Perhaps you should look to see how quickly data with a particular timestamp is ingested and is available via search - and how consistent that is.
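One possible way to measure this, sketched in Python: if each document also records when it was ingested, the difference from its own timestamp gives the ingest delay. Elasticsearch does not store an ingest time by default, so the `ingested_at` field here is hypothetical (for instance, something an ingest pipeline could set), and the sample documents are made up.

```python
from datetime import datetime


def ingest_delays_seconds(docs, event_field="@timestamp", ingest_field="ingested_at"):
    """Return the sorted ingest delays, in seconds, for a list of documents.

    Both field names are assumptions; use whatever your documents contain.
    """
    delays = []
    for doc in docs:
        event = datetime.fromisoformat(doc[event_field])
        ingested = datetime.fromisoformat(doc[ingest_field])
        delays.append((ingested - event).total_seconds())
    return sorted(delays)


# Made-up sample documents for illustration only
sample = [
    {"@timestamp": "2020-01-01T10:00:00", "ingested_at": "2020-01-01T10:00:20"},
    {"@timestamp": "2020-01-01T10:01:00", "ingested_at": "2020-01-01T10:01:45"},
    {"@timestamp": "2020-01-01T10:02:00", "ingested_at": "2020-01-01T10:07:00"},
]

delays = ingest_delays_seconds(sample)
print("worst delay (s):", max(delays))
```

If the worst delay is larger than the datafeed's query_delay, or the delays vary a lot from document to document, that would be consistent with ML missing the data in real-time.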


(Mark Walkom) #6