# Machine Learning Issue

I got this diagram from the Elastic website: [https://www.elastic.co/guide/en/x-pack/current/ml-gs-forecast.html]

I have created a forecast based on my machine data. I wonder why, with almost 4 weeks of data loaded, I'm not seeing a confident prediction in the shaded yellow area? Even the shaded blue area, which represents the bounds for the expected values, has a wide bound compared to the diagram above. Can anyone please advise on this?

Looking at your results, this particular metric seems very noisy for this data. Consequently, the confidence interval is not very good. Even though the ML system abstracts many worries away, the old adage "garbage-in == garbage-out" still applies.

You should go back, look at your data, and figure out what answers you want from it.

• Are you utilizing the correct metric (count, sum, avg, etc.) for what you want to know about your data?
• Is there a way to clear out the noise with a query?
• Are you using the correct bucket_span given your data frequency?

Could you provide a little context around the makeup of your data and what you are trying to accomplish with the ML product?


Just to follow on, specifically regarding your question about the width of the bounds: the reason for this is the aggregation scheme used in this chart. We make predictions at the job bucket length, which is 15m. The bounds relate to the value of the metric sampled at this interval. The aggregation interval of the chart you've attached is 6h, i.e. 24 fifteen-minute buckets. When the chart draws the actuals, it shows the mean of the values of the 24 buckets which fall in each 6h period. When we draw the upper and lower confidence intervals, we show the max and min, respectively, of the 24 prediction confidence interval values in each 6h period. The effect of taking the mean of 24 noise-like values is to reduce the variance by a factor of 24, and hence the spread by a factor of around 5, i.e. around sqrt(24). The bounds look consistent with this. If you want to confirm, zoom in to a window such that the aggregation interval displayed at the top of the chart reads 15m; you should then see the values to which the bounds apply.
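The variance-reduction effect described above can be sketched numerically. This is a minimal illustration with simulated noise, not the actual chart code; the metric values and group size of 24 (fifteen-minute buckets per 6h interval) mirror the example in the explanation:

```python
import random
import statistics

# Simulate a noisy 15-minute metric: 1000 six-hour periods of 24 buckets each.
random.seed(42)
raw = [random.gauss(100, 10) for _ in range(24 * 1000)]

# Mean of each group of 24 buckets, as the chart does when drawing actuals
# at a 6h aggregation interval.
means = [statistics.mean(raw[i:i + 24]) for i in range(0, len(raw), 24)]

spread_raw = statistics.stdev(raw)      # spread of the 15m values
spread_means = statistics.stdev(means)  # spread of the 6h averages

# The ratio should come out roughly sqrt(24) ~ 4.9.
print(spread_raw / spread_means)
```

Averaging independent noise shrinks the standard deviation by the square root of the number of samples, which is why the 6h chart looks much tighter than the 15m bounds it is drawn from.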

Ben's observation still applies: this looks pretty much like noise with perhaps a small downward trend, so the best the forecasting can do (in terms of mean squared error) is to predict the mean and show the right interval for the noise.

First of all, thanks for replying. I have used the most suitable metric and bucket span as well.

Currently, we have a production line in our manufacturing plant, and this data comes from an AOI machine. After each inspection of a board, a log file is generated; we use Filebeat to collect the data and send it through Logstash to our ES. We are processing the max number of False Calls from the machine itself. We would like to evaluate the data in order to do predictive maintenance. A log file is generated every 5-15 minutes (on average), so I have chosen a bucket span of 15 minutes.

Your setup is reasonable. Your data has no observable trend other than being relatively consistent in an operating range (with a few anomalies in there of course). Therefore, the forecast on it will seem lackluster because the forecast is for trends, not anomalies.

Hi @richcollier,

Yes, it looks more like a trend rather than anomalies. Could it be that, due to a lack of data, I'm unable to simulate such an anomaly forecast? We are trying to find out whether we can achieve predictive maintenance, where we can see a trend in the data that tells us the machine will be down in a few more days, etc.

Sorry, you misunderstood me. Our forecasts only predict trends. Anomalies, by definition, are unexpected events and are not predictable.

You can, of course, use the forecasts to detect a known issue. For example, you can execute the forecast on a metric like disk usage and thereby answer the question: "are there any disks in my environment that will exceed 95% utilization in the next month?". Obviously, in that case, you are the one defining that exceeding 95% full is notable, and detecting it early could save you from a potential future problem/outage.
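As a toy illustration of that workflow: once you have forecast values in hand, the "known issue" check is just a threshold test over the predicted points. The field names and values below are made up for illustration, not the exact ML forecast result schema:

```python
# Hypothetical forecast output for disk utilization (%), one point per day.
# Field names ("timestamp", "prediction") are illustrative placeholders.
forecast = [
    {"timestamp": "2018-06-01T00:00:00Z", "prediction": 88.0},
    {"timestamp": "2018-06-15T00:00:00Z", "prediction": 93.5},
    {"timestamp": "2018-06-29T00:00:00Z", "prediction": 96.2},
]

THRESHOLD = 95.0  # the user-defined "notable" level from the example above

# Flag every predicted point that exceeds the threshold.
breaches = [p for p in forecast if p["prediction"] > THRESHOLD]

for p in breaches:
    print(f'{p["timestamp"]}: predicted {p["prediction"]}% exceeds {THRESHOLD}%')
```

The point is that the ML product produces the predicted values; deciding what constitutes a problem (here, 95%) is still up to you.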

Oh, I really appreciate your help in correcting me, thank you! I understand that unsupervised machine learning will only be able to forecast trends based on previous data. Thanks for the good examples as well.

