I have a monthly index with 350k+ documents/events per month, or 118.3 MB of data per month. I have data from March to the present (July), i.e. 4+ months of data. In total, that is about 1400k events and 473.2 MB of data, yet I'm still getting the following error when I try to run a Forecast for a day.
The general rule of thumb is that you should not ask for a forecast whose duration exceeds your historical data. In other words, if I had a week's worth of data, I should not expect to be able to forecast out a month.
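For reference, a forecast is kicked off via the _forecast endpoint, and the duration parameter is what gets checked against the history the job has seen. A minimal sketch in Dev Tools, assuming a hypothetical job id of lb_requests:

```
// Request a 1-day forecast for the (hypothetical) job "lb_requests".
// Keep "duration" well inside the span of data the job has analyzed.
POST _ml/anomaly_detectors/lb_requests/_forecast
{
  "duration": "1d"
}
```

With 4+ months of history, a 1-day duration is well within the rule of thumb, which is why the questions below matter.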
So, I need to ask a few questions:
1. What does the general trend of the data look like? A screenshot of the data in the Single Metric Viewer would help.

2. Since you are asking it to predict for every lbname, I wonder if there are some instances of lbname that don't have very much data (although it looks like this one does). Even a single by_field out of all of them could potentially cause this error to be thrown. How many lbnames are there, and is there a reasonable way to check whether every one of them has sufficient historical data? (See the sketch below for one way to check.)
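One way to check is a terms aggregation on lbname with sub-aggregations for the earliest and latest event per load balancer. This is a sketch only; the index pattern (lb-logs-*) and field names (lbname as a keyword field, @timestamp) are assumptions to swap for your own:

```
// Per-lbname doc counts plus first/last event timestamps.
GET lb-logs-*/_search
{
  "size": 0,
  "aggs": {
    "per_lbname": {
      "terms": { "field": "lbname", "size": 1000 },
      "aggs": {
        "first_event": { "min": { "field": "@timestamp" } },
        "last_event":  { "max": { "field": "@timestamp" } }
      }
    }
  }
}
```

Any lbname whose doc_count is tiny, or whose first_event is very recent, is a candidate for starving its partition of the history the forecast needs.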
I see that your data doesn't really have any discernible trends to it (i.e. no cyclicality, no general upward slope, etc.). Even if you get this to work, you might be underwhelmed by the forecast. You'll likely get a horizontal line around the middle of the range of the data (somewhere around 30-ish in this case). Maybe it is not obvious, but the forecast capability extrapolates trends in the data - it doesn't "predict" anomalies. Anomalies are, by definition, surprise events.
One thing you could try is to isolate the data for a single lbname (filter in the Discover tab and save it as a saved search). Then create an ML job from that saved search on that single time series and run the forecast on that.
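The saved search just needs a filter on one lbname value. In Discover that is a one-line KQL filter such as `lbname : "lb-prod-01"`, and the equivalent query DSL the job's datafeed would run looks roughly like this (again, lb-logs-* and the value lb-prod-01 are placeholders):

```
// Restrict the job's input to a single load balancer's time series.
GET lb-logs-*/_search
{
  "query": {
    "term": { "lbname": "lb-prod-01" }
  }
}
```

If the single-series job forecasts fine, that points the finger at one or more sparse lbname partitions in the multi-series job.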