Size of Training Data

Is there any recommendation for the amount of training data to have available for ML?

We currently store one week's worth of data in Elasticsearch, which comes to about 100GB of storage. We have a process that cleans up any indexes older than a week, but we want to begin taking advantage of the ML capabilities and know we will need to increase our disk space to retain more data.
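For reference, our cleanup is essentially a small script along these lines (a rough sketch; the host, index pattern, and 7-day cutoff are illustrative):

```python
# Rough sketch of our weekly cleanup: delete indices whose creation date is
# older than 7 days. The host and the "retail-*" index pattern are placeholders.
from datetime import datetime, timedelta, timezone

import requests

ES_URL = "http://localhost:9200"  # illustrative endpoint
CUTOFF = datetime.now(timezone.utc) - timedelta(days=7)

# _cat/indices can return each index's creation date as JSON.
resp = requests.get(
    f"{ES_URL}/_cat/indices/retail-*",
    params={"format": "json", "h": "index,creation.date"},
)
resp.raise_for_status()

for idx in resp.json():
    created = datetime.fromtimestamp(int(idx["creation.date"]) / 1000, tz=timezone.utc)
    if created < CUTOFF:
        # Delete indices past the retention window.
        requests.delete(f"{ES_URL}/{idx['index']}").raise_for_status()
```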

What we're unsure of is exactly how much data we need to retain. We currently store data associated with retail transactions and web traffic and have day-of-the-week, day-of-the-month, and monthly (seasonal) trends. What would be the recommended retention for data of this nature to take advantage of ML?

Thanks in advance!


I recommend the following link: On-demand forecasting with machine learning in Elasticsearch | Elastic Blog

How much data is needed for training?

Quoting the above blog post: "The sweet spot is usually about 3 weeks or 3 full intervals of periodic data." Since you mention monthly seasonality, three full intervals of that cycle would mean roughly three months of history.

How much data do we need to retain?

Machine learning models, whether for anomaly detection or forecasting, are self-contained: once modeling has seen the data it does not need to re-access it, because the important parts are incorporated into the model. Be aware, though, that the model keeps changing as you feed data in. So in theory you can delete the data immediately after feeding it, but we do not advise doing that, as you lose the ability to debug data problems and the ability to visualize the data. Also note that models are snapshotted, which means that after a crash we need to re-feed the data between the snapshot time and the time the crash occurred.
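To illustrate the "self-contained" point: once an anomaly detection job has modeled your data, requesting a forecast is a single call against the model itself rather than against the raw indices. A minimal sketch, assuming a job named retail-transactions (the job id and durations are made up):

```python
# Minimal sketch: ask an existing anomaly detection job for a forecast.
# The job id "retail-transactions", the host, and the durations are illustrative.
import requests

ES_URL = "http://localhost:9200"  # illustrative endpoint

resp = requests.post(
    f"{ES_URL}/_ml/anomaly_detectors/retail-transactions/_forecast",
    json={"duration": "7d", "expires_in": "14d"},  # forecast one week ahead
)
resp.raise_for_status()
print(resp.json())  # includes the forecast_id used to look up the forecast results
```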

