I have a feed of data coming into my ES cluster that can arrive in bursts. I initially set the feed to ingest the last 14 days' worth of data so I had something to backtest against; this can take hours, as there are millions of records. After about half of it had been ingested, I created an ML job, and the lookback completed up to the latest record present at that time. Why do the subsequent records with later timestamps not get processed? The datafeed is set to real-time, and even stopping the datafeed and restarting it from the last timestamp in the index to real-time does not make it continue processing the remaining records up to the current point in time.
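For reference, this is roughly what I did when stopping and restarting the datafeed (the datafeed ID, host, and start timestamp are placeholders for my actual values):

```shell
# Stop the running datafeed (ID is a placeholder)
curl -X POST "localhost:9200/_ml/datafeeds/datafeed-my-job/_stop"

# Restart from the last timestamp seen in the index; my understanding is that
# omitting "end" means the datafeed runs continuously in real time
curl -X POST "localhost:9200/_ml/datafeeds/datafeed-my-job/_start" \
  -H 'Content-Type: application/json' \
  -d '{"start": "2019-04-01T00:00:00Z"}'
```

Even after this, the job's latest record timestamp does not advance past where the original lookback stopped.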
This leads me to another question. Due to the nature of the data we are ingesting, records for a given day can potentially arrive over the course of that entire day. Does the datafeed resume from the latest timestamp in the index, or from the last time it ran? I am aware that I could set a query delay on the datafeed; I suppose it would have to cover an entire day. But how would you recover if the source index fell more than a day behind? Would you need to recreate the entire job?
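If query delay is the right knob here, I assume it would be set on the stopped datafeed with something like the following (the datafeed ID is a placeholder, and the one-day value is just my guess at what would cover late-arriving data):

```shell
# Increase query_delay so the datafeed waits a day before querying each bucket,
# giving late records time to arrive; run against a stopped datafeed
curl -X POST "localhost:9200/_ml/datafeeds/datafeed-my-job/_update" \
  -H 'Content-Type: application/json' \
  -d '{"query_delay": "24h"}'
```

My worry is that a fixed delay like this only tolerates lateness up to that window, which is why I am asking what happens when the source index falls even further behind.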