ML jobs with missing documents

We have some jobs running with a high number of missing documents. According to the Elastic ML documentation, these checks are done after the buckets containing the missed documents have been processed and their anomaly scores finalized, and "if there is indeed missing data due to their ingest delay, the end user is notified". The question is: how can we make sure we don't miss any documents? Increasing query_delay usually works, but we also need to make sure the documents that were already missed get processed eventually. If we are notified soon enough, how does stopping and starting the datafeed over those time ranges affect the ML model and its results? Is there any risk of duplicate processing?

In general, you want query_delay set high enough to cover your ingest latency, so that late-arriving documents have already been indexed by the time the datafeed queries each bucket.
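For example, query_delay can be changed on an existing datafeed. A minimal sketch using the Python Elasticsearch client; the datafeed ID and the 5-minute delay are placeholder values you would adjust to your own ingest pipeline:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Raise query_delay so the datafeed only searches a bucket's time range
# after late-arriving documents have had time to be ingested.
# "my-datafeed" and "300s" are hypothetical values.
es.ml.update_datafeed(
    datafeed_id="my-datafeed",
    query_delay="300s",
)
```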

The ML job will not re-process past buckets unless you manually use the ML model snapshot APIs to revert the job to a snapshot that was saved before the data was missed. You can pass the delete_intervening_results flag to delete any anomaly results recorded since that snapshot, which also avoids duplicate results when those buckets are re-processed.
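A sketch of that revert step, again with the Python client. Note that the datafeed must be stopped and the job closed before a revert; the job ID and gap timestamp below are hypothetical:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The datafeed must be stopped and the job closed before reverting.
es.ml.stop_datafeed(datafeed_id="my-datafeed")
es.ml.close_job(job_id="my-job")

# Find the newest model snapshot taken before the data gap.
# "my-job" and the end timestamp are placeholder values.
snapshots = es.ml.get_model_snapshots(
    job_id="my-job",
    end="2024-01-01T00:00:00Z",  # hypothetical start of the gap
    sort="timestamp",
    desc=True,
    size=1,
)
snapshot_id = snapshots["model_snapshots"][0]["snapshot_id"]

# Revert to that snapshot; delete_intervening_results removes anomaly
# results recorded after it, so re-processing does not duplicate them.
es.ml.revert_model_snapshot(
    job_id="my-job",
    snapshot_id=snapshot_id,
    delete_intervening_results=True,
)
```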

After reverting, you can restart the datafeed from that point in time so the previously missed documents are processed going forward.
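Continuing the sketch above: reopen the job and start the datafeed from the reverted snapshot's timestamp (hypothetical here), so the datafeed re-queries everything from that point forward, including the documents that originally arrived late:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Reopen the job, then restart the datafeed from the snapshot time.
es.ml.open_job(job_id="my-job")
es.ml.start_datafeed(
    datafeed_id="my-datafeed",
    start="2024-01-01T00:00:00Z",  # hypothetical: the reverted snapshot time
)
```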


Many thanks for your quick response 🙂
