Machine Learning datafeed skipping documents that seem to be there

richcollier · March 5, 2019, 1:49pm

I think there might be a few things going on here.

There is a bug that was introduced in v6.5 (and will be fixed in v6.6.2+) that inadvertently creates an anomaly on an interim (un-finalized, or still-open bucket). See:

and the corresponding bug:

github.com/elastic/ml-cpp

[ML] Unexpected interim results after advancing time into new empty bucket

opened 04:40PM - 30 Nov 18 UTC

closed 06:21PM - 26 Feb 19 UTC

dimitris-athanasiou

>bug :ml

**How to reproduce** 1. Create a job with simple count detector and a bucket …span of 5m 2. Run some data through the job up to the end of a bucket (using the `end` parameter of the start datafeed API) 3. Open the job again (it should have been auto-closed from step 2) 4. Call the flush API: ``` POST _xpack/anomaly_detectors/{job_id}/flush?advance_time={time}&calc_interim=true ``` where {time} should be a timestamp into the current bucket. E.g., if `end` was `2018-12-01T00:00:00Z`, {time} should be `2018-12-01T00:00:01Z` **Observed Behaviour** If you get the anomaly records, you should see a record which is interim and has an actual value of `0.0`. This shouldn't have been created. Interestingly, calling step 4 with {time} being one millisecond forward makes that record disappear. Also, this is broken since version `6.4.`

The auto-annotation for missing data, however, should not stumble onto this bug because it explicitly ignores interim buckets. In order to validate your datafeed timing (what bucket's it's querying and when), you could enable TRACE logging for the datafeed:

PUT _cluster/settings
{
  "transient": {
    "logger.org.elasticsearch.xpack.ml.datafeed": "TRACE"
  }
}

(this is a transient setting that won't survive a cluster re-start but you can always reset this back to "DEBUG" or "NORMAL" when this experiment is over)

You can also have your Watch log what it sees as well - then, in the elasticsearch.log file we should have a better understanding of when the datafeed runs and what window of time it queries - while at the same time seeing the output of your watch that is trying to also do the validation.

Topic		Replies	Views
Datafeed missing in ML job Kibana elastic-stack-machine-learning	5	752	December 6, 2020
Elastic Machine learning, Datafeed has missed xxx documents due to ingest latency Kibana elastic-stack-machine-learning	2	414	June 12, 2023
Datafeed has missed 152 documents due to ingest latency Kibana elastic-stack-machine-learning , runtime-fields	7	1102	January 3, 2023
ML jobs kibana not working as expected Kibana elastic-stack-machine-learning	3	331	June 23, 2022
Anomaly Detection Kibana skipping data Kibana elastic-stack-machine-learning	12	895	July 15, 2020

Machine Learning datafeed skipping documents that seem to be there

Related topics