I'm running an ML job with a simple low_count detector, a daily bucket_span, and an hourly frequency on the datafeed.
Everything seems to work correctly; however, sometimes, for some reason, data gets skipped.
The ML job then raises an anomaly. When I look at the event count everything is normal, but the anomaly still shows an actual value of 0.
You can clearly see that the data count is not 0, but 8.
I have a 5-minute query_delay, but I checked when the data was inserted, and it was in the middle of the bucket_span. So the data definitely wasn't ingested after ML had already processed that bucket.
Ok, well that doesn't make sense unless the anomaly in your screenshot is gone, or its score has dropped below 90 by now. Perhaps verify whether or not that is the case. If the score has changed, just adjust the `"gte": "90"` in the query.
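For reference, a search along these lines against the ML results indices should show the record in question (the index pattern and the 90 threshold match the defaults discussed above; adjust the threshold as needed):

```json
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "result_type":  "record" } },
        { "range": { "record_score": { "gte": "90" } } }
      ]
    }
  }
}
```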
Ah, yes, sorry about the mistake on the result_type.
So, as you may well know, the event_count per bucket is the number of events present in the index for that bucket_span at the moment the datafeed's query is executed (and thus those are the documents passed along to anomaly detection). If there are occasions when the event_count is less than the number of docs actually in that timeframe (viewed retrospectively), then this really does point to an ingest delay issue, which is mitigated by increasing the query_delay.
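Increasing the query_delay is done through the update datafeed API; something like the following (the datafeed id and the 120s value are just placeholders, and depending on your Elastic version you may need to stop the datafeed before updating it):

```json
POST _ml/datafeeds/my-datafeed/_update
{
  "query_delay": "120s"
}
```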
One possibly relevant detail here: while the ingestion time does matter, what matters more is the timestamp field used in the index pattern. If a document has a timestamp of 2020-06-09T15:44:40.608000Z but gets indexed 5 minutes later, the document will still be missed by the ML datafeed if the query_delay isn't big enough.
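The timing arithmetic can be sketched as a toy model (this is my own illustration of the scheduling logic, not the actual ML scheduler code; all names here are hypothetical):

```python
from datetime import datetime, timedelta

def seen_by_datafeed(event_time, indexed_time, bucket_end, query_delay):
    """A document belongs to a bucket if its timestamp falls before the
    bucket end, but it is only picked up if it is already indexed when
    the datafeed queries that bucket, i.e. at bucket_end + query_delay."""
    search_time = bucket_end + query_delay
    return event_time < bucket_end and indexed_time <= search_time

event_time = datetime(2020, 6, 9, 15, 44, 40)     # @timestamp on the document
indexed_time = event_time + timedelta(minutes=5)  # indexed 5 minutes later
bucket_end = datetime(2020, 6, 9, 15, 45)         # hypothetical bucket boundary

# With a 1-minute query_delay the doc is not yet indexed when the search runs:
print(seen_by_datafeed(event_time, indexed_time, bucket_end, timedelta(minutes=1)))   # False
# With a 10-minute query_delay it has been indexed by then:
print(seen_by_datafeed(event_time, indexed_time, bucket_end, timedelta(minutes=10)))  # True
```

The document is counted in the earlier bucket because of its timestamp, yet the datafeed never sees it unless the search fires after the ingest pipeline has caught up.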