Anomaly Detection Kibana skipping data

I'm running an ML job with a simple low_count detector and a daily bucket span, and my datafeed has an hourly frequency.
Everything seems to work correctly, but sometimes, for some reason, some data gets skipped.
The ML job throws an anomaly. When I look at the count everything is normal, but the anomaly still shows an actual value of 0.

Looking at the view series, I have:

You can clearly see that the data count is not 0, but 8.

I have a 5min query_delay, but I checked when the data was inserted, and it was in the middle of the bucket_span. So the data was definitely not ingested after the ML job processed it.
Any idea?

What version? This sounds similar to a bug that was introduced in v6.5, but fixed in v6.6. See: Machine Learning datafeed skipping documents that seem to be there

ES and Kibana are running on 7.5.2

Hmm...odd. What do you see if you run:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "result_type": "bucket" } },
        { "range": { "timestamp": { "gte": "now-3d" } } },
        { "range": { "anomaly_score": { "gte": "90" } } }
      ]
    }
  }
}

Specifically, I'm looking for the value of the event_count field for this anomaly.

When I run that, I don't get any results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

Ok, well that doesn't make sense unless the anomaly in your screenshot is gone, or the score has changed to be below 90 now. Perhaps verify whether or not this is the case. If the score has changed, then just adjust the "gte": "90" in the query.

The previous graph that I posted was for the record result_type.

After looking at the event_count, the Anomaly Detection job missed ~2000 documents.
I'm not sure how to determine the reason.

Ah, yes sorry about the mistake on the result_type.
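Since record results are scored by record_score while bucket results are scored by anomaly_score, the same search can be parameterized by result type. The following is only a rough sketch: build_results_query and SCORE_FIELD are hypothetical names made up for illustration, and the printed JSON is meant to be pasted as the body of a GET .ml-anomalies-*/_search request.

```python
import json

# Score field used by each result type in the .ml-anomalies-* indices.
SCORE_FIELD = {"bucket": "anomaly_score", "record": "record_score"}


def build_results_query(result_type, min_score, since="now-3d"):
    """Build a search body for ML results of one type above a score threshold."""
    score_field = SCORE_FIELD[result_type]
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"result_type": result_type}},
                    {"range": {"timestamp": {"gte": since}}},
                    {"range": {score_field: {"gte": min_score}}},
                ]
            }
        }
    }


if __name__ == "__main__":
    # The threshold of 50 is an arbitrary example value.
    print(json.dumps(build_results_query("record", 50), indent=2))
```

Lowering min_score is the equivalent of adjusting the "gte" value by hand when the score of an anomaly has dropped since it was first reported.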

So, as you may well know, the event_count per bucket is the number of events present in the index for the bucket_span when the datafeed's query is executed (and thus those are the documents passed along to anomaly detection). If you have occasions in which the event_count is less than the number of docs in that timeframe (viewed retroactively) then this really does point to an ingest delay issue - which is mitigated by increasing the query_delay.
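To make that comparison concrete, here is a minimal sketch that flags buckets where the event_count seen by the job is lower than the document count found in the index afterwards. It assumes you have already collected both sets of counts yourself (for example from the bucket results and from a date histogram on the source index); undercounted_buckets and the sample numbers are made up for illustration.

```python
def undercounted_buckets(ml_event_counts, index_doc_counts):
    """Return {bucket_start: missing_docs} for buckets where ML saw fewer docs.

    ml_event_counts:  bucket start -> event_count reported by the ML job
    index_doc_counts: bucket start -> doc count found in the index afterwards
    """
    missing = {}
    for bucket_start, actual in index_doc_counts.items():
        seen = ml_event_counts.get(bucket_start, 0)
        if seen < actual:
            missing[bucket_start] = actual - seen
    return missing


if __name__ == "__main__":
    # Illustrative numbers only, not taken from a real job.
    ml_counts = {"2020-06-08": 120, "2020-06-09": 0}
    index_counts = {"2020-06-08": 120, "2020-06-09": 8}
    print(undercounted_buckets(ml_counts, index_counts))
```

Any bucket that shows up in the result is one where documents became searchable only after the datafeed had already queried that time range.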

Could it be another reason?
I increased the query delay and checked the skipped documents: the ingestion time of these documents falls in the middle of the bucket span. Is there anything else I can check?

A possibly relevant detail here: while the ingestion time does matter, what matters more is the timestamp field used in the index pattern. If a document has a timestamp of 2020-06-09T15:44:40.608000Z but gets indexed 5 minutes later, it will still be missed by the ML datafeed if the query_delay isn't big enough.
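The timing argument can be sketched in a few lines. This is a deliberately simplified, conservative model and not the datafeed's actual scheduling logic: it treats a document as at risk whenever the gap between its timestamp and the moment it was indexed exceeds query_delay. In practice a late document may still be picked up if it arrives before the next scheduled search. The function names and the 6 minute delay are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def ingest_delay(event_time, indexed_time):
    """Gap between the document's own timestamp and when it was indexed."""
    return indexed_time - event_time


def may_be_missed(event_time, indexed_time, query_delay):
    """Conservative check: the doc is at risk if it arrived later than query_delay."""
    return ingest_delay(event_time, indexed_time) > query_delay


if __name__ == "__main__":
    event = datetime(2020, 6, 9, 15, 44, 40, 608000, tzinfo=timezone.utc)
    indexed = event + timedelta(minutes=6)  # assumed ingest delay of 6 minutes

    print(may_be_missed(event, indexed, timedelta(minutes=5)))   # True
    print(may_be_missed(event, indexed, timedelta(minutes=10)))  # False
```

Note that the wall-clock position of the indexing time inside the bucket span does not appear anywhere in the check; only the difference between the two times does.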

By the way - this blog might be useful to those that are not sure just how much their ingest delay is:
