I'm running an ML job with a simple low_count detector, a daily bucket_span, and an hourly frequency on the datafeed.
Everything seems to work correctly; however, sometimes, for some reason, data gets skipped.
The ML job then raises an anomaly. When I look at the event count everything is normal, but the anomaly still shows an actual value of 0.
You can clearly see that the data count is not 0, but 8.
I have a 5-minute query_delay, but I checked when the data was inserted, and it was in the middle of the bucket_span. So the data definitely wasn't ingested after ML had already processed that bucket.
Ok, well that doesn't make sense unless the anomaly in your screenshot is gone, or its score has dropped below 90 by now. Perhaps verify whether or not that is the case. If the score has changed, just adjust the `"gte": "90"` in the query.
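For reference, a search along these lines against the ML results indices should show the record in question (the index pattern and the 90 threshold match the defaults discussed above; adjust the threshold as needed):

```json
GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "result_type":  "record" } },
        { "range": { "record_score": { "gte": "90" } } }
      ]
    }
  }
}
```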
Ah, yes, sorry about the mistake on the result_type.
So, as you may well know, the event_count per bucket is the number of events present in the index for that bucket_span at the moment the datafeed's query is executed (and thus those are the documents passed along to anomaly detection). If there are occasions when the event_count is less than the number of docs actually in that timeframe (viewed retrospectively), then this really does point to an ingest delay issue, which is mitigated by increasing the query_delay.
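Increasing the query_delay is done through the update datafeed API; something like the following (the datafeed id and the 120s value are just placeholders, and depending on your Elastic version you may need to stop the datafeed before updating it):

```json
POST _ml/datafeeds/my-datafeed/_update
{
  "query_delay": "120s"
}
```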
One possibly relevant detail here: while the ingestion time does matter, what matters more is the timestamp field used in the index pattern. If a document has a timestamp of 2020-06-09T15:44:40.608000Z but gets indexed 5 minutes later, the document will still be missed by the ML datafeed if the query_delay isn't big enough.
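The timing arithmetic can be sketched as a toy model (this is my own illustration of the scheduling logic, not the actual ML scheduler code; all names here are hypothetical):

```python
from datetime import datetime, timedelta

def seen_by_datafeed(event_time, indexed_time, bucket_end, query_delay):
    """A document belongs to a bucket if its timestamp falls before the
    bucket end, but it is only picked up if it is already indexed when
    the datafeed queries that bucket, i.e. at bucket_end + query_delay."""
    search_time = bucket_end + query_delay
    return event_time < bucket_end and indexed_time <= search_time

event_time = datetime(2020, 6, 9, 15, 44, 40)     # @timestamp on the document
indexed_time = event_time + timedelta(minutes=5)  # indexed 5 minutes later
bucket_end = datetime(2020, 6, 9, 15, 45)         # hypothetical bucket boundary

# With a 1-minute query_delay the doc is not yet indexed when the search runs:
print(seen_by_datafeed(event_time, indexed_time, bucket_end, timedelta(minutes=1)))   # False
# With a 10-minute query_delay it has been indexed by then:
print(seen_by_datafeed(event_time, indexed_time, bucket_end, timedelta(minutes=10)))  # True
```

The document is counted in the earlier bucket because of its timestamp, yet the datafeed never sees it unless the search fires after the ingest pipeline has caught up.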