I'm trying to run some machine learning jobs, but I'm running into problems with their datafeeds: they seem to skip documents that everything else indicates are there to process at the time the job runs.
The datafeeds run on indices that update every 15 minutes. The bucket interval is 1 hour, the frequency is 1 hour (equal to, or a multiple of, the bucket interval), and the query_delay is set to 35 minutes (enough time for two ingestion events between job runs). The index refresh interval is the default 1 second.
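For reference, this is roughly what the job and datafeed configuration looks like, written as 7.x-style _ml requests for Dev Tools; the job ID, index pattern, and field name here are placeholders rather than my real ones:

PUT _ml/anomaly_detectors/clicks_job
{
  "analysis_config": {
    // matches the 1h bucket interval described above
    "bucket_span": "1h",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": {
    "time_field": "click_datetime"
  }
}

PUT _ml/datafeeds/datafeed-clicks_job
{
  "job_id": "clicks_job",
  "indices": ["clicks-*"],
  "query": { "match_all": {} },
  // run once per bucket
  "frequency": "1h",
  // allow two 15-minute ingest cycles before querying
  "query_delay": "35m"
}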
When the datafeed is started for the first time, everything works great. As soon as it hits the 35-minute query delay for the most recent bucket, though, the datafeed reports that it can't find any indexed documents and produces severity-99 anomalies due to a document count of 0.
If the job is left to run, the job's row in Machine Learning Job Management shows a "Documents Missing due to Ingest Latency" warning, which also gets annotated onto the Single Metric Viewer chart in multiple places since the "switch over" between historical and live data:
"Datafeed has missed N documents due to ingest latency, latest bucket with missing data is [timestamp]. Consider increasing query_delay"
However, even right at the moment the job runs, I can see in Discover that the documents are definitely there. Following advice from another thread, I created a Watcher that queries the document count of that index every 5 minutes to "prove" they're really there, and it confirms the documents exist (query, results, and the full watch definition are below). If I stop the datafeed immediately after the job runs and recreate it, the datafeed also sees the documents and doesn't report the same 0-value anomalies it literally just reported.
Whenever I set this datafeed to "live updates", though, all I get is 0s across the board when it runs, even if I put the query_delay at 2h, which should be long, long, long after ingestion has finished.
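The change itself is trivial; with the datafeed stopped, bumping the delay is just something like this (datafeed ID is a placeholder):

POST _ml/datafeeds/datafeed-clicks_job/_update
{
  "query_delay": "2h"
}

It makes no difference: the next live run still reports 0 documents.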
To make things even stranger, this job worked fine in 6.4 without my managing any of these delay settings: I just set the bucket interval, left everything else at the defaults, and it worked.
Why is my datafeed constantly reporting 0 data, but only when it's doing a live update, when every other part of Kibana can see that the documents are there?
The anomaly:
Discover:
Watcher query:
"query": {
"bool": {
"filter": [
{
"range": {
"click_datetime": {
"gte": "now-2h-35m",
"lte": "now-1h-35m"
}
}
}
]
}
}
And the results of the Watcher:
"hits": {
"hits": [],
"total": 15273,
"max_score": 0
},
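For completeness, the watch itself is just a scheduled version of that same query; the watch ID and index pattern below are placeholders, while the 5-minute interval and the range filter are the real values:

PUT _watcher/watch/ml_doc_count_check
{
  "trigger": {
    "schedule": { "interval": "5m" }
  },
  "input": {
    "search": {
      "request": {
        "indices": ["clicks-*"],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "range": {
                    "click_datetime": {
                      "gte": "now-2h-35m",
                      "lte": "now-1h-35m"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  "actions": {
    "log_count": {
      "logging": {
        "text": "Docs in the previous bucket window: {{ctx.payload.hits.total}}"
      }
    }
  }
}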
I've let the job run for a few cycles and here are the results:
The data from that first auto-run always comes back as zero. Previously I've turned the job off at this point; this time I'm going to let it run overnight and see whether it only happens on the first auto-run and can be ignored going forward.
The annotation in that screenshot was added automatically and says the same ingest-latency message quoted above. Those documents were 100% there more than 2 hours before this job ran.