Datafeed missing in ML job

Hi all,

I always receive this warning on my jobs.

Datafeed has missed 2 documents due to ingest latency.

I have tried increasing the query_delay, but the warning still appears.
Any feedback?

Much appreciated!

Hello! I think there is some good, applicable advice in the thread "Datafeed has missed documents due to ingest latency error".

In particular, there are references there to

and Calculating ingest lag and storing ingest time in Elasticsearch to improve observability | Elastic Blog

Sophie also provided the following advice, which I think is applicable here too:

Before you change your ingest, as a first step you could also manually validate that you have delayed data by running a search that replicates what the delayed data check is trying to achieve. Assuming you have a 15m bucket_span and a 30m query_delay, create a date histogram search, e.g. a count of events every 15m from, say, now-90m. Manually refresh this periodically over the course of the next 90m and see if the counts change as time elapses. Pay particular attention to the counts from time buckets that are more than 30m old. If these are changing, this suggests ingest latency.
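To make the manual check above a bit more concrete, here is a rough Python sketch. It only builds the search body and compares two snapshots of bucket counts; it does not talk to a cluster. The field name `@timestamp`, the aggregation name `events_over_time`, and both function names are assumptions for illustration, and `fixed_interval` assumes a reasonably recent Elasticsearch version.

```python
def build_histogram_query(time_field="@timestamp", bucket_span="15m", lookback="90m"):
    """Build a search body that counts events per bucket over the lookback window.

    time_field is an assumed field name; use your datafeed's time field.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {time_field: {"gte": f"now-{lookback}", "lt": "now"}}},
        "aggs": {
            "events_over_time": {
                "date_histogram": {"field": time_field, "fixed_interval": bucket_span}
            }
        },
    }


def late_buckets(earlier, later, earlier_taken_ms, bucket_span_ms, query_delay_ms):
    """Return {bucket_start_ms: extra_docs} for buckets that grew after query_delay.

    earlier / later map bucket start (epoch ms) to doc_count, taken from two
    runs of the histogram search. A bucket is only considered if it had already
    ended at least query_delay before the earlier snapshot was taken.
    """
    cutoff = earlier_taken_ms - query_delay_ms
    return {
        start: later[start] - count
        for start, count in earlier.items()
        if start + bucket_span_ms <= cutoff and later.get(start, count) > count
    }


if __name__ == "__main__":
    minute = 60_000
    first = {0: 10, 15 * minute: 20, 30 * minute: 5}
    second = {0: 12, 15 * minute: 20, 30 * minute: 9}
    # The first snapshot was taken at t = 60m with a 30m query_delay.
    print(late_buckets(first, second, 60 * minute, 15 * minute, 30 * minute))
```

Any bucket this reports is one whose count kept rising after the datafeed would already have queried it, which is the situation the warning describes.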

Hi,

If I understand correctly, missing 2 documents can be considered "normal" since it is such a small number?

Incorrect - ideally, you'd never want any documents to be missed. You need to either increase your query_delay so that documents are no longer missed, or determine why your ingest pipeline is not keeping up with real time.
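For the first option, a minimal sketch of what a query_delay change could look like as a request to the datafeed update endpoint. The datafeed ID `my-datafeed`, the `45m` value, and the helper name are placeholders, not recommendations. Depending on your version, the datafeed may need to be stopped and restarted before the new value takes effect.

```python
import json


def build_query_delay_update(datafeed_id, query_delay):
    """Return (method, path, body) for updating a datafeed's query_delay.

    datafeed_id and query_delay are placeholders supplied by the caller.
    """
    path = f"/_ml/datafeeds/{datafeed_id}/_update"
    body = json.dumps({"query_delay": query_delay})
    return "POST", path, body


if __name__ == "__main__":
    method, path, body = build_query_delay_update("my-datafeed", "45m")
    print(method, path)
    print(body)
```

The same method, path, and body can be pasted into Kibana Dev Tools or sent with whatever HTTP client you already use.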

For the ingest pipeline, do you mean the Logstash layer or Elasticsearch? And could the cause be a complicated Ruby filter in my Logstash configuration?
