Datafeed has missed documents due to ingest latency error

Hello,
I have many of these messages in the Job Management > Job Messages tab,
and I'm looking for a way to investigate them.

First of all, the dates don't make sense, but I guess it's a timezone bug in Kibana (the first column's date is earlier than the date in the message itself).

Second, is there any way to validate the ingestion time? I don't think we have a 30-minute delay, which is the current query_delay setting in the datafeed (it may still be true, though...).

Hi @liorg2

The left-hand timestamp in the Job Messages tab is displayed in your browser's timezone by default. You can change this in Kibana Management > Advanced Settings (although I would not recommend it, as it affects all your visualizations too).

The backend anomaly detection uses UTC, and this is a log message from the backend. The log message is a plain string, rather than a field with a timestamp data type, so it is not reformatted to your timezone.

The time reported in the message is the start of the bucket, and the end of the bucket is this start time plus the bucket_span configured for the job.

This recent blog describes how you can add an ingest time to your ingest pipeline. https://www.elastic.co/blog/calculating-ingest-lag-and-storing-ingest-time-in-elasticsearch-to-improve-observability

This uses the set ingest processor https://www.elastic.co/guide/en/elasticsearch/reference/current/set-processor.html
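As a minimal sketch of what the blog describes (the pipeline name "add-ingest-time" and the target field "event.ingested" here are placeholders of my own choosing, not taken from the blog), a pipeline using the set processor to stamp each document with its ingest time could look like:

```
PUT _ingest/pipeline/add-ingest-time
{
  "description": "Stamp each document with the time it was ingested",
  "processors": [
    {
      "set": {
        "field": "event.ingested",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
```

You can then attach the pipeline to an index via the index.default_pipeline index setting, or pass it per request with ?pipeline=add-ingest-time, and compare the stored ingest time against the event timestamp to measure lag.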

More information on delayed data can be found here https://www.elastic.co/guide/en/machine-learning/7.9/ml-delayed-data-detection.html

Hope this helps
Sophie

thanks for the information!

Just a few yes/no questions... currently I don't have any ingest processors.
Will adding one put load on the cluster? Do I need to add nodes for that? (I use Elastic Cloud...)

Thanks,
Lior

Hi

You need a node with the ingest role, shown as "i" in the node.role column of the _cat/nodes output below. You should already have one in Cloud.

# GET _cat/nodes
127.0.0.1 70 80 1 0.00 0.03 0.05 dlm   - node1
127.0.0.1 46 80 1 0.00 0.03 0.05 dlmt  * node2
127.0.0.1 63 80 1 0.00 0.03 0.05 dilmt - node3

With respect to load, please benchmark locally to see whether there is a measurable performance impact. You don't really need the Painless script part from the blog example, and I doubt that a single set processor will make a noticeable difference - but it depends on your data and use case.

Before you change your ingest setup, as a first step you could manually validate that you do have delayed data, by running a search that replicates what the delayed data check is trying to achieve. Assuming you have a 15m bucket_span and a 30m query_delay, create a date histogram search, e.g. a count of events every 15m starting from now-90m. Manually refresh this periodically over the course of the next 90m and see if the counts change as time elapses. Pay particular attention to the counts in time buckets that ended more than 30m ago: if those are still changing, it suggests ingest latency.
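For example, such a date histogram search could look like this (the index name my-index and the field name @timestamp are assumptions; substitute your datafeed's index and time field):

```
GET my-index/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-90m" } }
  },
  "aggs": {
    "events_per_bucket": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "15m"
      }
    }
  }
}
```

If the doc_count of a bucket whose end is older than your 30m query_delay keeps growing between refreshes, those documents arrived too late for the datafeed to have seen them.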

Makes sense, I'll check that.
Thanks a lot