I am creating machine learning jobs that run in real time. My questions are:
What kind of information does Latest Timestamp provide?
and
Why does the Latest Timestamp remain the same as it was on the first day I created the job, even though the latest timestamp has changed inside the graph?
When you say that the latest timestamp is changing in the Single Metric Viewer, are you referring to the time picker in the top-right or are you seeing results in the chart with a newer timestamp than what you see in the job list?
Here it says March 12th 2021, but the Latest Timestamp in the overview shows a very old date.
I increased the query_delay and it seems to be better; however, now I get (random) warnings saying "Datafeed has missed a number of documents due to ingest latency".
We ingest data once a day at a random time, and I set the delay to 1d, but I still got the warning.
I don't really understand how this works.
How much bigger should the query_delay be in order to avoid those warnings?
In short, query_delay is what lags the entire job behind real time. If you only ingest data once per day then indeed, you will need to lag your job with a query_delay of at least 1 day. The bucket_span of your job also matters.
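As a sketch, the query_delay can be changed on an existing datafeed with the datafeed update API. The datafeed ID `my-datafeed` and the 26h value below are illustrative assumptions, not values from this thread:

```
POST _ml/datafeeds/my-datafeed/_update
{
  "query_delay": "26h"
}
```

Since ingestion happens at a random time each day, padding the delay beyond exactly 1d (e.g. a few extra hours) gives the late-arriving documents time to land before the datafeed searches that time range.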
Keep in mind that the anomaly detection jobs can either be running in real-time (with a delay, of course) or they could be invoked periodically (with a script that hits the datafeed API with a start and end time, for example) to process previously ingested documents.
What you DON'T want is for the Anomaly Detection job to search for data in the ES index for a certain time range, but have no documents in the index because they are not ingested yet.
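For the periodic approach mentioned above, a hedged sketch of starting the datafeed over a fixed, already-ingested window via the start datafeed API (the datafeed ID and timestamps are illustrative):

```
POST _ml/datafeeds/my-datafeed/_start
{
  "start": "2021-03-11T00:00:00Z",
  "end": "2021-03-12T00:00:00Z"
}
```

The datafeed processes only that window and then stops, so ingest latency is no longer a concern as long as the script runs after the daily ingest has finished.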