ML X-Pack inconsistency in records_output vs Kibana UI

machine-learning

(Rohithnama) #1

Hello Again!

I encountered an uncertainity in the number of records processed through the ML X-pack API for the following job.

I have the following questions:

What is the difference between last_date_time and last_record_time_stamp? The documentation says that it is the last time data was analyzed and the last time number of records processed. But it wasn't clear for me.

Is it indicating that data is available but it wasn't yet processed through the Ml Algorithm yet? If yes, it contradicts the following output.

The kibana UI indicates that last time stamp for the processed records was 2017-09-19 but the following records_time stamp indicates result_records are available for 2017-09-20 as well. How is it possible?

{
"job_id": "59c2f660c5aab6d6cb6d3bce",
"result_type": "record",
"probability": 4.65238e-23,
"record_score": 92.4073,
"initial_record_score": 92.4073,
"bucket_span": 86400,
"detector_index": 0,
"is_interim": false,
"timestamp": 1505952000000,( 2017-09-20)
"function": "sum",
"function_description": "sum",
"typical": [
14.2096
],
"actual": [
0
],
"field_name": "Amazon"

This is the similar scenario for multiple jobs. Please help me to get more clarity on how things are working here.

Thanks for your time in advance.


(rich collier) #2

last_data_time is essentially the last time the datafeed ran and successfully received data from the query to the index from the raw data..

You can see in this example, the index I have the ML job pointed at only has old, historical data in it:

You can see above that my last_data_time reflects the time at which I ran the job!

Your job also has a "!" sign next to Job messages. If you want to post questions here it would be helpful to not omit details about job warnings/messages that may be occurring.


(Rohithnama) #3

Thanks for response Rich.

So, the last_data_time reflects the time at which the job was ran and latest_record_timestamp indicates the most recent time_stamp among the processed records.

However, in my case, the timestamp of the output record is greater (by one day) than the latest_record_timestamp, how is it possible? Doesn't it indicate that the job also processed the records after the latest time-stamp shown in kibana UI?

I will post all the errors messages from now on- thanks for the hint.

This is the updated message info, in case, if needed!

Rohith


(David Roberts) #4

This piece of your original post contains the answer:

We are detecting anomalies that the sum of the value you are analyzing is zero, which is anomalously low. sum is one of the functions that can detect anomalies when time passes without seeing any data; count is another. Most functions, including mean, min and max will only detect anomalies in buckets where we see input data. If you want to detect anomalies in sums, but never raise anomalies for buckets with no data then the function you need is non_null_sum.

The job messages show that the datafeed was warning you that it had seen no data for a long time. Seeing time periods with no data has always been a dilemma for us - I remember us agonising over the same question in the early days of Prelert in 2009! Does a period with no data represent an anomaly or a failure of the data gathering process? Without additional information the answer is that it could be either.

It looks like since you made the original post your datafeed has started seeing data again.

If you want to run a job that's similar to what you have now but does not raise anomalies when there is no input data then clone the job you have now and change the function to non_null_sum before saving the cloned job.


(Rohithnama) #5

Thanks for the response David.

I'm running the job for with one day as a bucket span, hence it could be displaying so. I think the issue could be with the data gathering process but I'm seeing the results within the expected bucket span. Please let me know what the info you need and I will be happy to provide.

Btw, the below data point doesn't indicate null, I'm passing the value zero when there is no revenue made on that particular day- we want it to be treated as zero.

After seeing the result, I can say that the anomaly results are correct and in sync with the previous results. However, my point is more inclined to UI. The latest_record_time_stamp doesn't indicate that the data point at "timestamp": 1505952000000,( 2017-09-20) is processed but the querying the results shows the resulting anomaly. I see some discrepency here and I'm unable to debug.

Please pass on some inputs to understand the root causes!

I would love to know more about the issues associated with data gathering as I can see this (!) mark for all my jobs irrespective of the bucket_span. Any inputs would help!

Thanks for your time and effort.


(rich collier) #6

On the contrary, I believe the UI is correct. With a bucket_span of 86400s (1 day) and a anomaly record with a timestamp of 1505952000000 (Thursday, September 21, 2017 12:00:00 AM GMT), this means that the bucket analyzed was between Wednesday, September 20, 2017 12:00:00 AM GMT and Thursday, September 21, 2017 12:00:00 AM GMT and this record was written moments after the bucket closes at Thursday, September 21, 2017 12:00:00 AM GMT.

by your screenshot:

When the record was written at that wall-clock time of moments after Thursday, September 21, 2017 12:00:00 AM GMT, the datafeed was not seeing new data and it didn't start seeing that new data until 06:27AM on the 21st.

Thus, the values seen make sense.


(David Roberts) #7

I think it would be good to confirm the following things:

  1. Which timezone is the browser running in? (Most likely the same as which timezone your laptop or desktop is set to.)
  2. Which timezone is the Kibana date format preference set to?
  3. Which timezone is the Elasticsearch cluster running in?
  4. What is the frequency setting for the datafeed?
  5. What is the query_delay setting for the datafeed?
  6. At what time each day is the document containing your data for the previous day indexed into Elasticsearch? (Please also state which timezone the answer to this question is in.)
  7. Which version of the Elastic stack are you running?

Based on what I'm seeing in the screenshots, I'd guess the frequency for the datafeed is set to 150 seconds. But based on record counts it looks like there is only 1 input document available per day? If so, there is no point having such a quick frequency.

The timezones are also interesting. We run our daily maintenance task at 00:30 in the timezone of the Elasticsearch cluster. But that's showing up as 01:30 in the UI. This could be explained if for example your Elasticsearch cluster is running in one timezone but Kibana is displaying times in the timezone that is one hour ahead.

We have had bugs in the past where certain times have been displayed in the wrong timezone. If you could confirm the answers to the questions above we may be able to detect another place where the UI is displaying times in the wrong timezone.


(Rohithnama) #8

Thanks for the response David.

  1. The timezone browser running in is- EST
  2. I didn't set-up any specific timezone for kibana, so it could be taking the default one- UTC
  3. Should be default for elasticsearch as well- UTC
  4. "query_delay": "100s",
  5. "frequency": "150s"
  6. I created the job at 2017-09-22 16:38:03 as per the UI, and it could be in UTC. I don't exactly remember what time I created.

Using the API we developed, we need to support jobs for a shorter as well as longer bucket-span so I had to configure it very low but I can change it as per the bucket span, if it could be an issue.

The discussion is turning to be more interesting. Thanks for helping me to debug with a great detail.


(Rohithnama) #9

This seems to be the possible case. The resulting record could have been generated within that timespan and I got the notification for the anomaly record.

But, when the resulting record was generated, doesn't it needs be seen in the UI as well? It is not the case here.

But the time-stamp 1505952000000 seems to be of date- Wed 20 September 2017
20:00:00 GMT ()not Thursday, as you mentioned). So do you think the data was analyzed between 20th and 21st and the data indexed on 20th was analyzed and then the results were sent after the analysis. Can you correct me on this, if I'm wrong?

This seems to be a possible case as per my configuration. But the issue is bothering me.

Btw, even though the data has been retrieving, the UI wasn't really being updated. So the issue could be within my configurations or some thing else from my end. Could you please check the following screenshot and throw some insights.

The data for the past few days (after sept 19th), which makes it a total of 330 data points, is available in the index but the job is not detecting in the UI and I'm not sure if it processed all these records.

But as we can see the data for 20th was analyzed as we have seen the anomaly but wasn't really sure about the later dates. Because when the job runs it should take the last time stamp and run till the current one, but that doesn't seems to be happening here.

What could be the possible issue and am I missing something?


(rich collier) #10

Incorrect, it is Thursday, September 21, 2017 12:00:00 AM - which would be 20:00 on the 20th only in the EST time zone (using https://www.epochconverter.com/):

If an anomaly record is written in the the index, it is available in the UI, because the UI merely queries the index into which the anomaly record is written.

You can also always check to see what the latest results records are using the API:

GET _xpack/ml/anomaly_detectors/<jobname>/results/buckets
{
  "desc": true,
  "page":{
    "size": "1"
  }
}

(Rohithnama) #11

Oh, yeah! Messed with timezones.

Using the query, I can see that all the records are being processed and I can even query the results.

So now the only question remains for me is, why doesn't it reflect anything in the UI part- Either in terms of the latest_time_stamp or the number of records processed?

Even in the results section of the single metric viewer, it doesn't show the following anomaly record.


(Rohithnama) #12

Would love to know the reason for the difference in UI as well as the queried results. Just curious and also want to make sure this problem is resolved so that I don't encounter in the future!


(David Roberts) #13

We have a bug fix with the following description that has not yet been released:

Formats date/times in the ML pages for the timezone configured under the Kibana Management Advanced Settings page in the dateFormat.tz property. Previously date/times were always displayed in the local browser timezone regardless of the value of the dateFormat.tz property.

This includes times shown in:

  • chart axes
  • anomaly table cells and expanded rows
  • chart point tooltips
  • Jobs list

When you messed with the timezone for your UI, did you do it by changing the dateFormat.tz property in the Kibana Management Advanced Settings page? If so , you will have run into this bug.

Unfortunately, at present the ML pages in the UI only work properly when the Kibana timezone matches the browser timezone. This will be fixed in a future release.


(Rohithnama) #14

Thanks for the response.

No, I haven't modified anything in the UI- didn't change anything relevant to TimeZone as well. So I sense another bug in here but not entirely sure about the root cause though!


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.