Conflicting data between Index Dashboard and ML anomaly detection job

Hello All,
I am trying to create a watcher to detect a high volume of error records. When I check the index, it shows that there is no data for Dec 20th and Jan 10th; however, the watcher detected anomalies on Dec 20th and Jan 20th. Both of them use the same JSON query.

This is from the index data

And this is from the Anomaly Job

I am checking for a high count. Is there anything that I am missing that would explain the contradictory data?

Thank you

Is your first screenshot using Discover to view the data for that time range? What index pattern are you viewing in Discover? Is your anomaly detection job using the same index pattern? How is your anomaly detection job configured?

@Nathan_Reese Yes, both of them use the same index. I am using a single metric job with a distinct count field,
and both the Discover tab and the anomaly job have the same JSON query:

      {
        "bool": {
          "must": [
            {
              "term": {
                "@source_host": "host"
              }
            },
            {
              "term": {
                "app.code": "app"
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "wildcard": {
                      "@message": "*Error*"
                    }
                  },
                  {
                    "wildcard": {
                      "@message": "*Caused*"
                    }
                  },
                  {
                    "wildcard": {
                      "@message": "*failure*"
                    }
                  },
                  {
                    "wildcard": {
                      "@message": "*exception*"
                    }
                  },
                  {
                    "wildcard": {
                      "@message": "*bad request*"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
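
For reference, this query ends up in the anomaly job's datafeed configuration roughly like the sketch below (the job ID and index pattern are placeholders rather than my real names, and the should list is shortened to two of the wildcards):

      PUT _ml/datafeeds/datafeed-error-logs
      {
        "job_id": "error-logs",
        "indices": ["my-log-index-*"],
        "query": {
          "bool": {
            "must": [
              { "term": { "@source_host": "host" } },
              { "term": { "app.code": "app" } },
              {
                "bool": {
                  "should": [
                    { "wildcard": { "@message": "*Error*" } },
                    { "wildcard": { "@message": "*Caused*" } }
                  ]
                }
              }
            ]
          }
        }
      }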

The Actual and Typical counts in the ML anomalies data do not look like they come from the same data set as your Discover screenshot. Could you send a full screenshot of the Single Metric Viewer with the chart and anomaly records? And include the ML job configuration details too?

Hello @grabowskit,
I cleared the cache, and now it gives more reasonable data and there are no anomalies. However, if I check the data in the Single Metric Viewer and the Discover board, they still give me conflicting data.

This is from the Discover board, which shows a count of 4 on Jan 4th

And this is from the Single Metric Viewer, which shows a count of 17 on Jan 4th

Maybe there is something in the settings that I am missing? The anomaly job's bucket span is set to 1d.

Here is the job config
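
In case the screenshot is hard to read, the configuration is roughly the following sketch (the job ID, the detector's field name, and the time field are placeholders for my actual values):

      PUT _ml/anomaly_detectors/error-logs
      {
        "analysis_config": {
          "bucket_span": "1d",
          "detectors": [
            {
              "function": "distinct_count",
              "field_name": "some_field"
            }
          ]
        },
        "data_description": {
          "time_field": "@timestamp"
        }
      }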

There are a couple of things I can think of that might cause this discrepancy:

  1. Your ML detector is distinct_count, but Kibana Discover is showing the raw count of docs that match a query. Those aren't necessarily the same.
  2. The query used in Kibana Discover may not be exactly the same one you're using for the ML job's datafeed.
  3. You chose a bucket_span of 1d. This is an unfortunate choice for one specific reason: when bucketizing the data to a day, ML will ALWAYS align the start and end of the day to midnight UTC - not YOUR timezone. So, if you are UTC-5hrs for example, Kibana shows you a local midnight-to-midnight day, but ML's daily bucket really runs from 7pm to 7pm in your local time. Therefore, ML's notion of a day is different than yours, and the counts can differ per day. (See the sketch below for a way to check both of these effects.)
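
A quick way to check points 1 and 3 is to run the datafeed's query through a date_histogram and compare the raw doc count per day with a cardinality sub-aggregation (roughly what distinct_count measures), once with UTC day boundaries and once with a local time zone. This is only a sketch: the index pattern, time field, distinct-count field, and time zone are placeholders, and the should list is cut down to one wildcard:

      GET my-log-index-*/_search
      {
        "size": 0,
        "query": {
          "bool": {
            "must": [
              { "term": { "@source_host": "host" } },
              { "term": { "app.code": "app" } },
              { "bool": { "should": [ { "wildcard": { "@message": "*Error*" } } ] } }
            ]
          }
        },
        "aggs": {
          "per_day_utc": {
            "date_histogram": { "field": "@timestamp", "calendar_interval": "1d" },
            "aggs": { "distinct_values": { "cardinality": { "field": "some_field" } } }
          },
          "per_day_local": {
            "date_histogram": { "field": "@timestamp", "calendar_interval": "1d", "time_zone": "America/New_York" },
            "aggs": { "distinct_values": { "cardinality": { "field": "some_field" } } }
          }
        }
      }

If the per-day doc counts in per_day_local line up with Discover while the distinct counts in per_day_utc line up with the Single Metric Viewer, that would confirm both effects.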

Changing the bucket_span from 1d to 1h seems to fix the issue. Thank you for your help :slight_smile:

