How does Elastic ML detect anomalies?

Hi Community,

In the screenshot above, the upper bound values range between roughly 1 and 4.8.

On Dec 12th at 9:15 it detected a major anomaly: the actual value was 3, the upper bound was 0.754, and the lower bound was 0.

Since we have these values in past data, why is this considered a major anomaly and not a minor anomaly?

Hard to tell what is fully going on here from just a few screenshots but it seems from the first screenshot the point which is marked as point A is inside the expected bounds (the blue shadow) whereas point B is not.

Now, what obviously isn't possible to know from the screenshots is why the expected bounds have this hump shape at this time.

Can you explain what the use case is (i.e. what the data represents) and what your detection configuration is for the ML job (i.e. what function is used, etc.)?

Hi @richcollier

Here is the Job config

"analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by \"resp.errCode.keyword\"",
        "function": "rare",
        "by_field_name": "resp.errCode.keyword",
        "detector_index": 0
      }
    ],
    "influencers": [
      "payer.name.keyword"
    ]
  },

There is a field called error code, and I am using the rare function. The purpose of applying this function is that we want to detect which error codes happen very rarely.
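To make the intent concrete, here is a toy sketch of the idea behind "rare" detection: flag categories that occur far less often than the rest. This is only an illustration with a hypothetical fixed threshold; Elastic's rare function uses a probabilistic model of category frequency, not a cutoff like this.

```python
from collections import Counter

def find_rare_codes(error_codes, threshold=0.01):
    """Toy illustration of the 'rare' idea: flag categories whose
    relative frequency falls below a threshold. Elastic's rare
    function models category probability instead of using a cutoff."""
    counts = Counter(error_codes)
    total = sum(counts.values())
    return {code for code, n in counts.items() if n / total < threshold}

codes = ["OK"] * 500 + ["TIMEOUT"] * 80 + ["E_CORRUPT"] * 2
print(find_rare_codes(codes))  # only the rarely seen code is flagged
```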

Hi @richcollier
Can you please help me understand?

Can anyone tell me what this ML graph is depicting?

I'm a little dubious about whether the previously shown graph is for the rare by resp.errCode.keyword job you provided the configuration for.

This is because jobs using the rare function don't have model bounds shown with expected values as rare values don't exist routinely enough to plot a predicted value. See:

Can you provide another screenshot showing the job name and detector, as mine shows? I truly suspect the previous screenshot was from a job that was modeling count, not rare.

Hi @richcollier ,

Sorry, I posted the wrong ML snapshot.
This is the actual ML job config:

"analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "max(test) by \"verb.keyword\"",
        "function": "max",
        "field_name": "test",
        "by_field_name": "verb.keyword",
        "detector_index": 0
      }
    ],
    "influencers": [
      "verb.keyword"
    ]
  },

I am detecting the max value for a particular service (verb.keyword).
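For context, a toy sketch of what a max(field) by-field detector computes per bucket span: group events into 15-minute buckets and take the max of the field per (bucket, by-field) pair, which the job then models over time. The event tuples below are made up for illustration; this is not Elastic's implementation.

```python
from collections import defaultdict

def bucket_max(events, bucket_span=900):
    """Toy sketch of a max(field) by-field detector's input: for each
    15-minute (900 s) bucket, keep the max value per by-field value.
    The ML job models how these per-bucket maxima behave over time."""
    out = defaultdict(dict)
    for ts, verb, value in events:  # (epoch seconds, verb, metric)
        bucket = ts - ts % bucket_span
        cur = out[bucket].get(verb)
        out[bucket][verb] = value if cur is None else max(cur, value)
    return dict(out)

events = [(0, "GET", 1.2), (100, "GET", 3.4), (950, "GET", 0.7),
          (120, "POST", 2.0)]
print(bucket_max(events))
```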

I have recreated the ML job because I mistakenly deleted it earlier.

Previous Pattern

Today's Pattern

One thing I have noticed is that the "Previous Pattern" snapshot is different from the one I posted two weeks back. Does the pattern get revised if we delete the job?

I have recreated the ML job because I mistakenly deleted it earlier.

Previous Pattern

Ok, that makes more sense. Thanks for clarifying.

One thing I have noticed is that the "Previous Pattern" snapshot is different from the one I posted two weeks back. Does the pattern get revised if we delete the job?

If you run the same data (and amount of data) through the same job configuration, you will get the same results. If you send less data, more data, etc. you will get different results.

In your screenshots above, I'm now lost as to what you're trying to ask. The one labeled "Previous Pattern" shows results from early in the analysis/learning cycle (remember, data is processed in chronological order, so data from 12-10-2022 is "early"), and therefore the predictions and anomalies will be less accurate/meaningful than those seen later in the data (around 12-30-2022, for example). This is why the blue shading follows the expected ranges of the values more accurately in late December than in early December: ML has had more data from which to discover the natural pattern of the data.
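The "bounds tighten over time" behavior can be sketched with a very simple streaming estimator. This is only an analogy using a running mean/variance (Welford's algorithm); Elastic ML's actual model is far more sophisticated, but the shape of the effect is the same: wide, uncertain bounds early on, tighter ones as more buckets are processed.

```python
import math

def streaming_bounds(values, width=3.0):
    """Toy sketch of why model bounds tighten as data streams in:
    maintain a running mean/variance (Welford's algorithm) and emit
    mean +/- width * std after each bucket. Early bounds are very
    wide because the variance estimate is based on few points."""
    mean, m2, bounds = 0.0, 0.0, []
    for n, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        std = math.sqrt(m2 / n) if n > 1 else float("inf")
        bounds.append((mean - width * std, mean + width * std))
    return bounds

# Early buckets: effectively unbounded; later buckets: tight bounds.
b = streaming_bounds([10, 12, 11, 13, 10, 12, 11, 12, 11, 12])
```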

Is this what you were wondering?

I have two questions. First, what does the + indicate? Many blogs and the documentation say it is multi-bucket; can you tell me what it actually is, referring to my snapshot? Second, there is also an annotation, but no reason is given for the annotation.

If you read the blog Interpreting multi-bucket impact anomalies using Elastic machine learning features | Elastic Blog you will see that the + icon indeed indicates that the anomaly has a significant multi-bucket factor to it - meaning that it's not just the individual point that may be anomalous, but there are a series of consecutive points that are anomalous together (derived from a sliding window of 12 bucket spans).
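As a rough illustration of the multi-bucket idea from that blog: instead of scoring each bucket in isolation, you also look at the deviation from the expected value averaged over a sliding window of the last 12 buckets. A run of mildly high buckets can then score highly together even when no single bucket is extreme. Note this toy sketch is not how Elastic computes its real multi-bucket factor; it only illustrates the concept.

```python
def multi_bucket_deviation(actual, expected, window=12):
    """Toy sketch of the multi-bucket idea: average the deviation
    (actual - expected) over a sliding window of the last 12 buckets,
    so a run of consecutive mild deviations scores highly together."""
    scores = []
    for i in range(len(actual)):
        lo = max(0, i - window + 1)
        devs = [actual[j] - expected[j] for j in range(lo, i + 1)]
        scores.append(sum(devs) / len(devs))
    return scores

# Twelve mildly high buckets in a row: each is unremarkable alone,
# but the windowed score reflects the sustained deviation.
actual = [10] * 12 + [13] * 12
expected = [10] * 24
scores = multi_bucket_deviation(actual, expected)
```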

As for your other comment about the annotation - what does the annotation say?

@richcollier Thank you for sharing your knowledge, but I am still stuck on the shaded part: is it the trend or the model? Sometimes the blue shading shrinks or widens. Does that tell us something, or are there prior concepts I should know in order to interpret these waves in the graph?

As for your other comment about the annotation - what does the annotation say?
It just says "Change in trend".

Yeah, but really, thank you.

Data is processed in a streaming manner in chronological order. So, the data on the left-hand side is modeled very coarsely until the time where you see annotations 2, 3, and 4 on the picture. This is when the model recognizes the up-and-down "sawtooth" trend, and the range of expected values (as depicted by the blue shading) more accurately follows the data.

This is a very typical thing that happens if and when ML detects a periodic trend in the data you're modeling. See another example:

You should know that you shouldn't really "trust" the results/anomalies that are flagged early in the analysis (way on the left-hand side). You should ensure that at least a few days (and maybe as many as 3 weeks) of data is processed by the ML job so that all typical periodic trends (hourly, daily, weekly) can be properly identified.
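One way to see why a few full cycles of data are needed before a period becomes detectable: a periodic signal has high autocorrelation at the lag matching its period, and that correlation only shows up once the series contains several repetitions. Elastic ML performs its periodicity testing internally; the sketch below is just a toy illustration of the principle.

```python
def autocorrelation(series, lag):
    """Toy sketch of period detection: compute the autocorrelation of
    a series at a given lag. A periodic trend produces a value near 1
    at the lag equal to its period, once enough cycles are present."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# A daily sawtooth sampled every 15 minutes repeats every 96 buckets.
sawtooth = [i % 96 for i in range(96 * 7)]  # one week of data
r_period = autocorrelation(sawtooth, 96)    # high at the true period
r_half = autocorrelation(sawtooth, 48)      # low off-period
```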

If you want to learn more about ML, pick up a copy of my book:
