How to get all relevant data of anomaly into alert message

I am trying to write a Python script that finds anomalies and relays them to our monitoring system.

I want to get all the same information that I can see in the Anomaly Explorer (in the picture):

But after a couple of days of trying, I just cannot get it right.

I have a single ML job that uses the "customer" field to partition the data. The function is "high_count by keywords over username partitionfield=customer".

If I have understood correctly, I should first search for buckets that have an anomaly_score greater than or equal to 75 (critical), which would give me a timeframe in which at least one anomaly happened.
Then I would query all records and influencers from that timeframe to get the anomalies to send forward.

But my problem is that I don't know how to partition the data properly, as the bucket doesn't seem to say which customer's data caused the anomaly. If I have understood correctly, the bucket only tells me the timeframe. So if I queried the influencers from that timeframe, I would also get other customers' influencers and the data would get mixed. It would be trivial if every customer had their own ML job with separate indices, but I would like to have a single job for this.

Somehow the anomaly explorer gets it right. Can someone explain to me how it is done there?

In the .ml-anomalies-* index, you need to query for result_type:record in order to get the details that you see in the screenshot.
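As a rough illustration of such a query, here is a minimal Python sketch that only builds the search body. The function name, the job id "my_job", the customer value "customer_a" and the default threshold are hypothetical. It assumes each record exposes the partition value as `partition_field_value` and that `timestamp` is in epoch milliseconds, as in the record results; how the body is sent to Elasticsearch is left out.

```python
import json

def build_record_query(job_id, min_score=75, partition_value=None,
                       start_ms=None, end_ms=None, size=100):
    """Build a search body for record-level results of one ML job."""
    filters = [
        {"term": {"job_id": job_id}},
        {"term": {"result_type": "record"}},
        {"range": {"record_score": {"gte": min_score}}},
    ]
    # Assumption: the partition value (here, the customer) is stored on
    # each record as partition_field_value.
    if partition_value is not None:
        filters.append({"term": {"partition_field_value": partition_value}})
    # Optional time window, in epoch milliseconds.
    if start_ms is not None or end_ms is not None:
        time_range = {}
        if start_ms is not None:
            time_range["gte"] = start_ms
        if end_ms is not None:
            time_range["lt"] = end_ms
        filters.append({"range": {"timestamp": time_range}})
    return {
        "size": size,
        "query": {"bool": {"filter": filters}},
        "sort": [{"record_score": {"order": "desc"}}],
    }

# Hypothetical job id and customer, one 15-minute window.
query_body = build_record_query(
    "my_job",
    partition_value="customer_a",
    start_ms=1499781600000,
    end_ms=1499782500000,
)
print(json.dumps(query_body, indent=2))
```

Filtering on the record itself, rather than first finding a bucket and then querying its timeframe, keeps each customer's anomalies separate within a single job.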

Thank you for your reply!

I thought I could get the anomaly score only from the bucket, so could you elaborate a bit on how I could calculate it myself from single records?

So are the buckets only there to make querying faster (functioning as aggregated results of the single records), or do they have another use?

If single records contain influencers too, what is the purpose of also having influencers as separate results?

Thank you for taking the time to explain the logic behind machine learning!

Take a look at this blog - it will explain a lot:

Thank you for your help, I think I got it now!

I have a few additional questions:
In result_type: record documents, there is a "causes" array. Each of its entries contains two arrays called "typical" and "actual". Is there an actual use case for them being arrays rather than single values?

The same question goes for the "influencers" array: if "influencer_field_name" is a single string, in what case would "influencer_field_values" have multiple items?

For example, in my case it looks like this:

      "influencers": [
        {
          "influencer_field_name": "username.keyword",
          "influencer_field_values": [
            "backupuser"
          ]
        },
        {
          "influencer_field_name": "keywords.keyword",
          "influencer_field_values": [
            "Audit Success"
          ]
        }
      ]

But what if there were another user in addition to "backupuser", and another influencer field value in keywords.keyword, let's say "Audit Failure"? How would I know which user had which anomaly?

Cheers!

In the causes array, the typical and actual values for each "cause" are arrays rather than single values because, if you are using the lat_long function, there are two values for each (a latitude and a longitude).

And, of course, within the causes array, there may be more than one instance of a "cause", thus requiring the whole thing to be an array.

As for influencers, there can be more than one "influencer" and yes, there can be more than one influencer_field_values for a given "influencer".

To help illustrate this point, here's an anomaly record for a job that is count by status over clientip with influencers (clientip, status, uri):

      {
        "_index": ".ml-anomalies-shared",
        "_type": "doc",
        "_id": "gallery_record_1499781600000_900_0_422950106_13",
        "_score": 0,
        "_source": {
          "job_id": "gallery",
          "result_type": "record",
          "probability": 2.835781239094336e-7,
          "record_score": 50.30613,
          "initial_record_score": 88.68071046303554,
          "bucket_span": 900,
          "detector_index": 0,
          "is_interim": false,
          "timestamp": 1499781600000,
          "by_field_name": "status",
          "function": "count",
          "function_description": "count",
          "over_field_name": "clientip",
          "over_field_value": "xx.157.32.164",
          "causes": [
            {
              "probability": 0.0000028967136719120617,
              "by_field_name": "status",
              "by_field_value": "304",
              "function": "count",
              "function_description": "count",
              "typical": [
                1
              ],
              "actual": [
                272
              ],
              "over_field_name": "clientip",
              "over_field_value": "xx.157.32.164"
            },
            {
              "probability": 0.003987181891731438,
              "by_field_name": "status",
              "by_field_value": "200",
              "function": "count",
              "function_description": "count",
              "typical": [
                8.986571392274634
              ],
              "actual": [
                238
              ],
              "over_field_name": "clientip",
              "over_field_value": "xx.157.32.164"
            },
            {
              "probability": 0.015983024954163913,
              "by_field_name": "status",
              "by_field_value": "302",
              "function": "count",
              "function_description": "count",
              "typical": [
                2.117091377538835
              ],
              "actual": [
                11
              ],
              "over_field_name": "clientip",
              "over_field_value": "xx.157.32.164"
            }
          ],
          "influencers": [
            {
              "influencer_field_name": "status",
              "influencer_field_values": [
                "304"
              ]
            },
            {
              "influencer_field_name": "clientip",
              "influencer_field_values": [
                "xx.157.32.164"
              ]
            }
          ],
          "clientip": [
            "xx.157.32.164"
          ],
          "status": [
            "304"
          ]
        }
      }

Notice the entire record is oriented around the "over_field_value": "xx.157.32.164"
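To show how a record like this could be turned into an alert message, here is a minimal Python sketch. The function name and the message layout are hypothetical, and the sample dict is a trimmed copy of the record above. It assumes the script already has the record's `_source` as a Python dict.

```python
def format_alert(record):
    """Turn one record-level result (the _source dict) into alert lines."""
    # The record is oriented around one entity: the over field value if
    # present, otherwise (assumption) the partition field value.
    entity = (record.get("over_field_value")
              or record.get("partition_field_value")
              or "unknown")
    lines = ["job={} score={:.1f} entity={}".format(
        record["job_id"], record["record_score"], entity)]
    # Each cause carries its own by-field value and typical/actual arrays.
    for cause in record.get("causes", []):
        lines.append("  {}={} actual={} typical={}".format(
            cause.get("by_field_name"),
            cause.get("by_field_value"),
            ", ".join(str(v) for v in cause["actual"]),
            ", ".join(str(v) for v in cause["typical"]),
        ))
    # Influencers may list several values per field name.
    for influencer in record.get("influencers", []):
        lines.append("  influencer {}: {}".format(
            influencer["influencer_field_name"],
            ", ".join(influencer["influencer_field_values"]),
        ))
    return lines

# Trimmed copy of the record shown above.
sample_record = {
    "job_id": "gallery",
    "record_score": 50.30613,
    "over_field_value": "xx.157.32.164",
    "causes": [
        {"by_field_name": "status", "by_field_value": "304",
         "typical": [1], "actual": [272]},
        {"by_field_name": "status", "by_field_value": "302",
         "typical": [2.117091377538835], "actual": [11]},
    ],
    "influencers": [
        {"influencer_field_name": "status",
         "influencer_field_values": ["304"]},
        {"influencer_field_name": "clientip",
         "influencer_field_values": ["xx.157.32.164"]},
    ],
}

alert_lines = format_alert(sample_record)
print("\n".join(alert_lines))
```

Because every record is tied to a single over field value, building one message per record keeps each user's anomaly separate even when several users are anomalous in the same bucket.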

Hope this helps

if you are using the lat_long function, there are two values for each (a latitude and a longitude).

Makes perfect sense, didn't think of that!

Notice the entire record is oriented around the "over_field_value": "xx.157.32.164"

Of course, now I got it.

Thank you @richcollier for your helpful answers!
