Detail questions regarding bucket and influencer scorings

kaihil · May 15, 2018, 3:56pm

I have two questions regarding your blog post "Machine Learning Anomaly Scoring and Elasticsearch - How it Works" on the Elastic Blog.

Question regarding influencer scoring

Assume a multi-metric job. As I understand it, the influencer score for a specific influencer entity (e.g. influencer: "country", entity: "spain") is somehow derived from the anomalies that occur in all time series with country=="spain". I hope this is correct so far.
My question: does it also take into account the number of time series with country=="spain" that are "clean" (i.e. not affected by anomalies in this bucket)? I am asking because it might make a big difference whether an influencer entity affects 10 of 10 time series or 10 of 1000.

Question about bucket results

You write:

Note that the calculation behind the bucket score is more complex than just a simple average of all the individual anomaly record scores, but will have a contribution from the influencer scores in each bucket.

I don't really get it. Could you rephrase the sentence, please? Do influencer scores contribute to the bucket result or not?

Thank you very much in advance!

Kai

Peter_Harverson · May 16, 2018, 9:57am

Kai,

Thanks for the questions on scoring.

For (1) regarding influencer scoring, the short answer is yes, it does account for both the anomalousness and count of the events associated with each entity.

For (2) regarding the calculation behind bucket results, yes, the influencer scores do contribute to the bucket score. We will always generate an influencer score for "time" (which is a function of all anomalies present in that bucket), and then one score for each type of influencer field you use in your job, e.g. region, person, etc. The score for the bucket is calculated from an aggregation of all of these scores.

To provide a bit more detail, aggregation is done on raw probabilities. We have three styles of aggregation which map, more or less, to the following concepts:

The chance of the joint event of all the probabilities,
The chance of an event with the single lowest probability, and
The chance of an event comprising the lowest n probabilities for small n.

We use a combination of these three styles when calculating the scores.
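
To make the three styles a little more concrete, here is a rough sketch in Python. It is only an illustration and not the actual Elastic ML implementation: the function names are made up, the events are assumed to be independent, and how the three results are combined into a final score is not shown because it is not described here.

```python
import math


def joint_probability(probs):
    """Style 1: chance of all events occurring together.

    Assumes the events are independent and every probability is > 0.
    The product is computed in log space to avoid underflow.
    """
    return math.exp(sum(math.log(p) for p in probs))


def min_probability(probs):
    """Style 2: chance of seeing at least one event as unlikely as the
    single least likely one, given that len(probs) events were observed.

    Assumes independent events (an assumption of this sketch).
    """
    p_min = min(probs)
    return 1.0 - (1.0 - p_min) ** len(probs)


def lowest_k_probability(probs, k=3):
    """Style 3: joint chance of the k least likely events, for small k.

    The default k=3 is an arbitrary choice for this sketch.
    """
    return joint_probability(sorted(probs)[:k])


if __name__ == "__main__":
    # Hypothetical raw probabilities for the anomalies in one bucket.
    probs = [0.5, 0.2, 0.01, 0.001]
    print(joint_probability(probs))
    print(min_probability(probs))
    print(lowest_k_probability(probs, k=2))
```

In this sketch, the first style reacts to many mildly unusual events at once, the second to one extremely unusual event, and the third sits in between.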

Hope that helps
Pete

kaihil · May 18, 2018, 8:58am

Alright. Thanks for your quick and helpful answer!
