ML - Anomaly Detection - Weight Rare against Entity AND Population

Hi All,

I'm attempting to detect anomalies within some data, but I'm having trouble understanding the correct way to go about it.

I want to weight an anomaly by comparing the data against both the entity itself and the entity's population.

Example:

If I have user login data, I want to detect when a user logs in from a rare source (e.g. country), but I want the anomaly weight to behave something like this:

  1. If the login source is not rare for the user and not rare for the population, then it's not an anomaly
  2. If the login source is rare for the user but not rare for the population, weight the anomaly "lower"
  3. If the login source is not rare for the user but rare for the population, weight the anomaly "lower"
  4. If the login source is rare for the user and rare for the population, weight the anomaly "higher"

I thought about using a detector like:

rare by "source.geo.country_name" over "user.name"

But my understanding of the above is that it will only compare the user against the population, not the user against their own history.

Is that a correct understanding? And if so, is it currently possible to achieve what I'm looking for?

I'm testing out some Elastic Security (SIEM) features, if that helps with the context.

Let's compare these three configuration options:

  1. rare by "source.geo.country_name"
  2. rare by "source.geo.country_name" partition_field="user.name"
  3. rare by "source.geo.country_name" over "user.name"

Number 1 considers all country occurrences historically and flags countries that rarely occur overall.

Number 2 considers countries for each user separately, but is otherwise similar to 1.

Number 3 looks at which users generate each country and flags countries that only a few users generate. With 3 we are also sensitive to a user generating multiple rare countries at the same time.
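If it helps to see the difference concretely, here's a rough sketch of 2 and 3 as standalone anomaly detection jobs (the job IDs, bucket span, and time field are assumptions on my part, so adjust them to your data):

```json
PUT _ml/anomaly_detectors/rare_country_per_user
{
  "description": "Sketch of option 2: country that is rare for the individual user",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by source.geo.country_name partition_field=user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "partition_field_name": "user.name"
      }
    ],
    "influencers": ["user.name", "source.geo.country_name"]
  },
  "data_description": { "time_field": "@timestamp" }
}

PUT _ml/anomaly_detectors/rare_country_population
{
  "description": "Sketch of option 3: country that only a few users generate",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by source.geo.country_name over user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "over_field_name": "user.name"
      }
    ],
    "influencers": ["user.name", "source.geo.country_name"]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Each job would still need a datafeed pointing at your authentication index, or you can create the equivalent jobs through the ML/Security UI, which handles that part for you.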

Also, along similar lines, when these detectors are combined in one job we boost the anomalousness if a country is rare overall, rare for the user, and rarely generated by any user (assuming user.name is used as an influencer throughout).
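A minimal sketch of that combined approach, assuming one job with all three detectors and user.name as an influencer (again, the job ID and bucket span are placeholders):

```json
PUT _ml/anomaly_detectors/rare_login_country_combined
{
  "description": "Sketch: rare login country overall, per user, and across the user population",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by source.geo.country_name",
        "function": "rare",
        "by_field_name": "source.geo.country_name"
      },
      {
        "detector_description": "rare by source.geo.country_name partition_field=user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "partition_field_name": "user.name"
      },
      {
        "detector_description": "rare by source.geo.country_name over user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "over_field_name": "user.name"
      }
    ],
    "influencers": ["user.name", "source.geo.country_name"]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

A user that trips several of these detectors in the same bucket should then surface with a higher influencer score than one that trips only one of them, which gets you close to the weighting you described.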

So you can consider configuring multiple detectors in a single job, or leave them as separate jobs and combine the results in some custom way using a Watch (there is somewhat of an example of combining results across jobs here, though you might want to use influencer_scores instead) or a Transform.
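As a rough illustration of the "combine the results yourself" route, a search along these lines over the ML results index could feed a Watch or a Transform (the job IDs match the sketches above and the score threshold is arbitrary):

```json
GET .ml-anomalies-*/_search
{
  "size": 50,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "job_id": ["rare_country_per_user", "rare_country_population"] } },
        { "term": { "result_type": "influencer" } },
        { "term": { "influencer_field_name": "user.name" } },
        { "range": { "influencer_score": { "gte": 75 } } },
        { "range": { "timestamp": { "gte": "now-1d" } } }
      ]
    }
  },
  "sort": [ { "influencer_score": "desc" } ]
}
```

A user.name value that shows a high influencer_score in both jobs for the same period would correspond to your "rare for the user AND rare for the population" case.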

Note: rare by country over user needs a reasonably high population count, i.e. it won't be effective if you only have a few users.
