ML - Anomaly Detection - Weight Rare against Entity AND Population

Hi All,

I'm attempting to detect anomalies within some data, but I'm having trouble understanding the correct way to go about it.

I want to weight an anomaly by comparing the data against both the entity itself and the entity's population.

Example:

If I have user login data, I want to detect when a user logs in from a rare source (e.g. country), but I want the anomaly weight to behave something like this:

  1. If the login source is not rare for the user and not rare for the population, then it's not an anomaly
  2. If the login source is rare for the user but not rare for the population, weight the anomaly "lower"
  3. If the login source is not rare for the user but rare for the population, weight the anomaly "lower"
  4. If the login source is rare for the user and rare for the population, weight the anomaly "higher"

I thought about using a detector like:

rare by "source.geo.country_name" over "user.name"

But my understanding of the above is that it will only compare the user against the population, not the user against their own history.

Is that a correct understanding? And if so, is it currently possible to achieve what I'm looking for?

I'm testing out some Elastic Security (SIEM) features, if that helps with the context.

Let's compare these three configuration options:

  1. rare by "source.geo.country_name"
  2. rare by "source.geo.country_name" partition_field="user.name"
  3. rare by "source.geo.country_name" over "user.name"

Number 1 considers all country occurrences historically and flags countries that rarely occur overall.

Number 2 considers countries for each user separately, but is otherwise similar to 1.

Number 3 looks at which users generate each country and flags countries that only a few users generate. With 3 we are also sensitive to a user generating multiple rare countries at the same time.
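If it helps to see the difference concretely, here's a rough sketch of 2 and 3 as standalone anomaly detection jobs (the job IDs, bucket span, and time field are assumptions on my part, so adjust them to your data):

```json
PUT _ml/anomaly_detectors/rare_country_per_user
{
  "description": "Sketch of option 2: country that is rare for the individual user",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by source.geo.country_name partition_field=user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "partition_field_name": "user.name"
      }
    ],
    "influencers": ["user.name", "source.geo.country_name"]
  },
  "data_description": { "time_field": "@timestamp" }
}

PUT _ml/anomaly_detectors/rare_country_population
{
  "description": "Sketch of option 3: country that only a few users generate",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by source.geo.country_name over user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "over_field_name": "user.name"
      }
    ],
    "influencers": ["user.name", "source.geo.country_name"]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Each job would still need a datafeed pointing at your authentication index, or you can create the equivalent jobs through the ML/Security UI, which handles that part for you.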

Also, along similar lines, when these detectors are combined in one job we boost the anomalousness if a country is rare overall, rare for the user, and rarely generated by any user (assuming user.name is used as an influencer throughout).
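A minimal sketch of that combined approach, assuming one job with all three detectors and user.name as an influencer (again, the job ID and bucket span are placeholders):

```json
PUT _ml/anomaly_detectors/rare_login_country_combined
{
  "description": "Sketch: rare login country overall, per user, and across the user population",
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "detector_description": "rare by source.geo.country_name",
        "function": "rare",
        "by_field_name": "source.geo.country_name"
      },
      {
        "detector_description": "rare by source.geo.country_name partition_field=user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "partition_field_name": "user.name"
      },
      {
        "detector_description": "rare by source.geo.country_name over user.name",
        "function": "rare",
        "by_field_name": "source.geo.country_name",
        "over_field_name": "user.name"
      }
    ],
    "influencers": ["user.name", "source.geo.country_name"]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

A user that trips several of these detectors in the same bucket should then surface with a higher influencer score than one that trips only one of them, which gets you close to the weighting you described.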

So you can consider configuring multiple detectors in a single job, or leave them as separate jobs and combine the results in some custom way using a Watch (there is somewhat of an example of combining results across jobs here, though you might want to use influencer_scores instead) or a Transform.
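As a rough illustration of the "combine the results yourself" route, a search along these lines over the ML results index could feed a Watch or a Transform (the job IDs match the sketches above and the score threshold is arbitrary):

```json
GET .ml-anomalies-*/_search
{
  "size": 50,
  "query": {
    "bool": {
      "filter": [
        { "terms": { "job_id": ["rare_country_per_user", "rare_country_population"] } },
        { "term": { "result_type": "influencer" } },
        { "term": { "influencer_field_name": "user.name" } },
        { "range": { "influencer_score": { "gte": 75 } } },
        { "range": { "timestamp": { "gte": "now-1d" } } }
      ]
    }
  },
  "sort": [ { "influencer_score": "desc" } ]
}
```

A user.name value that shows a high influencer_score in both jobs for the same period would correspond to your "rare for the user AND rare for the population" case.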

Note: rare by country over user needs a reasonably high population count, i.e. it won't be effective if you only have a few users.
