Help: Create multi metric machine learning job

Hello everybody,

I need help creating a multi-metric job.
The idea is to detect users who connect from different countries. For example, if a user usually connects from Germany and one day connects from France, I should receive an alert.
I tried to create the job like this:

metric: Distinct count(source.geo.country_code2.keyword) and Distinct count(source.nat.geo.country_code2.keyword)

split field: user.name

The results I got (screenshot omitted) are not what I want, as they do not meet my need.

Could you please tell me how I can create this job?

Best regards

What you most likely want here is the rare detector function, which finds a country that is rare for a given user:

rare by source.geo.country_code2.keyword partition=user.name
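For reference, here is a minimal sketch of how that detector might look in a job's JSON configuration. The bucket_span of 15m, the detector_description text, and the @timestamp time field are assumptions for illustration and are not taken from this thread; adjust them to your data.

```json
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "rare",
        "by_field_name": "source.geo.country_code2.keyword",
        "partition_field_name": "user.name",
        "detector_description": "rare country per user"
      }
    ]
  },
  "data_description": {
    "time_field": "@timestamp"
  }
}
```

With partition_field_name set, rarity is modeled separately for each user, so a country that is common overall can still be flagged when it is unusual for one particular user.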

See a similar example here: Dec 4th, 2018: [EN][ML] Rarity Analysis with Machine Learning

Just be aware of the cardinality of the user.name field. If it is very high, the job will require a lot of memory.

Also, from your screenshot it seems that your data has an empty string for some user names. You might want to filter those out in the datafeed query.
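One possible way to write such a filter is sketched below. It assumes user.name is mapped as a keyword field; if it is a text field with a keyword sub-field, the term clause would likely need to target user.name.keyword instead. Note that an exists query on its own still matches documents whose value is an empty string, which is why the must_not clause is added.

```json
{
  "bool": {
    "filter": [
      {
        "exists": {
          "field": "user.name"
        }
      }
    ],
    "must_not": [
      {
        "term": {
          "user.name": ""
        }
      }
    ]
  }
}
```

This object would go in the query part of the datafeed configuration.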


Thanks for your help @richcollier,

Could you please tell me whether this query for filtering out an empty user.name field is correct, and whether I should add an influencer to my job? I am getting a warning that my job has no influencers.

{
  "bool": {
    "must": [
      {
        "exists": {
          "field": "user.name"
        }
      }
    ]
  }
}

My job Configuration:

I am getting an empty result, so I don't know whether there is a mistake in my configuration or simply no anomaly was found.

Best regards

I would add both user.name and source.geo.country_code2.keyword as influencers in your configuration.
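In the job JSON, that would look roughly like the fragment below. Only the influencers part of analysis_config is shown; the detectors and the rest of the job settings are left out.

```json
{
  "analysis_config": {
    "influencers": [
      "user.name",
      "source.geo.country_code2.keyword"
    ]
  }
}
```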

As for your results, it's possible that your data does not contain an anomaly yet. When testing, it is often good to let the job learn on a good amount of data (weeks if possible), then contrive a situation: manually index a sample document of a user connecting from an unusual location.
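A sketch of what such a hand-made test document could contain is shown below. Every value is a hypothetical placeholder: the user name, the country code, and the timestamp are made up, and the @timestamp field name is an assumption. A real test document should mirror the fields of your own index.

```json
{
  "@timestamp": "2024-01-01T12:00:00Z",
  "user": {
    "name": "example.user"
  },
  "source": {
    "geo": {
      "country_code2": "FR"
    }
  }
}
```

For the test to be meaningful, the user should be one the job has already seen many times from another country, and the timestamp should fall in a period the datafeed will still process.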


Thank you very much for this valuable information.
My index has almost 2 months of data. Later, I will manually generate a connection from a rare country and see whether the machine learning job detects it.

Thanks again!
