Help: Create multi metric machine learning job

Abdelhalim · December 4, 2020, 3:35pm

Hello everybody,

I need help to create a multi metric job.
The idea is to detect users who connect from a country other than their usual one. For example, if a user normally connects from Germany and one day connects from France, I want to receive an alert.
I tried to create the job like this:

metric: Distinct count(source.geo.country_code2.keyword) and Distinct count(source.nat.geo.country_code2.keyword)

split field: user.name

The results I got are shown below, and they are not what I want because they do not meet my need.

Could you please tell me how I can create this job?

Best regards

richcollier · December 4, 2020, 4:12pm

What you most likely want here is the rare detector function, which finds a country that is rare for a given user:

rare by source.geo.country_code2.keyword partition=user.name
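As a rough sketch, that detector could be written out as the body of an anomaly detection job along these lines. The bucket span, the time field name and the helper name `build_rare_country_job` are assumptions made for illustration, not values from this thread.

```python
import json


def build_rare_country_job(bucket_span="1h", time_field="@timestamp"):
    """Build a job body with a single `rare` detector (illustrative helper)."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,  # assumed value, tune for your data
            "detectors": [
                {
                    "function": "rare",
                    # "by" field: the value whose rarity is evaluated
                    "by_field_name": "source.geo.country_code2.keyword",
                    # "partition" field: one independent model per user
                    "partition_field_name": "user.name",
                }
            ],
        },
        # assumed time field name
        "data_description": {"time_field": time_field},
    }


job_body = build_rare_country_job()
print(json.dumps(job_body, indent=2))
```

The printed JSON is what would be sent when creating the job. Partitioning on user.name is what makes "rare" mean rare for that user rather than rare overall.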

See a similar example here: Dec 4th, 2018: [EN][ML] Rarity Analysis with Machine Learning

Just be mindful of the cardinality of the user.name field. If it is very high, the job will require a lot of memory.

Also, from your screenshot it seems that your data has an empty string for some user names. You might want to filter those out in the datafeed query.
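One possible shape for such a filter is sketched below, assuming user.name is mapped as a keyword field so that a term query on "" matches exactly the empty values. Note that an exists clause on its own would not be enough, because an empty string still counts as an existing value. The helper name `build_datafeed_query` is hypothetical.

```python
import json


def build_datafeed_query(field="user.name"):
    """Keep documents where `field` is present and not an empty string."""
    return {
        "bool": {
            # the field must be present in the document
            "filter": [{"exists": {"field": field}}],
            # and must not be the empty string (assumes a keyword mapping)
            "must_not": [{"term": {field: ""}}],
        }
    }


query = build_datafeed_query()
print(json.dumps(query, indent=2))
```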

Abdelhalim · December 7, 2020, 9:19am

Thanks for your help @richcollier,

Could you please tell me whether this query for filtering out empty user.name values is correct? Also, should I add an influencer to my job? I am getting a warning that my job has no influencer.

{
  "bool": {
    "must": [
      {
        "exists": {
          "field": "user.name"
        }
      }
    ]
  }
}

My job Configuration:

I am getting an empty result, so I don't know whether there is a mistake in my configuration or simply no anomaly was found.

Best regards

richcollier · December 7, 2020, 2:13pm

I would add both user.name and source.geo.country_code2.keyword as influencers in your configuration.
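In a job body, influencers sit next to the detectors inside analysis_config. A minimal sketch, where the starting config and the helper name `with_influencers` are made up for illustration:

```python
def with_influencers(analysis_config, influencers):
    """Return a copy of analysis_config with the influencers list set."""
    updated = dict(analysis_config)  # shallow copy, input is left unchanged
    updated["influencers"] = list(influencers)
    return updated


# Assumed starting point: a config holding only the rare detector.
base_config = {
    "detectors": [
        {
            "function": "rare",
            "by_field_name": "source.geo.country_code2.keyword",
            "partition_field_name": "user.name",
        }
    ]
}

config = with_influencers(
    base_config,
    ["user.name", "source.geo.country_code2.keyword"],
)
print(config["influencers"])
```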

As for your results, it's possible that you don't have an example of an anomaly in your data yet. Often, when testing, it is good to have the job learn on a good amount of data (weeks if possible), then contrive a situation by manually indexing a sample document of a user connecting from a strange location.
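A contrived test document might be put together as in the sketch below. The field layout is only inferred from the field names used in this thread, the user name and country code are invented, and a real index will probably need more fields. The indexing call itself is left out.

```python
import json
from datetime import datetime, timezone


def make_test_login(user, country_code, when=None):
    """Build a sample login event from an unusual country (hypothetical helper)."""
    when = when or datetime.now(timezone.utc)
    return {
        "@timestamp": when.isoformat(),  # assumed time field
        "user": {"name": user},
        "source": {"geo": {"country_code2": country_code}},
    }


# Invented example: a user who normally connects from Germany appears in France.
doc = make_test_login("test.user", "FR")
print(json.dumps(doc, indent=2))
```

The timestamp has to fall inside the period the datafeed is still processing, otherwise the job will never see the document.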

Abdelhalim · December 7, 2020, 2:29pm

Thank you very much for this valuable information.
My index has almost 2 months of data. Later I will try to manually generate a connection from a rare country and see whether machine learning detects it.

Thanks again

system · January 4, 2021, 2:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
