Hello everybody,
I need help creating a multi-metric job.
I want to detect users who connect from a country they do not normally connect from. For example, if a user usually connects from Germany and one day connects from France, I want to receive an alert.
I tried to create the job like this:
metric: Distinct count(source.geo.country_code2.keyword) and Distinct count(source.nat.geo.country_code2.keyword)
split field: user.name
The results I got are shown below, and they are not what I need.
Could you please tell me how I can create this job?
Best regards
What you most likely want here is the rare detector function, which finds a country that is rare for a given user:
rare by source.geo.country_code2.keyword partition=user.name
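In case the shorthand is unclear, here is a minimal sketch of how that detector could look as a detector object in the job configuration. It is written as a small Python snippet that only builds and prints the JSON, nothing is sent to Elasticsearch. The detector_description text is my own wording, not something from this thread.

```python
import json

# The "rare by ... partition=..." shorthand spelled out as a detector object.
# function / by_field_name / partition_field_name are the detector keys used
# in an anomaly detection job; the field values come from the line above.
rare_country_detector = {
    "function": "rare",
    "by_field_name": "source.geo.country_code2.keyword",
    "partition_field_name": "user.name",
    # Assumption: free-text description chosen for illustration only.
    "detector_description": "rare country per user",
}

# Print the JSON so it can be compared with the job's JSON view.
print(json.dumps(rare_country_detector, indent=2))
```

Because the analysis is partitioned on user.name, each user gets their own baseline of countries, so a country that is common overall can still be flagged as rare for one specific user.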
See a similar example here: Dec 4th, 2018: [EN][ML] Rarity Analysis with Machine Learning
Just be mindful of the cardinality of the user.name field. If it is very high, the job will need a lot of memory.
Also, from your screenshot it looks like some user names in your data are empty strings. You might want to filter those out in the datafeed query.
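As a rough sketch of such a filter: an exists clause on its own still matches documents where the value is an empty string, so the query below also excludes the empty string explicitly. This assumes user.name is mapped as a keyword field; if it is a text field with a .keyword sub-field, the term clause would need to target user.name.keyword instead. The Python snippet only builds and prints the JSON for the datafeed query.

```python
import json

# Datafeed query sketch: keep documents that have a user.name value,
# and drop those where the value is the empty string.
datafeed_query = {
    "bool": {
        # The field must be present in the document.
        "filter": [{"exists": {"field": "user.name"}}],
        # Assumption: user.name is a keyword field, so a term query on ""
        # matches the empty-string values that exists alone would let through.
        "must_not": [{"term": {"user.name": ""}}],
    }
}

# Print the JSON so it can be pasted into the datafeed's query editor.
print(json.dumps(datafeed_query, indent=2))
```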
Thanks for your help @richcollier,
Could you please tell me whether this query to filter out empty user.name values is correct, and whether I should add an influencer to my job? I am getting a warning that my job has no influencers.
{
  "bool": {
    "must": [
      {
        "exists": {
          "field": "user.name"
        }
      }
    ]
  }
}
My job configuration:
I am getting empty results, so I don't know whether there is a mistake in my configuration or whether no anomaly was found.
Best regards
I would add both user.name and source.geo.country_code2.keyword as influencers in your configuration.
As for your results, it's possible that you don't have an example of an anomaly in your data yet. Often, when testing, it is good to let the job learn on a good amount of data (weeks if possible), then contrive a situation, for example by manually indexing a sample document of a user connecting from an unusual location.
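To make that concrete, here is a minimal sketch of a job body with both fields listed as influencers. The bucket_span of 15m and the @timestamp time field are assumptions for illustration, so adjust them to your data. The Python snippet only builds and prints the JSON.

```python
import json

# Sketch of an anomaly detection job body with the two influencers.
job_body = {
    "analysis_config": {
        # Assumption: 15m bucket span, not taken from this thread.
        "bucket_span": "15m",
        "detectors": [
            {
                "function": "rare",
                "by_field_name": "source.geo.country_code2.keyword",
                "partition_field_name": "user.name",
            }
        ],
        # Both fields used by the detector are also listed as influencers.
        "influencers": ["user.name", "source.geo.country_code2.keyword"],
    },
    # Assumption: the index uses @timestamp as its time field.
    "data_description": {"time_field": "@timestamp"},
}

# Print the JSON so it can be compared with the job's JSON view.
print(json.dumps(job_body, indent=2))
```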
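A contrived test document could look roughly like the sketch below. The user name is hypothetical and should be replaced with a real user the job has already learned on. "FR" follows the France example from the first post, and the @timestamp field name is an assumption. The Python snippet only builds and prints the document; you would still need to index it into the index your datafeed reads from.

```python
import json
from datetime import datetime, timezone

# Contrived document: a known user connecting from an unusual country.
sample_doc = {
    # Assumption: @timestamp is the time field; use "now" so the document
    # falls after the period the job has already learned from.
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    # Hypothetical user name; replace with one that exists in your data.
    "user": {"name": "some.existing.user"},
    # A country this user has not connected from before (France here).
    "source": {"geo": {"country_code2": "FR"}},
}

# Print the JSON body to index manually.
print(json.dumps(sample_doc, indent=2))
```

The datafeed needs to be running, or restarted over the time range containing this document, for the job to see it.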
Thank you very much for this valuable information.
My index has almost 2 months of data. I will manually generate a connection from a rare country later and see whether machine learning detects it.
Thanks again