ML: detecting an unusually low number of users

Hello,
I'm using a machine learning detector on version 7.17.
I would like to detect an unusually low number of users.
For that I'm using low_distinct_count as the function.
It works fine as long as there is at least one user.

  • if the number of users goes from 17 to 1, the decrease is detected
  • if the number of users goes from 17 to 0, the decrease is NOT detected

Is there any condition to add to the ML job so that the drop is detected even if all users are lost?


This, in fact, should work as you are expecting it to. I'd love to see evidence (like a screenshot) of the situation that you describe with it not working! Please post here!

Here are the two cases:

  • it works when at least one user is present

  • it does not work when there are no users

As you can see in the second graph, detection of the decrease starts on 28/10 rather than on 26/10 as in the first one.

FYI, I'm using a detector like "low_distinct_count(ID) by XYZ".
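For readers who want to see that detector spelled out, here is a minimal sketch of how "low_distinct_count(ID) by XYZ" might look in a job's analysis_config, written as a Python dict. The field names ID and XYZ come from the post; the bucket_span value and the influencers list are assumptions added only for illustration.

```python
def build_distinct_count_analysis_config(id_field="ID", by_field="XYZ",
                                         bucket_span="15m"):
    """Return an analysis_config equivalent to low_distinct_count(ID) by XYZ.

    bucket_span is an assumed value; the thread does not state it.
    """
    detector = {
        "function": "low_distinct_count",
        "field_name": id_field,        # field whose distinct values are counted
        "by_field_name": by_field,     # one baseline per value of this field
    }
    return {
        "bucket_span": bucket_span,
        "detectors": [detector],
        "influencers": [by_field],     # assumption: the split field as influencer
    }


config = build_distinct_count_analysis_config()
```

As described in the thread, this detector is the one that misses the drop to zero users.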

I stand corrected: low_distinct_count does ignore empty buckets, which is a little counter-intuitive to me, but apparently that's how it was designed. I've asked dev to consider a feature enhancement that makes this behavior optional (like we do by having count and non_zero_count function variants).

In the meantime, this can be accomplished via a workaround.

  1. Use aggregations in the datafeed to calculate the cardinality of your field of choice. See the examples in "Aggregating data for faster performance" (Machine Learning in the Elastic Stack [8.10] documentation).
  2. Use the low_sum detector function on the aggregated field name.

OK,
low_sum is not adequate, since ID is not an aggregatable field.
I will try the aggregation in the datafeed.

I got it to work via the cardinality agg and the low_sum function:

job config:

Note the name of the cardinality agg (here it is dc_airline) is the same as what's used in the detector definition (low_sum(dc_airline)) and the value of the summary_count_field_name
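The naming rule in that note can be sketched as follows. This is an illustrative reconstruction, not the original job config: only the name dc_airline, the low_sum function and the summary_count_field_name relationship come from the post. The source field "airline", the time field "@timestamp" and the 15m interval are assumptions.

```python
CARDINALITY_NAME = "dc_airline"   # name shared by the agg and the detector


def build_job_analysis_config(cardinality_name, bucket_span="15m"):
    """Job side: low_sum over the pre-computed cardinality value."""
    return {
        "bucket_span": bucket_span,                     # assumed value
        # Per the note in the thread, this matches the cardinality agg name.
        "summary_count_field_name": cardinality_name,
        "detectors": [
            {"function": "low_sum", "field_name": cardinality_name}
        ],
    }


def build_datafeed_aggregations(cardinality_name, source_field="airline",
                                time_field="@timestamp", interval="15m"):
    """Datafeed side: date_histogram containing a cardinality agg.

    source_field, time_field and interval are assumptions.
    """
    return {
        "buckets": {
            "date_histogram": {"field": time_field, "fixed_interval": interval},
            "aggregations": {
                # Datafeed aggregations need a max agg on the time field.
                time_field: {"max": {"field": time_field}},
                cardinality_name: {"cardinality": {"field": source_field}},
            },
        }
    }


analysis_config = build_job_analysis_config(CARDINALITY_NAME)
datafeed_aggs = build_datafeed_aggregations(CARDINALITY_NAME)
```

Building both sides from the same constant makes it hard for the aggregation name and the detector field to drift apart.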

Thanks for your answer,
I didn't manage to get the expected result.
I did the same thing, but I need to add the field XYZ as an influencer:

In the job results I don't have XYZ:

If you intend to split by using a by_field or a partition_field then your datafeed query has to also include a terms aggregation so that you get a service_cardinality value for every XYZ.

I suspect you don't have that at the moment.

See example
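A minimal sketch of what such a datafeed aggregation and matching detector could look like, again as Python dicts. The names XYZ, ID and service_cardinality are taken from the thread; the time field "@timestamp", the 15m interval and the terms size of 100 are assumptions, so treat this as a starting point rather than a verified config.

```python
def build_split_datafeed_aggregations(split_field="XYZ", id_field="ID",
                                      cardinality_name="service_cardinality",
                                      time_field="@timestamp", interval="15m",
                                      max_terms=100):
    """date_histogram -> terms(split_field) -> cardinality(id_field)."""
    return {
        "buckets": {
            "date_histogram": {"field": time_field, "fixed_interval": interval},
            "aggregations": {
                time_field: {"max": {"field": time_field}},
                # One terms bucket per XYZ value, each with its own cardinality.
                split_field: {
                    "terms": {"field": split_field, "size": max_terms},
                    "aggregations": {
                        cardinality_name: {"cardinality": {"field": id_field}}
                    },
                },
            },
        }
    }


def build_split_detector(split_field="XYZ",
                         cardinality_name="service_cardinality"):
    """Detector that sums the per-XYZ cardinality and splits by XYZ."""
    return {
        "function": "low_sum",
        "field_name": cardinality_name,
        "by_field_name": split_field,
    }


split_aggs = build_split_datafeed_aggregations()
split_detector = build_split_detector()
```

The terms aggregation is named after the split field here so that the field name the job splits on and the name emitted by the datafeed are the same.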
