I'm using a machine learning detector on version 7.17.
I would like to detect an unusually low number of users.
For that, I'm using low_distinct_count as the function.
It works fine as long as there is at least one user.
- if the number of users goes from 17 to 1, the decrease is detected
- if the number of users goes from 17 to 0, the decrease is NOT detected
Is there any condition I can add to the ML job so that it also detects the case where all users are lost?
This, in fact, should work as you are expecting it to. I'd love to see evidence (like a screenshot) of the situation that you describe with it not working! Please post here!
Here are the two cases:
- it works when at least one user is present
- it does not work when there are no users
As you can see in the second graph, detection of the decrease starts on 28/10 rather than on 26/10 as in the first one.
FYI, I'm using a detector like "low_distinct_count(ID) by XYZ".
I stand corrected - it does ignore empty buckets, which is a little counter-intuitive to me, but apparently that's how it was designed. I've asked dev to consider a feature enhancement to make the behavior optional (like we do by having non_zero_count function variants).
In the meantime, this can be accomplished via a workaround.
- Use aggregations in the datafeed to calculate the cardinality of your field of choice. See the examples in "Aggregating data for faster performance" (Machine Learning in the Elastic Stack [8.10] documentation).
- Use the low_sum detector function on the aggregated field name.
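A minimal sketch of what those two steps could look like, written as Python dicts that mirror the JSON bodies of the datafeed and job. The field name `airline`, the time field `timestamp`, and the 15m interval are assumptions for illustration; `dc_airline` is just the name chosen for the cardinality aggregation. This has not been run against a cluster, so treat it as a starting point rather than a verified config.

```python
import json

# Assumed names for illustration only: adjust to your own index mapping.
TIME_FIELD = "timestamp"

# "aggregations" section of the datafeed.
datafeed_aggregations = {
    "buckets": {
        "date_histogram": {"field": TIME_FIELD, "fixed_interval": "15m"},
        "aggregations": {
            # Aggregated datafeeds need a max aggregation on the time field.
            TIME_FIELD: {"max": {"field": TIME_FIELD}},
            # Distinct count per histogram bucket; the aggregation name
            # is the field name the detector will refer to.
            "dc_airline": {"cardinality": {"field": "airline"}},
        },
    }
}

# "analysis_config" section of the anomaly detection job.
analysis_config = {
    "bucket_span": "15m",
    # Tells the job that it receives pre-aggregated input.
    "summary_count_field_name": "doc_count",
    "detectors": [{"function": "low_sum", "field_name": "dc_airline"}],
}

print(json.dumps({"aggregations": datafeed_aggregations}, indent=2))
print(json.dumps({"analysis_config": analysis_config}, indent=2))
```

The point to check is that the detector's field_name matches the name of the cardinality aggregation exactly.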
low_sum is not adequate, since ID is not an aggregatable field.
I will try the aggregation on the datafeed.
I got it to work via the cardinality agg and the low_sum function:
Note the name of the cardinality agg (here it is dc_airline) is the same as what's used in the detector definition (low_sum(dc_airline)) and the value of the
Thanks for your answer.
I didn't manage to get the expected result.
I have done the same thing, but I need to add a field XYZ as an influencer:
In the job results I don't have XYZ:
If you intend to split by using a by_field or a partition_field, then your datafeed query also has to include a terms aggregation so that you get a service_cardinality value for every
I suspect you don't have that at the moment.
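As a rough sketch of that idea, the cardinality aggregation can be nested inside a terms aggregation on XYZ, so that XYZ is present in the aggregated output and can be used for the split and as an influencer. The names XYZ, ID and service_cardinality come from this thread; the time field `timestamp`, the 15m interval, and the terms `size` of 100 are assumptions. The dicts mirror the JSON bodies and have not been tested against a cluster.

```python
import json

# Assumed time field name; XYZ and ID are the fields discussed in this thread.
TIME_FIELD = "timestamp"

# "aggregations" section of the datafeed, with a terms split on XYZ.
datafeed_aggregations = {
    "buckets": {
        "date_histogram": {"field": TIME_FIELD, "fixed_interval": "15m"},
        "aggregations": {
            TIME_FIELD: {"max": {"field": TIME_FIELD}},
            # The terms aggregation is named after the field it splits on,
            # so the job sees an XYZ field in the aggregated data.
            "XYZ": {
                "terms": {"field": "XYZ", "size": 100},  # size is an assumption
                "aggregations": {
                    # One distinct count of ID per XYZ value per bucket.
                    "service_cardinality": {"cardinality": {"field": "ID"}},
                },
            },
        },
    }
}

# "analysis_config" section of the job, split by XYZ.
analysis_config = {
    "bucket_span": "15m",
    "summary_count_field_name": "doc_count",
    "detectors": [
        {
            "function": "low_sum",
            "field_name": "service_cardinality",
            "by_field_name": "XYZ",
        }
    ],
    "influencers": ["XYZ"],
}

print(json.dumps({"aggregations": datafeed_aggregations}, indent=2))
print(json.dumps({"analysis_config": analysis_config}, indent=2))
```

If you prefer a partition instead of a by split, the same datafeed should work with partition_field_name in place of by_field_name.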