ML detector with exclude frequent enabled

Hi,

I have two jobs (each with two detectors, using the max and high_mean functions) running on the same data, with exclude frequent enabled for one of them. I have seen some high-severity anomalies raised when exclude frequent is "none", but nothing is detected with "all". So I'm wondering: how do these ML algorithms define frequent entities and adjust the scores? Can we expect those frequent entities to eventually be ignored by the first model (exclude frequent "none") later in the data processing?
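For context, the relevant part of the job configuration looks roughly like this (job, field, and entity names here are placeholders, not my real mapping; exclude_frequent accepts "all", "none", "by", or "over"):

```
PUT _ml/anomaly_detectors/example_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "max",
        "field_name": "metric",
        "by_field_name": "entity",
        "partition_field_name": "group",
        "exclude_frequent": "all"
      },
      {
        "function": "high_mean",
        "field_name": "metric",
        "by_field_name": "entity",
        "partition_field_name": "group",
        "exclude_frequent": "all"
      }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

The second job is identical except that both detectors use "exclude_frequent": "none".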

@saraKM, is the job using a by field or an over field for handling entities?

It could be that the only anomalous entities are the ones that occur frequently. If entity frequency correlates strongly with the detector metric, it might be that only very frequent entities have anomalous high_mean values.


Additionally, the introduction of Filter lists basically obviates the need for the exclude_frequent setting, which pre-dated Filters. With filters, you have specific control over which entities you'd like to omit from anomaly creation.
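For example, a sketch (the filter contents and field names are hypothetical): create a filter, then reference it from a detector custom rule that skips results for entities in the list:

```
PUT _ml/filters/ignored_entities
{
  "description": "Entities to omit from anomaly creation",
  "items": ["entity_a", "entity_b"]
}

PUT _ml/anomaly_detectors/example_job_with_filter
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_mean",
        "field_name": "metric",
        "by_field_name": "entity",
        "custom_rules": [
          {
            "actions": ["skip_result"],
            "scope": {
              "entity": { "filter_id": "ignored_entities", "filter_type": "include" }
            }
          }
        ]
      }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```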


Although you can have fine-grained control over the exact field values you exclude with Filter lists, and this may often be the right choice, they do require some manual configuration and ongoing maintenance. There is also nothing to stop you from using both a Filter list and exclude frequent. Note that setting the value to "none" simply means no values are excluded.

Exclude frequent is most likely to be useful in contexts where you know that frequently occurring events are not of interest. In this context, frequent means that an entity generates values in a significant fraction of time buckets, so whether a field's values are excluded is a function of the job's bucket span.

Assuming exclude frequent fits your needs, I would recommend it mainly in conjunction with a population analysis. For example, you might want to look for unusually high values of x for each entity, but ignore entities which are always active in the system you're observing.
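To illustrate that pattern (a sketch only; the bytes/user fields are hypothetical), a population job that flags users with unusually high values while excluding the ever-present ones:

```
PUT _ml/anomaly_detectors/example_population_job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [
      {
        "function": "high_mean",
        "field_name": "bytes",
        "over_field_name": "user",
        "exclude_frequent": "over"
      }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```

Here "exclude_frequent": "over" excludes users that are active in most buckets from generating results, while still modelling the rest of the population.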


Hi Ben, both detectors are using "by" and "partition" fields (low cardinality, fewer than 10 values) and the bucket span is 15 minutes. I agree that they might all be frequent, but I was confused about why they are detected as high severity if they occur frequently. As @richcollier and @Tom_Veasey suggested, I might go with a mixed-model approach.

Thanks

Hi Tom,

Thanks for the suggestions. For population analysis, do you mean having another detector for the population without exclude (in conjunction with the two existing detectors with exclude enabled), or just having a population analysis with exclude enabled? In the first case, does that help to decrease the severity at the influencer level? I might be checking record-level results, which I assume are not impacted by other detectors' results.

Thanks
