I am working with the machine learning tools provided by Elastic to detect certain rare events over time. Now I have some fields that I want the model to take into consideration when detecting anomalies. I read about influencers and am a bit confused: is an influencer only there so that we can blame it when an anomaly is detected, or do influencer fields play a role in detecting the anomaly? If I configure one rare wizard job without influencers and another with influencers, will both jobs give me the same result?
Or will the job with influencer fields take those fields into consideration when detecting the anomaly?
Influencers do not change whether an anomaly is detected.
Like you say, they exist to help attribute blame if an anomaly is detected.
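To make this concrete, here is a minimal sketch of two job configurations that differ only in their influencers, written as Python dictionaries in the shape of the request body for the create anomaly detection jobs API. The field names (process, user, host), the 15m bucket span, and the build_rare_job helper are assumptions for illustration, not taken from the original question.

```python
import json


def build_rare_job(influencers=None):
    """Build a job body with one 'rare' detector (hypothetical field names)."""
    analysis_config = {
        "bucket_span": "15m",  # assumed value, tune for your data
        "detectors": [
            {"function": "rare", "by_field_name": "process"},
        ],
    }
    # Influencers are listed next to the detectors, not inside them.
    if influencers:
        analysis_config["influencers"] = list(influencers)
    return {
        "analysis_config": analysis_config,
        "data_description": {"time_field": "@timestamp"},
    }


job_without = build_rare_job()
job_with = build_rare_job(influencers=["user", "host"])

# The detector definitions, which decide what counts as anomalous, are identical.
same_detectors = (
    job_without["analysis_config"]["detectors"]
    == job_with["analysis_config"]["detectors"]
)
print(same_detectors)
print(json.dumps(job_with, indent=2))
```

Both jobs carry the same detector, so per the answer above they should flag the same anomalies; the second job only adds fields that can be named as contributors when an anomaly is reported.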
So can we have two by fields, like "detect rare things by field_one by field_two"? I want the anomaly detection to be influenced by two other fields. I read about the over field and the partition field, but I am not sure these will work in my scenario.
You cannot have two "by fields", but you can have a "by field" and a "partition field" to effectively have a double-split.
However, the "by field" in the context of the rare detector function actually means "the field that I want to find a rare value in", so it is effectively not a split.
If you use a configuration like "rare by process partition=host", you are basically saying "find me a rare process on a host, but treat every host independently" (that is, do not keep a global list of rare processes for all hosts).
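As a rough sketch, the "rare by process, partitioned by host" setup could be expressed as a job body like the one below. The rare_detector helper, the bucket span, and the time field name are assumptions for illustration; process and host are the field names from the example above.

```python
import json


def rare_detector(by_field, partition_field=None):
    """Return a 'rare' detector; the partition field is optional."""
    detector = {"function": "rare", "by_field_name": by_field}
    if partition_field is not None:
        # Each distinct value of the partition field is modeled independently.
        detector["partition_field_name"] = partition_field
    return detector


job_body = {
    "analysis_config": {
        "bucket_span": "15m",  # assumed value
        "detectors": [rare_detector("process", partition_field="host")],
        # Optional: influencers only help attribute an anomaly once it is found.
        "influencers": ["host", "process"],
    },
    "data_description": {"time_field": "@timestamp"},  # assumed time field
}

print(json.dumps(job_body, indent=2))
```

Dropping the partition_field argument would give a plain "rare by process" detector, which keeps a single view of rare processes across all hosts.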
The "over field" invokes population analysis. See more info here: Temporal vs. Population Analysis in Elastic Machine Learning | Elastic Blog