I do not really understand the difference between these two settings. They seem to perform the same function from my perspective.
This other discuss thread may shed some light: ML Kibana: difference between by_field_name and partition_field_name - #4 by richcollier
Alright, thanks for the link. As I understand it, partition_field_name is a harder split in the model, then? So if I want the anomaly scores to be based solely on data matching the split field, I should use partition_field_name. And I should only use by_field_name if I want a softer split that lets data from the whole population affect anomaly scores.
Yes, that's pretty much it. Think of using partition_field_name as practically the equivalent of N single metric jobs, one for every value of partition_field_name (where the field has a cardinality of N). The scoring for anomalies in a partition (since version 6.5) is largely independent of anomalies in other partitions.
So, use partition_field_name for logical splits that should be more independent from each other.
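To make the comparison concrete, here is a minimal sketch of how the two variants might look in a job configuration, written as Python dictionaries that mirror the JSON body of an anomaly detection job. The field names system.cpu.pct and host.name, the mean function, and the 15m bucket span are assumptions chosen for illustration and are not from this thread; the make_job helper is hypothetical as well. The only difference between the two bodies is the key used for the split.

```python
import json


def make_job(split_key: str, split_field: str = "host.name") -> dict:
    """Build a job body whose single detector is split with `split_key`.

    `split_key` must be "by_field_name" or "partition_field_name".
    The metric field and split field are hypothetical examples.
    """
    if split_key not in ("by_field_name", "partition_field_name"):
        raise ValueError(f"unsupported split key: {split_key}")
    return {
        "analysis_config": {
            "bucket_span": "15m",  # assumed value, tune for your data
            "detectors": [
                {
                    "function": "mean",
                    "field_name": "system.cpu.pct",  # hypothetical metric
                    split_key: split_field,
                }
            ],
        },
        "data_description": {"time_field": "@timestamp"},
    }


# Softer split: values of host.name are modeled within one detector.
by_job = make_job("by_field_name")

# Harder split: each value of host.name is treated much like its own job.
partition_job = make_job("partition_field_name")

if __name__ == "__main__":
    print(json.dumps(partition_job, indent=2))
```

Either dictionary could be serialized with json.dumps and sent as the body of a create-job request; which one fits depends on how independent the split values should be from each other.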