ML: difference between partition_field_name and by_field_name?

I do not really understand the difference between these two settings. They seem to perform the same function from my perspective.

This other discuss thread may shed some light: ML Kibana: difference between by_field_name and partition_field_name - #4 by richcollier


Alright, thanks for the link. As I understand it, partition_field_name is a harder split in the model, then? So if I want the anomaly scores to be based solely on data matching the split field, I should use partition_field_name. And I should only use by_field_name if I want a softer split that lets data from the whole population affect anomaly scores.

Yes, that's pretty much it. Think of using partition_field_name as practically the equivalent of running N single metric jobs, one for each value of the partition field (where N is that field's cardinality). Since version 6.5, the scoring of anomalies in one partition is largely independent of anomalies in other partitions.
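To make the "N single metric jobs" comparison concrete, here is a rough Python sketch. It only builds configuration dictionaries shaped like anomaly detection job bodies and does not talk to Elasticsearch. The field names `airline` and `responsetime`, the `15m` bucket span, the `@timestamp` time field, and both helper functions are assumptions for illustration, not anything taken from this thread.

```python
def partitioned_job(metric_field, partition_field, bucket_span="15m"):
    """One job whose single detector is split with partition_field_name."""
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [
                {
                    "function": "mean",
                    "field_name": metric_field,
                    "partition_field_name": partition_field,
                }
            ],
        },
        "data_description": {"time_field": "@timestamp"},
    }


def single_metric_jobs(metric_field, partition_field, values, bucket_span="15m"):
    """Roughly comparable setup: one unsplit job per partition value,
    each paired with a datafeed query that keeps only that value."""
    jobs = {}
    for value in values:
        jobs[value] = {
            "job": {
                "analysis_config": {
                    "bucket_span": bucket_span,
                    # No split field here: the filtering is done by the query.
                    "detectors": [{"function": "mean", "field_name": metric_field}],
                },
                "data_description": {"time_field": "@timestamp"},
            },
            "datafeed_query": {"term": {partition_field: value}},
        }
    return jobs


# Hypothetical example: three airlines means three separate jobs,
# versus one partitioned job that covers all of them.
one_job = partitioned_job("responsetime", "airline")
many_jobs = single_metric_jobs("responsetime", "airline", ["AAL", "KLM", "JZA"])
```

The single partitioned job is far easier to manage than N separate jobs, which is the practical reason to reach for partition_field_name instead of building the jobs by hand.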

So, use partition_field_name for logical splits that should be more independent of each other.
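In configuration terms the two options differ only in which key carries the split field inside the detector. A minimal sketch, again with the hypothetical `airline` and `responsetime` fields and a made-up `make_detector` helper:

```python
import json

SPLIT_KEYS = ("by_field_name", "partition_field_name")


def make_detector(split_key, split_field, metric_field="responsetime"):
    """Build a detector dict that splits on split_field using the given key."""
    if split_key not in SPLIT_KEYS:
        raise ValueError(f"split_key must be one of {SPLIT_KEYS}")
    return {
        "function": "mean",
        "field_name": metric_field,
        split_key: split_field,
    }


# Softer split: per the discussion above, data from the whole
# population can still affect anomaly scores.
by_detector = make_detector("by_field_name", "airline")

# Harder split: each airline is scored largely independently.
partition_detector = make_detector("partition_field_name", "airline")

print(json.dumps(by_detector, indent=2))
print(json.dumps(partition_detector, indent=2))
```

Either dictionary would go in the `detectors` list under `analysis_config` when the job is created.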
